ichor.core.database.sql package

Submodules

ichor.core.database.sql.add_to_database module

add_atom_names_to_database(session: Session, atom_names: List[str], echo=False)

Adds a list of atom names to the atom_names table of the database.

Parameters:
  • session – SQLAlchemy Session used to write to the database

  • atom_names – A list of atom names, e.g. ["C1", "H2", "H3", ...]

add_point_to_database(session: Session, point: ichor.core.files.PointDirectory, echo=False, print_missing_data=True)

Adds information from an instance of a PointDirectory to the database.

Parameters:
  • session – SQLAlchemy Session used to write to the database

  • point – A PointDirectory instance, containing Gaussian/AIMAll outputs that can be written to the database.

Note

Even if atomic data (.int file) is missing for a particular atom in the system, the information for the point is still added to the database, because the rest of the point can still be used in the training set for the other atoms.
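The behaviour described in the note can be illustrated with a minimal sketch that uses only the standard-library sqlite3 module rather than ichor itself. The helper add_point, the reduced table layout, and the numeric values are hypothetical placeholders. Storing NULL properties for the atom without an .int file is an assumption about one way the point could still be recorded.

```python
import sqlite3

# Reduced, hypothetical schema: only a few of the documented columns are kept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE atom_names (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE points (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dataset (
    id INTEGER PRIMARY KEY,
    point_id INTEGER, atom_id INTEGER,
    x REAL, y REAL, z REAL,
    iqa REAL, integration_error REAL
);
""")
conn.executemany("INSERT INTO atom_names (name) VALUES (?)",
                 [("O1",), ("H2",), ("H3",)])


def add_point(conn, point_name, geometry, atomic_data):
    """Store one point; atoms without atomic data get NULL properties.

    geometry:    {atom_name: (x, y, z)}
    atomic_data: {atom_name: {"iqa": ..., "integration_error": ...}};
                 an atom is absent here if its .int file was missing.
    Returns the names of the atoms that had no atomic data.
    """
    point_id = conn.execute(
        "INSERT INTO points (name) VALUES (?)", (point_name,)).lastrowid
    missing = []
    for atom_name, (x, y, z) in geometry.items():
        atom_id = conn.execute(
            "SELECT id FROM atom_names WHERE name = ?", (atom_name,)
        ).fetchone()[0]
        data = atomic_data.get(atom_name)
        if data is None:
            # Geometry is still stored; the properties stay NULL.
            missing.append(atom_name)
            data = {}
        conn.execute(
            "INSERT INTO dataset"
            " (point_id, atom_id, x, y, z, iqa, integration_error)"
            " VALUES (?, ?, ?, ?, ?, ?, ?)",
            (point_id, atom_id, x, y, z,
             data.get("iqa"), data.get("integration_error")))
    conn.commit()
    return missing


# Placeholder numbers for illustration only; H3 has no atomic data.
missing = add_point(
    conn, "WATER0001",
    geometry={"O1": (0.0, 0.0, 0.0),
              "H2": (0.96, 0.0, 0.0),
              "H3": (-0.24, 0.93, 0.0)},
    atomic_data={"O1": {"iqa": -75.4, "integration_error": 1e-5},
                 "H2": {"iqa": -0.3, "integration_error": 2e-5}},
)
print("atoms without atomic data:", missing)
```

In this sketch the O1 and H2 rows remain usable for training even though the H3 row carries no properties.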

create_database_session(database_path: Path, echo=False)

Creates a sqlalchemy Engine object as well as a Session object which are used to interact with the SQLite database.

Parameters:
  • database_path – pathlib.Path object to database

  • echo – Whether SQLAlchemy should echo the SQL commands it issues, defaults to False

ichor.core.database.sql.database module

class AtomNames(**kwargs)

Bases: Base

children
id
name

class Dataset(**kwargs)

Bases: Base

atom_id
atom_names_parent
force_x
force_y
force_z
id
integration_error
iqa
point_id
points_parent
q00
q10
q11c
q11s
q20
q21c
q21s
q22c
q22s
q30
q31c
q31s
q32c
q32s
q33c
q33s
q40
q41c
q41s
q42c
q42s
q43c
q43s
q44c
q44s
q50
q51c
q51s
q52c
q52s
q53c
q53s
q54c
q54s
q55c
q55s
x
y
z

class Points(**kwargs)

Bases: Base

children
date_added
id
name
wfn_energy

create_database(database_path: str | Path, echo=False)

Creates an empty database in which important information from a PointsDirectory instance can be stored.

Parameters:

database_path – A string or Path to a (non-existing) database on disk.
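As a rough picture of the empty database, the following sketch creates three tables with the standard-library sqlite3 module. It is not the ichor implementation. The column names follow the attribute lists of AtomNames, Points, and Dataset above. The table names "points" and "dataset", the column types, the foreign keys, and the helper create_empty_database are assumptions. Only the first few multipole columns are shown.

```python
import sqlite3

# Hypothetical DDL; see the assumptions listed in the text above.
SCHEMA = """
CREATE TABLE atom_names (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE points (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    date_added TEXT,
    wfn_energy REAL
);
CREATE TABLE dataset (
    id       INTEGER PRIMARY KEY,
    point_id INTEGER NOT NULL REFERENCES points(id),
    atom_id  INTEGER NOT NULL REFERENCES atom_names(id),
    x REAL, y REAL, z REAL,
    force_x REAL, force_y REAL, force_z REAL,
    iqa REAL,
    integration_error REAL,
    q00 REAL, q10 REAL, q11c REAL, q11s REAL
);
"""


def create_empty_database(path=":memory:"):
    """Create the three empty tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_empty_database()
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Each dataset row links one atom to one point, which matches the point_id and atom_id attributes listed for Dataset.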

ichor.core.database.sql.query_database module

create_sqlite_db_connection(db_path: str | Path, echo=False) → Connection

Creates a connection to a SQLite3 database and returns a connection object to be used when executing SQL statements with pandas.

Parameters:
  • db_path – Path to SQLite3 database

  • echo – Whether to echo SQL queries, defaults to False

Returns:

A connection object to the SQL database

Return type:

sqlalchemy.engine.Connection

create_sqlite_db_engine(db_path: str | Path, echo=False) → Engine

Creates an engine to a SQLite3 database and returns the Engine object.

Parameters:
  • db_path – Path to SQLite3 database

  • echo – Whether to echo SQL queries, defaults to False

Returns:

An engine object for the SQL database

Return type:

sqlalchemy.engine.Engine

csv_file_with_specific_properties(point_ids, full_df, all_atom_names, properties: List[str])

Writes out a CSV file for each atom, containing the given properties.
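The idea of one CSV per atom, restricted to chosen properties, can be sketched with the standard library alone. This is not the ichor function. The helper csv_per_atom is hypothetical, it takes plain dictionaries instead of the pandas DataFrame passed as full_df, it returns text instead of writing files, and the numbers are placeholders.

```python
import csv
import io


def csv_per_atom(point_ids, rows, atom_names, properties):
    """Return {atom_name: csv_text} holding only the requested properties."""
    wanted_points = set(point_ids)
    out = {}
    for atom_name in atom_names:
        buffer = io.StringIO()
        writer = csv.DictWriter(
            buffer,
            fieldnames=["point_id", *properties],
            extrasaction="ignore",   # silently drop columns not requested
            lineterminator="\n",
        )
        writer.writeheader()
        for row in rows:
            if row["atom_name"] == atom_name and row["point_id"] in wanted_points:
                writer.writerow(row)
        out[atom_name] = buffer.getvalue()
    return out


# Placeholder rows standing in for the joined per-atom data.
rows = [
    {"point_id": 1, "atom_name": "C1", "iqa": -37.5, "q00": 0.12, "x": 0.0},
    {"point_id": 1, "atom_name": "H2", "iqa": -0.5, "q00": 0.03, "x": 1.1},
    {"point_id": 2, "atom_name": "C1", "iqa": -37.6, "q00": 0.11, "x": 0.1},
]
files = csv_per_atom([1, 2], rows, ["C1", "H2"], ["iqa", "q00"])
print(files["C1"])
```

Writing each entry of files to disk, for example as one file per atom name, would complete the picture.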

delete_sqlite_points_by_id(engine, point_ids: List[int])

get_atoms_from_sqlite_point_id(full_df, point_id: int) → Atoms

Returns an Atoms instance containing geometry for a point id.

Parameters:
  • full_df – DataFrame containing the data for all atoms; see the get_sqlite_db_information function

  • point_id – The id of the point for which to get the geometry

get_full_dataframe_for_all_atoms(db_path: str | Path, echo=False, change_cols=True) → DataFrame

Returns a dataframe containing all the data for all atoms.

Parameters:
  • db_path – Path or str to database

  • echo – Whether to echo SQL statements, defaults to False

  • change_cols – Whether to remove some unnecessary columns and rename some columns to make them clearer, defaults to True

Returns:

A pandas dataframe containing information for all atoms

get_list_of_atom_names_from_sqlite_db(db_path: str | Path, echo=False) → List[str]

Returns a list of all the atom names in the database.

Parameters:
  • db_path – Path or string to database

  • echo – Whether to echo SQL queries, defaults to False

Returns:

List of strings of the atom names

get_list_of_point_ids_from_sqlite_db(db_path: str | Path, echo=False) → List[str]

Returns a list of all the point ids in the database.

Parameters:
  • db_path – Path or string to database

  • echo – Whether to echo SQL queries, defaults to False

Returns:

List of strings of the point names

get_sqlite_db_information(db_path: str | Path, echo=False) → Tuple[List[str], List[str], DataFrame]

Gets the information from the database that is needed to post-process data and generate datasets for machine learning.

Parameters:
  • db_path – Path to SQLite3 database containing Points, AtomNames, and Dataset tables.

  • echo – Whether to echo executed SQL statements, defaults to False

Returns:

Tuple of: List of point ids (integers) contained in the db, List of atom names (str) contained in db, a pd.DataFrame object containing all relevant data needed to construct the datasets.

Return type:

Tuple[List[str], List[str], pd.DataFrame]

raw_one_atom_data_to_df_sqlite(db_path: str | Path, atom_name: str, integration_error: float = 0.001, echo=False, drop_irrelevant_cols=True) → DataFrame

Returns a pandas DataFrame object containing data for one atom.

Parameters:
  • db_path – Path or str to SQLite3 database

  • atom_name – string of atom name (e.g. C1)

  • integration_error – Integration error for AIMAll. Any point with a higher absolute integration error will not be selected, defaults to 0.001

  • echo – Whether to echo the executed SQL queries, defaults to False

  • drop_irrelevant_cols – Whether to drop irrelevant columns (id columns that do not contain data) from the DataFrame, defaults to True

Returns:

pd.DataFrame object containing the data for the given atom
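The integration_error filter can be pictured as a WHERE clause on the absolute value of the stored error. The sketch below uses the standard-library sqlite3 module and a reduced, hypothetical schema. The helper select_one_atom and the sample numbers are placeholders. Whether the real query keeps rows exactly at the threshold is an assumption; here they are kept, since only higher errors are described as excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE atom_names (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dataset (
    id INTEGER PRIMARY KEY, point_id INTEGER, atom_id INTEGER,
    iqa REAL, integration_error REAL
);
INSERT INTO atom_names (id, name) VALUES (1, 'C1'), (2, 'H2');
INSERT INTO dataset (point_id, atom_id, iqa, integration_error) VALUES
    (1, 1, -37.5,  0.0002),
    (2, 1, -37.6, -0.0040),
    (3, 1, -37.4, -0.0009),
    (1, 2,  -0.5,  0.0001);
""")


def select_one_atom(conn, atom_name, integration_error=0.001):
    """Return rows for one atom whose |integration_error| is within the limit."""
    query = """
        SELECT d.point_id, d.iqa, d.integration_error
        FROM dataset AS d
        JOIN atom_names AS a ON a.id = d.atom_id
        WHERE a.name = ? AND ABS(d.integration_error) <= ?
        ORDER BY d.point_id
    """
    return conn.execute(query, (atom_name, integration_error)).fetchall()


rows = select_one_atom(conn, "C1")
for row in rows:
    print(row)
```

Point 2 is dropped for C1 because its absolute integration error of 0.004 exceeds the default of 0.001, while the negative error of point 3 passes because the absolute value is compared.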

trajectory_from_sqlite_database(point_ids, full_df, trajectory_name: str = 'trajectory_from_database.xyz')

Writes out a trajectory from the geometries in the database.
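A trajectory in XYZ format is a sequence of frames, each consisting of an atom count, a comment line, and one line per atom. The sketch below builds such text from plain tuples and does not use ichor or pandas. The helpers element_of and xyz_trajectory, the comment line contents, and the coordinates are hypothetical. Deriving the element by stripping the index from names such as "C1" is an assumption about the naming scheme.

```python
import re
from collections import defaultdict


def element_of(atom_name):
    """Strip the trailing index from an atom name, e.g. "C1" -> "C"."""
    return re.match(r"[A-Za-z]+", atom_name).group(0)


def xyz_trajectory(rows, point_ids):
    """Build XYZ-format text from (point_id, atom_name, x, y, z) rows."""
    by_point = defaultdict(list)
    for point_id, atom_name, x, y, z in rows:
        by_point[point_id].append((atom_name, x, y, z))

    lines = []
    for point_id in point_ids:
        atoms = by_point[point_id]
        lines.append(str(len(atoms)))           # number of atoms in the frame
        lines.append(f"point_id = {point_id}")  # free-form comment line
        for atom_name, x, y, z in atoms:
            lines.append(f"{element_of(atom_name)} {x:.8f} {y:.8f} {z:.8f}")
    return "\n".join(lines) + "\n"


# Placeholder geometries for two points of a three-atom system.
rows = [
    (1, "O1", 0.0, 0.0, 0.0), (1, "H2", 0.96, 0.0, 0.0), (1, "H3", -0.24, 0.93, 0.0),
    (2, "O1", 0.0, 0.0, 0.1), (2, "H2", 0.97, 0.0, 0.1), (2, "H3", -0.25, 0.92, 0.1),
]
text = xyz_trajectory(rows, point_ids=[1, 2])
print(text)
```

The resulting text could then be written to a file such as the default trajectory_name, for instance with pathlib.Path.write_text.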

write_raw_one_atom_data_to_csv_sqlite(db_path: str | Path, atom_name: str, integration_error: float = 0.001, echo=False, drop_irrelevant_cols=True)

Saves the raw data for one atom as stored in the SQLite3 database to a csv file.

Parameters:
  • db_path – Path or str to SQLite3 database

  • atom_name – string of atom name (e.g. C1)

  • integration_error – Integration error for AIMAll. Any point with a higher absolute integration error will not be selected, defaults to 0.001

  • echo – Whether to echo the executed SQL queries, defaults to False

  • drop_irrelevant_cols – Whether to drop irrelevant columns (id columns that do not contain data) from the DataFrame, defaults to True

Module contents

add_atom_names_to_database(session: Session, atom_names: List[str], echo=False)

Adds a list of atom names to the atom_names table of the database.

Parameters:
  • session – SQLAlchemy Session used to write to the database

  • atom_names – A list of atom names, e.g. ["C1", "H2", "H3", ...]

add_point_to_database(session: Session, point: ichor.core.files.PointDirectory, echo=False, print_missing_data=True)

Adds information from an instance of a PointDirectory to the database.

Parameters:
  • session – SQLAlchemy Session used to write to the database

  • point – A PointDirectory instance, containing Gaussian/AIMAll outputs that can be written to the database.

Note

Even if atomic data (.int file) is missing for a particular atom in the system, the information for the point is still added to the database, because the rest of the point can still be used in the training set for the other atoms.

create_database(database_path: str | Path, echo=False)

Creates an empty database in which important information from a PointsDirectory instance can be stored.

Parameters:

database_path – A string or Path to a (non-existing) database on disk.

create_database_session(database_path: Path, echo=False)

Creates a sqlalchemy Engine object as well as a Session object which are used to interact with the SQLite database.

Parameters:
  • database_path – pathlib.Path object to database

  • echo – Whether SQLAlchemy should echo the SQL commands it issues, defaults to False

get_sqlite_db_information(db_path: str | Path, echo=False) → Tuple[List[str], List[str], DataFrame]

Gets the information from the database that is needed to post-process data and generate datasets for machine learning.

Parameters:
  • db_path – Path to SQLite3 database containing Points, AtomNames, and Dataset tables.

  • echo – Whether to echo executed SQL statements, defaults to False

Returns:

Tuple of: List of point ids (integers) contained in the db, List of atom names (str) contained in db, a pd.DataFrame object containing all relevant data needed to construct the datasets.

Return type:

Tuple[List[str], List[str], pd.DataFrame]