ichor.core.database.sql package
Submodules
ichor.core.database.sql.add_to_database module
- add_atom_names_to_database(session: Session, atom_names: List[str], echo=False)
Adds a list of atom names to the atom_names table of the database.
- Parameters:
session – SQLAlchemy Session connected to the database
atom_names – A list of atom names, e.g. ["C1", "H2", "H3", ...]
- add_point_to_database(session: Session, point: ichor.core.files.PointDirectory, echo=False, print_missing_data=True)
Adds information from an instance of a PointDirectory to the database.
- Parameters:
session – SQLAlchemy Session connected to the database
point – A PointDirectory instance, containing Gaussian/AIMAll outputs that can be written to the database.
Note
Even if atomic data (.int file) is missing for a particular atom in the system, the information for the point is still added to the database. This is because the rest of the point can still be used in the training set for the other atoms.
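The reasoning in the note can be illustrated with a minimal sketch. It uses Python's standard sqlite3 module rather than ichor's own SQLAlchemy models. The table layout (points, atom_names, and a dataset table with only an iqa column), the point name, and the helper usable_point_ids are simplified assumptions for illustration, not the actual ichor schema or API. Storing the missing value as NULL is also an assumption.

```python
import sqlite3

# Simplified, assumed schema: the real ichor tables have many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE points (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE atom_names (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dataset (
        id INTEGER PRIMARY KEY,
        point_id INTEGER REFERENCES points(id),
        atom_id INTEGER REFERENCES atom_names(id),
        iqa REAL  -- assumed to be NULL when the .int file is missing
    );
    """
)
conn.executemany(
    "INSERT INTO atom_names (id, name) VALUES (?, ?)",
    [(1, "O1"), (2, "H2"), (3, "H3")],
)
conn.execute("INSERT INTO points (id, name) VALUES (1, 'WATER0001')")

# Hypothetical point for which the atomic data of H3 is missing.
conn.executemany(
    "INSERT INTO dataset (point_id, atom_id, iqa) VALUES (?, ?, ?)",
    [(1, 1, -75.4), (1, 2, -0.45), (1, 3, None)],
)


def usable_point_ids(conn, atom_name):
    """Return ids of points that have atomic data for the given atom."""
    rows = conn.execute(
        """
        SELECT d.point_id
        FROM dataset AS d
        JOIN atom_names AS a ON a.id = d.atom_id
        WHERE a.name = ? AND d.iqa IS NOT NULL
        ORDER BY d.point_id
        """,
        (atom_name,),
    ).fetchall()
    return [row[0] for row in rows]


o1_points = usable_point_ids(conn, "O1")  # point 1 is usable for O1
h3_points = usable_point_ids(conn, "H3")  # point 1 is not usable for H3
```

In this sketch the point stays in the database and contributes to the O1 and H2 training sets, while it is simply skipped when data for H3 is requested.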
- create_database_session(database_path: Path, echo=False)
Creates a SQLAlchemy Engine object and a Session object, which are used to interact with the SQLite database.
- Parameters:
database_path – pathlib.Path object to database
echo – Whether SQLAlchemy should echo the SQL commands it executes, defaults to False
ichor.core.database.sql.database module
- class Dataset(**kwargs)
Bases: Base
- atom_id
- atom_names_parent
- force_x
- force_y
- force_z
- id
- integration_error
- iqa
- point_id
- points_parent
- q00
- q10
- q11c
- q11s
- q20
- q21c
- q21s
- q22c
- q22s
- q30
- q31c
- q31s
- q32c
- q32s
- q33c
- q33s
- q40
- q41c
- q41s
- q42c
- q42s
- q43c
- q43s
- q44c
- q44s
- q50
- q51c
- q51s
- q52c
- q52s
- q53c
- q53s
- q54c
- q54s
- q55c
- q55s
- x
- y
- z
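The multipole moment columns listed above follow a regular pattern: for each rank l from 0 to 5 there is one q{l}0 column, plus a cosine (c) and sine (s) column for every m from 1 to l. The sketch below regenerates the names from that pattern. The pattern is inferred from the attribute list itself, and the function multipole_column_names is a hypothetical helper, not part of ichor.

```python
def multipole_column_names(max_rank=5):
    """Generate column names q00, q10, q11c, q11s, ... up to max_rank."""
    names = []
    for rank in range(max_rank + 1):
        # The m = 0 component has no cosine/sine suffix.
        names.append(f"q{rank}0")
        for m in range(1, rank + 1):
            names.append(f"q{rank}{m}c")
            names.append(f"q{rank}{m}s")
    return names


columns = multipole_column_names()
```

Each rank l contributes 2l + 1 columns, so ranks 0 to 5 give 36 names in total, matching the number of q columns on the Dataset class.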
- create_database(database_path: str | Path, echo=False)
Creates an empty database in which important information from a PointsDirectory instance can be stored.
- Parameters:
database_path – A string or Path to a database on disk that does not exist yet.
ichor.core.database.sql.query_database module
- create_sqlite_db_connection(db_path: str | Path, echo=False) Connection
Creates a connection to a SQLite3 database and returns a connection object to be used when executing SQL statements with pandas.
- Parameters:
db_path – Path to SQLite3 database
echo – Whether to echo SQL queries, defaults to False
- Returns:
A connection object to the SQL database
- Return type:
sqlalchemy.engine.Connection
- create_sqlite_db_engine(db_path: str | Path, echo=False) Engine
Creates an engine to a SQLite3 database and returns the Engine object.
- Parameters:
db_path – Path to SQLite3 database
echo – Whether to echo SQL queries, defaults to False
- Returns:
An engine object for the SQL database
- Return type:
sqlalchemy.engine.Engine
- csv_file_with_specific_properties(point_ids, full_df, all_atom_names, properties: List[str])
Writes out a CSV file for each atom, containing the given properties.
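As a rough illustration of writing one CSV file per atom with a chosen set of properties, the sketch below uses only the standard csv module. It is not ichor's implementation: the input is a plain list of dictionaries instead of the full_df DataFrame, and the function name write_property_csvs, the file naming scheme, and the sample values are all assumptions.

```python
import csv
import tempfile
from pathlib import Path


def write_property_csvs(rows, atom_names, properties, out_dir):
    """Write one CSV per atom holding point_id plus the chosen properties."""
    written = []
    for atom_name in atom_names:
        path = Path(out_dir) / f"{atom_name}_properties.csv"
        with open(path, "w", newline="") as f:
            # extrasaction="ignore" drops keys that were not requested.
            writer = csv.DictWriter(
                f, fieldnames=["point_id", *properties], extrasaction="ignore"
            )
            writer.writeheader()
            for row in rows:
                if row["atom_name"] == atom_name:
                    writer.writerow(row)
        written.append(path)
    return written


# Made-up sample data using column names from the Dataset class.
rows = [
    {"point_id": 1, "atom_name": "C1", "iqa": -37.1, "q00": 0.12, "q10": 0.01},
    {"point_id": 1, "atom_name": "H2", "iqa": -0.5, "q00": 0.03, "q10": 0.00},
    {"point_id": 2, "atom_name": "C1", "iqa": -37.2, "q00": 0.11, "q10": 0.02},
]
out_dir = tempfile.mkdtemp()
csv_paths = write_property_csvs(rows, ["C1", "H2"], ["iqa", "q00"], out_dir)
```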
- delete_sqlite_points_by_id(engine, point_ids: List[int])
- get_atoms_from_sqlite_point_id(full_df, point_id: int) Atoms
Returns an Atoms instance containing geometry for a point id.
- Parameters:
full_df – see get_df_information function
point_id – The id of the point for which to get the geometry
- get_full_dataframe_for_all_atoms(db_path: str | Path, echo=False, change_cols=True) DataFrame
Returns a dataframe containing all the data for all atoms
- Parameters:
db_path – Path or str to database
echo – Whether to echo SQL statements, defaults to False
change_cols – Removes some unnecessary columns and also renames some columns to be more clear.
- Returns:
A pandas dataframe containing information for all atoms
- get_list_of_atom_names_from_sqlite_db(db_path: str | Path, echo=False) List[str]
Returns a list of all the atom names in the database
- Parameters:
db_path – Path or string to database
echo – Whether to echo SQL queries, defaults to False
- Returns:
List of strings of the atom names
- get_list_of_point_ids_from_sqlite_db(db_path: str | Path, echo=False) List[str]
Returns a list of all the point names in the database
- Parameters:
db_path – Path or string to database
echo – Whether to echo SQL queries, defaults to False
- Returns:
List of strings of the point names
- get_sqlite_db_information(db_path: str | Path, echo=False) Tuple[List[str], List[str], DataFrame]
Gets the information from the database that is needed to post-process data and generate datasets for machine learning.
- Parameters:
db_path – Path to SQLite3 database containing Points, AtomNames, and Dataset tables.
echo – Whether to echo executed SQL statements, defaults to False
- Returns:
Tuple of: a list of point ids (integers) contained in the database, a list of atom names (str) contained in the database, and a pd.DataFrame object containing all relevant data needed to construct the datasets.
- Return type:
Tuple[List[str], List[str], pd.DataFrame]
- raw_one_atom_data_to_df_sqlite(db_path: str | Path, atom_name: str, integration_error: float = 0.001, echo=False, drop_irrelevant_cols=True) DataFrame
Returns a pandas DataFrame object containing data for one atom.
- Parameters:
db_path – Path or str to SQLite3 database
atom_name – string of atom name (e.g. C1)
integration_error – Integration error for AIMAll. Any point with a higher absolute integration error will not be selected, defaults to 0.001
echo – Whether to echo the executed SQL queries, defaults to False
drop_irrelevant_cols – Whether to drop irrelevant columns (id columns that do not contain data) from the DataFrame, defaults to True
- Returns:
pd.DataFrame object containing the data for the given atom
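The integration_error filter described above can be sketched in plain SQL. The example uses the standard sqlite3 module and a flattened, assumed dataset table with an atom_name column. The real database keeps atom names in a separate table, and select_atom_rows is a hypothetical helper, not an ichor function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, flattened table used only for this illustration.
conn.execute(
    "CREATE TABLE dataset ("
    "point_id INTEGER, atom_name TEXT, integration_error REAL, iqa REAL)"
)
conn.executemany(
    "INSERT INTO dataset VALUES (?, ?, ?, ?)",
    [
        (1, "C1", 0.0002, -37.1),
        (2, "C1", -0.005, -37.2),   # |error| too large, filtered out
        (3, "C1", -0.0009, -37.3),  # negative, but small in absolute value
        (4, "H2", 0.0001, -0.5),    # different atom
    ],
)


def select_atom_rows(conn, atom_name, integration_error=0.001):
    """Select rows for one atom whose absolute integration error is small."""
    return conn.execute(
        """
        SELECT point_id, integration_error, iqa
        FROM dataset
        WHERE atom_name = ? AND ABS(integration_error) <= ?
        ORDER BY point_id
        """,
        (atom_name, integration_error),
    ).fetchall()


selected = select_atom_rows(conn, "C1")
selected_ids = [row[0] for row in selected]
```

Comparing on the absolute value matters because the integration error can be negative; a plain `integration_error <= 0.001` test would wrongly keep point 2.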
- trajectory_from_sqlite_database(point_ids, full_df, trajectory_name: str = 'trajectory_from_database.xyz')
Writes out a trajectory from the geometries in the database.
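For orientation, a trajectory in the .xyz format is a sequence of frames, each consisting of an atom count line, a comment line, and one line per atom with the element symbol and Cartesian coordinates. The sketch below writes such frames from a plain dictionary of geometries. It is not ichor's implementation: the input structure, the helper names, the comment line content, and the sample coordinates are assumptions made for illustration.

```python
import io
import re


def element_from_atom_name(atom_name):
    """Strip the trailing index from an atom name, e.g. "C1" becomes "C"."""
    return re.sub(r"\d+$", "", atom_name)


def write_xyz_trajectory(geometries, stream):
    """Write frames to stream.

    geometries maps a point id to a list of (atom_name, x, y, z) tuples.
    """
    for point_id, atoms in geometries.items():
        stream.write(f"{len(atoms)}\n")          # number of atoms in the frame
        stream.write(f"point_id = {point_id}\n")  # free-form comment line
        for atom_name, x, y, z in atoms:
            element = element_from_atom_name(atom_name)
            stream.write(f"{element} {x:16.8f} {y:16.8f} {z:16.8f}\n")


# Made-up geometry for a single water-like point.
geometries = {
    1: [
        ("O1", 0.0, 0.0, 0.0),
        ("H2", 0.0, 0.757, 0.587),
        ("H3", 0.0, -0.757, 0.587),
    ],
}
buffer = io.StringIO()
write_xyz_trajectory(geometries, buffer)
xyz_text = buffer.getvalue()
```

Writing to a stream keeps the sketch testable in memory; passing an open file handle instead would produce a file such as trajectory_from_database.xyz.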
- write_raw_one_atom_data_to_csv_sqlite(db_path: str | Path, atom_name: str, integration_error: float = 0.001, echo=False, drop_irrelevant_cols=True)
Saves the raw data for one atom as stored in the SQLite3 database to a csv file.
- Parameters:
db_path – Path or str to SQLite3 database
atom_name – string of atom name (e.g. C1)
integration_error – Integration error for AIMAll. Any point with a higher absolute integration error will not be selected, defaults to 0.001
echo – Whether to echo the executed SQL queries, defaults to False
drop_irrelevant_cols – Whether to drop irrelevant columns (id columns that do not contain data) from the DataFrame, defaults to True
Module contents
- add_atom_names_to_database(session: Session, atom_names: List[str], echo=False)
Adds a list of atom names to the atom_names table of the database.
- Parameters:
session – SQLAlchemy Session connected to the database
atom_names – A list of atom names, e.g. ["C1", "H2", "H3", ...]
- add_point_to_database(session: Session, point: ichor.core.files.PointDirectory, echo=False, print_missing_data=True)
Adds information from an instance of a PointDirectory to the database.
- Parameters:
session – SQLAlchemy Session connected to the database
point – A PointDirectory instance, containing Gaussian/AIMAll outputs that can be written to the database.
Note
Even if atomic data (.int file) is missing for a particular atom in the system, the information for the point is still added to the database. This is because the rest of the point can still be used in the training set for the other atoms.
- create_database(database_path: str | Path, echo=False)
Creates an empty database in which important information from a PointsDirectory instance can be stored.
- Parameters:
database_path – A string or Path to a database on disk that does not exist yet.
- create_database_session(database_path: Path, echo=False)
Creates a SQLAlchemy Engine object and a Session object, which are used to interact with the SQLite database.
- Parameters:
database_path – pathlib.Path object to database
echo – Whether SQLAlchemy should echo the SQL commands it executes, defaults to False
- get_sqlite_db_information(db_path: str | Path, echo=False) Tuple[List[str], List[str], DataFrame]
Gets the information from the database that is needed to post-process data and generate datasets for machine learning.
- Parameters:
db_path – Path to SQLite3 database containing Points, AtomNames, and Dataset tables.
echo – Whether to echo executed SQL statements, defaults to False
- Returns:
Tuple of: a list of point ids (integers) contained in the database, a list of atom names (str) contained in the database, and a pd.DataFrame object containing all relevant data needed to construct the datasets.
- Return type:
Tuple[List[str], List[str], pd.DataFrame]