ichor.core.analysis.s_curves package

Submodules

ichor.core.analysis.s_curves.compact_s_curves module

calculate_compact_s_curves(model_location: Path, validation_set_location: Path, output_location: Path, atoms: List[str] | None = None, types: List[str] | None = None, **kwargs)

Calculates the S-curves used to check model prediction performance and writes them to an Excel file.

Parameters:
  • model_location – A directory containing the model files (.model).

  • validation_set_location – A directory containing validation or test set points. These points should NOT be in the training set.

  • atoms – A list of atom names (e.g. O1, H2, C3) for which to make S-curves. By default, S-curves are made for all atoms in the system.

  • types – A list of property types (e.g. iqa, q00) for which to make S-curves. By default, S-curves are made for all properties in the model files.

  • kwargs – Any keyword arguments accepted by write_to_excel(), which control how the S-curves Excel file looks. See write_to_excel().
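The text above does not spell out how an S-curve is built. As a minimal sketch, assuming the common definition (sorted absolute prediction errors on the x-axis against the cumulative percentage of points on the y-axis), the data for one atom and one property could be computed as follows. The helper name s_curve_points is hypothetical and is not part of ichor.

```python
import numpy as np


def s_curve_points(true_values, predicted_values):
    """Return (sorted absolute errors, cumulative percentages) for one atom and property.

    Hypothetical helper for illustration only; ichor's own computation may differ.
    """
    true_values = np.asarray(true_values, dtype=float)
    predicted_values = np.asarray(predicted_values, dtype=float)
    # Absolute prediction error for every validation point.
    abs_error = np.abs(true_values - predicted_values)
    # Sorting the errors gives the x-values of the curve.
    sorted_error = np.sort(abs_error)
    n = sorted_error.size
    # The i-th smallest error corresponds to i / n * 100 percent of the points.
    percentages = 100.0 * np.arange(1, n + 1) / n
    return sorted_error, percentages


errors, percentages = s_curve_points([1.0, 2.0, 3.0, 4.0], [1.5, 2.1, 2.0, 4.0])
```

Reading the curve at a given percentage gives the error below which that share of validation points falls, which is why a curve shifted to the left indicates a better model.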

calculate_compact_s_curves_from_files(csv_files_list: List[Path | str], models: Models, output_location: str | Path = 's_curves_from_df.xlsx', property_names: List[str] | None = None, **kwargs)

Calculates S-curves used to check model prediction performance.

Parameters:
  • csv_files_list – A list of .csv files that contain feature columns and property columns.

  • models – A Models instance which contains model files.

  • output_location – The name of the .xlsx file in which to save the S-curves.

  • property_names – A list of strings to use for property column names. If left as None, a default set of property names is used.

  • kwargs – Keyword arguments passed to xlsxwriter to customize the plots.

calculate_compact_s_curves_from_true_predicted(predicted_values_dict: Dict[str, Dict[str, ndarray]], true_values_dict: Dict[str, Dict[str, ndarray]], output_location: str | Path = 's_curves_from_df.xlsx', **kwargs)

Makes S-curves from a dictionary of predicted values and a dictionary of true values.

Parameters:
  • predicted_values_dict – A dict mapping each atom name to an inner dict. The inner dict maps each property name to a 1D np.ndarray containing the predicted values for all points.

  • true_values_dict – A dict mapping each atom name to an inner dict. The inner dict maps each property name to a 1D np.ndarray containing the true values for all points.

  • output_location – The name of the output .xlsx file, defaults to “s_curves_from_df.xlsx”
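To make the nested layout of the two dictionaries concrete, here is a small sketch that builds them and computes the per-atom, per-property absolute errors. The atom names, property name, numbers and the helper absolute_errors are all hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical inputs: {atom_name: {property_name: 1D array over all points}}.
true_values_dict = {
    "O1": {"iqa": np.array([-75.10, -75.20, -75.30])},
    "H2": {"iqa": np.array([-0.50, -0.60, -0.55])},
}
predicted_values_dict = {
    "O1": {"iqa": np.array([-75.11, -75.18, -75.30])},
    "H2": {"iqa": np.array([-0.52, -0.60, -0.50])},
}


def absolute_errors(true_dict, predicted_dict):
    """Return {atom_name: {property_name: |true - predicted|}} with the same nesting."""
    errors = {}
    for atom_name, true_props in true_dict.items():
        errors[atom_name] = {}
        for property_name, true_vals in true_props.items():
            predicted_vals = predicted_dict[atom_name][property_name]
            # Both arrays must cover the same points in the same order.
            if true_vals.shape != predicted_vals.shape:
                raise ValueError(f"shape mismatch for {atom_name} {property_name}")
            errors[atom_name][property_name] = np.abs(true_vals - predicted_vals)
    return errors


errors = absolute_errors(true_values_dict, predicted_values_dict)
```

Both dictionaries need the same atom names and property names, and the arrays must list the points in the same order, otherwise the errors are meaningless.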

make_chart_settings(local_kwargs: dict)

Takes a dictionary of the keyword arguments that were passed into the write_to_excel function and constructs the dictionaries of parameter values that are passed to xlsxwriter to configure the graph settings.

Parameters:
  • local_kwargs – A dictionary containing the keyword arguments that are parsed to construct the xlsxwriter graph settings.
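The return value of make_chart_settings is not documented here. As a rough sketch of the idea, the keyword arguments of write_to_excel could be mapped onto the option dictionaries that xlsxwriter's chart.set_x_axis() and chart.set_y_axis() accept. The helper name sketch_axis_settings and the choice of log base 10 are assumptions, not ichor's actual implementation.

```python
def sketch_axis_settings(local_kwargs: dict) -> tuple:
    """Build x- and y-axis option dicts for xlsxwriter from keyword arguments.

    Hypothetical helper for illustration; not ichor's make_chart_settings.
    """
    x_axis = {
        "name": local_kwargs["x_axis_name"],
        "major_gridlines": {
            "visible": local_kwargs["x_major_gridlines_visible"],
            "line": {
                "width": local_kwargs["x_axis_major_gridline_width"],
                "color": local_kwargs["x_axis_major_gridline_color"],
            },
        },
        "minor_gridlines": {"visible": local_kwargs["x_minor_gridlines_visible"]},
    }
    # xlsxwriter switches an axis to a log scale through "log_base" (base 10 assumed).
    if local_kwargs["x_log_scale"]:
        x_axis["log_base"] = 10

    y_axis = {
        "name": local_kwargs["y_axis_name"],
        "min": local_kwargs["y_min"],
        "max": local_kwargs["y_max"],
        "major_gridlines": {
            "visible": local_kwargs["y_major_gridlines_visible"],
            "line": {
                "width": local_kwargs["y_axis_major_gridline_width"],
                "color": local_kwargs["y_axis_major_gridline_color"],
            },
        },
        "minor_gridlines": {"visible": local_kwargs["y_minor_gridlines_visible"]},
    }
    return x_axis, y_axis


# Defaults copied from the write_to_excel signature.
defaults = dict(
    x_axis_name="Absolute Prediction Error", x_log_scale=True,
    x_major_gridlines_visible=True, x_minor_gridlines_visible=True,
    x_axis_major_gridline_width=0.75, x_axis_major_gridline_color="#F2F2F2",
    y_axis_name="%", y_min=0, y_max=100,
    y_major_gridlines_visible=True, y_minor_gridlines_visible=False,
    y_axis_major_gridline_width=0.75, y_axis_major_gridline_color="#BFBFBF",
)
x_axis, y_axis = sketch_axis_settings(defaults)
```

With xlsxwriter, dictionaries of this shape would be handed to chart.set_x_axis(x_axis) and chart.set_y_axis(y_axis) on a chart object.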

mpl_get_true_vals_dict(predicted_values_dict: Dict[str, Dict[str, ndarray]], true_values_dict: Dict[str, Dict[str, ndarray]]) -> dict

Makes S-curves from a dictionary of predicted values and a dictionary of true values.

Parameters:
  • predicted_values_dict – A dict mapping each atom name to an inner dict. The inner dict maps each property name to a 1D np.ndarray containing the predicted values for all points.

  • true_values_dict – A dict mapping each atom name to an inner dict. The inner dict maps each property name to a 1D np.ndarray containing the true values for all points.

percentile(n: int) -> ndarray

plot_with_matplotlib(total_dict: List[dict] | dict, x_axis_name: str = 'Prediction Error / kJ mol$^{-1}$', y_axis_name: str = '%', title: str | None = None, saved_name: str = 's_curves.svg')

plot_with_matplotlib_simple(total_dict: dict, x_axis_name: str = 'Prediction Error / kJ mol$^{-1}$', y_axis_name: str = '%', title: str | None = None)

simplified_write_to_excel(total_dict: Dict[str, Dict[str, Dict[str, ndarray]]], output_name: Path = 's-curves.xlsx', x_axis_name: str = 'Absolute Prediction Error', x_log_scale: bool = True, x_major_gridlines_visible: bool = True, x_minor_gridlines_visible: bool = True, x_axis_major_gridline_width: int = 0.75, x_axis_major_gridline_color: str = '#F2F2F2', y_axis_name: str = '%', y_min: int = 0, y_max: int = 100, y_major_gridlines_visible: bool = True, y_minor_gridlines_visible: bool = False, y_axis_major_gridline_width: int = 0.75, y_axis_major_gridline_color: str = '#BFBFBF', show_legend: bool = False, excel_style: int = 10, sort_keys: bool = True)

Writes the data needed to make S-curves to an Excel file. A separate sheet is made for every atom (and property), along with a Total sheet for every property, which indicates how well the predictions perform for the whole system.

Parameters:
  • total_dict – A dictionary mapping each property to an inner dict. The inner dict maps each atom name to an inner-inner dict, whose keys are true, predicted or error and whose values are 1D numpy arrays containing the corresponding values.

  • output_name – The name of the Excel file to be written out.

  • x_axis_name – The title to be used for x-axis in the S-curves plot.

  • x_log_scale – Whether to make x dimension log scaled. Default True.

  • x_major_gridlines_visible – Whether to show major gridlines along x. Default True.

  • x_minor_gridlines_visible – Whether to show minor gridlines along x. Default True.

  • x_axis_major_gridline_width – The width to use for the major gridlines. Default is 0.75.

  • x_axis_major_gridline_color – Color to use for gridlines. Default is “#F2F2F2”.

  • y_axis_name – The title to be used for the y-axis in the S-curves plot.

  • y_min – The minimum percentage value to show.

  • y_max – The maximum percentage value to show.

  • y_major_gridlines_visible – Whether to show major gridlines along y. Default True.

  • y_minor_gridlines_visible – Whether to show minor gridlines along y. Default False.

  • y_axis_major_gridline_width – The width to use for the major gridlines. Default is 0.75.

  • y_axis_major_gridline_color – Color to use for gridlines. Default is “#BFBFBF”.

  • show_legend – Whether to show legend on the plot. Default False.

  • excel_style – The style which Excel uses for the plots. Default is 10, which is the default style used by Excel.

  • sort_keys – Whether to sort the keys of the dictionary (uses Python sort). Default True.
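The total_dict layout is property-first, whereas the true and predicted dictionaries used elsewhere in this module are atom-first. A minimal sketch of the regrouping, assuming the inner keys are the literal strings "true", "predicted" and "error" and that the error is the absolute difference, might look like this. The helper build_total_dict and the sample values are hypothetical.

```python
import numpy as np


def build_total_dict(true_values_dict, predicted_values_dict):
    """Regroup {atom: {property: array}} inputs into {property: {atom: {...}}}.

    Hypothetical helper; the key names and absolute error are assumptions.
    """
    total_dict = {}
    for atom_name, props in true_values_dict.items():
        for property_name, true_vals in props.items():
            predicted_vals = predicted_values_dict[atom_name][property_name]
            # setdefault creates the per-property dict the first time it is seen.
            total_dict.setdefault(property_name, {})[atom_name] = {
                "true": true_vals,
                "predicted": predicted_vals,
                "error": np.abs(true_vals - predicted_vals),
            }
    return total_dict


# Hypothetical sample data for two atoms and two properties.
true_values_dict = {
    "O1": {"iqa": np.array([1.0, 2.0]), "q00": np.array([0.5, 0.4])},
    "H2": {"iqa": np.array([0.1, 0.2]), "q00": np.array([0.3, 0.3])},
}
predicted_values_dict = {
    "O1": {"iqa": np.array([1.1, 1.9]), "q00": np.array([0.5, 0.45])},
    "H2": {"iqa": np.array([0.1, 0.25]), "q00": np.array([0.28, 0.3])},
}
total_dict = build_total_dict(true_values_dict, predicted_values_dict)
```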

write_to_excel(true: DataFrame, predicted: DataFrame, output_name: Path = 's-curves.xlsx', x_axis_name: str = 'Absolute Prediction Error', x_log_scale: bool = True, x_major_gridlines_visible: bool = True, x_minor_gridlines_visible: bool = True, x_axis_major_gridline_width: int = 0.75, x_axis_major_gridline_color: str = '#F2F2F2', y_axis_name: str = '%', y_min: int = 0, y_max: int = 100, y_major_gridlines_visible: bool = True, y_minor_gridlines_visible: bool = False, y_axis_major_gridline_width: int = 0.75, y_axis_major_gridline_color: str = '#BFBFBF', show_legend: bool = False, excel_style: int = 10)

Writes the data needed to make S-curves to an Excel file. A separate sheet is made for every atom (and property), along with a Total sheet for every property, which indicates how well the predictions perform for the whole system.

Parameters:
  • true – A ModelsResult containing true values (as calculated by AIMALL) for the validation/test set.

  • predicted – A ModelsResult containing predicted values, given the validation/test set features.

  • output_name – The name of the Excel file to be written out.

  • x_axis_name – The title to be used for x-axis in the S-curves plot.

  • x_log_scale – Whether to make x dimension log scaled. Default True.

  • x_major_gridlines_visible – Whether to show major gridlines along x. Default True.

  • x_minor_gridlines_visible – Whether to show minor gridlines along x. Default True.

  • x_axis_major_gridline_width – The width to use for the major gridlines. Default is 0.75.

  • x_axis_major_gridline_color – Color to use for gridlines. Default is “#F2F2F2”.

  • y_axis_name – The title to be used for the y-axis in the S-curves plot.

  • y_min – The minimum percentage value to show.

  • y_max – The maximum percentage value to show.

  • y_major_gridlines_visible – Whether to show major gridlines along y. Default True.

  • y_minor_gridlines_visible – Whether to show minor gridlines along y. Default False.

  • y_axis_major_gridline_width – The width to use for the major gridlines. Default is 0.75.

  • y_axis_major_gridline_color – Color to use for gridlines. Default is “#BFBFBF”.

  • show_legend – Whether to show legend on the plot. Default False.

  • excel_style – The style which Excel uses for the plots. Default is 10, which is the default style used by Excel.
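How the Total sheet is computed is not stated above. One plausible reading, which is an assumption and not confirmed by this documentation, is that the atomic values of a property are summed over all atoms for each point, and the error is taken on the sums. The helper total_error and the numbers below are hypothetical.

```python
import numpy as np


def total_error(true_by_atom, predicted_by_atom):
    """Absolute error of the per-point sum over atoms for one property.

    Assumed interpretation of the "Total" sheet, for illustration only.
    """
    atoms = list(true_by_atom)
    # Stack per-atom arrays into shape (n_atoms, n_points) and sum over atoms.
    total_true = np.sum(np.stack([np.asarray(true_by_atom[a], dtype=float) for a in atoms]), axis=0)
    total_predicted = np.sum(np.stack([np.asarray(predicted_by_atom[a], dtype=float) for a in atoms]), axis=0)
    return np.abs(total_true - total_predicted)


# Hypothetical values for two atoms and two validation points.
true_by_atom = {"O1": [1.0, 2.0], "H2": [0.5, 0.5]}
predicted_by_atom = {"O1": [1.1, 2.0], "H2": [0.4, 0.7]}
total = total_error(true_by_atom, predicted_by_atom)
```

Under this reading, per-atom errors of opposite sign can cancel in the total, as they do for the first point here, so the Total curve is not simply the sum of the atomic curves.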

ichor.core.analysis.s_curves.s_curves module

calculate_s_curves(model_location: Path, validation_set_location: Path, output_location: Path, atoms: List[str] | None = None, types: List[str] | None = None, **kwargs)

Calculates the S-curves used to check model prediction performance and writes them to an Excel file.

Parameters:
  • model_location – A directory containing the model files (.model).

  • validation_set_location – A directory containing validation or test set points. These points should NOT be in the training set.

  • atoms – A list of atom names (e.g. O1, H2, C3) for which to make S-curves. By default, S-curves are made for all atoms in the system.

  • types – A list of property types (e.g. iqa, q00) for which to make S-curves. By default, S-curves are made for all properties in the model files.

  • kwargs – Any keyword arguments accepted by write_to_excel(), which control how the S-curves Excel file looks. See write_to_excel().

make_chart_settings(local_kwargs: dict)

Takes a dictionary of the keyword arguments that were passed into the write_to_excel function and constructs the dictionaries of parameter values that are passed to xlsxwriter to configure the graph settings.

Parameters:
  • local_kwargs – A dictionary containing the keyword arguments that are parsed to construct the xlsxwriter graph settings.

percentile(n: int) -> ndarray

write_to_excel(true: DataFrame, predicted: DataFrame, output_name: Path = 's-curves.xlsx', x_axis_name: str = 'Absolute Prediction Error', x_log_scale: bool = True, x_major_gridlines_visible: bool = True, x_minor_gridlines_visible: bool = True, x_axis_major_gridline_width: int = 0.75, x_axis_major_gridline_color: str = '#F2F2F2', y_axis_name: str = '%', y_min: int = 0, y_max: int = 100, y_major_gridlines_visible: bool = True, y_minor_gridlines_visible: bool = False, y_axis_major_gridline_width: int = 0.75, y_axis_major_gridline_color: str = '#BFBFBF', show_legend: bool = False, excel_style: int = 10)

Writes the data needed to make S-curves to an Excel file. A separate sheet is made for every atom (and property), along with a Total sheet for every property, which indicates how well the predictions perform for the whole system.

Parameters:
  • true – A ModelsResult containing true values (as calculated by AIMALL) for the validation/test set.

  • predicted – A ModelsResult containing predicted values, given the validation/test set features.

  • output_name – The name of the Excel file to be written out.

  • x_axis_name – The title to be used for x-axis in the S-curves plot.

  • x_log_scale – Whether to make x dimension log scaled. Default True.

  • x_major_gridlines_visible – Whether to show major gridlines along x. Default True.

  • x_minor_gridlines_visible – Whether to show minor gridlines along x. Default True.

  • x_axis_major_gridline_width – The width to use for the major gridlines. Default is 0.75.

  • x_axis_major_gridline_color – Color to use for gridlines. Default is “#F2F2F2”.

  • y_axis_name – The title to be used for the y-axis in the S-curves plot.

  • y_min – The minimum percentage value to show.

  • y_max – The maximum percentage value to show.

  • y_major_gridlines_visible – Whether to show major gridlines along y. Default True.

  • y_minor_gridlines_visible – Whether to show minor gridlines along y. Default False.

  • y_axis_major_gridline_width – The width to use for the major gridlines. Default is 0.75.

  • y_axis_major_gridline_color – Color to use for gridlines. Default is “#BFBFBF”.

  • show_legend – Whether to show legend on the plot. Default False.

  • excel_style – The style which Excel uses for the plots. Default is 10, which is the default style used by Excel.

Module contents

calculate_compact_s_curves(model_location: Path, validation_set_location: Path, output_location: Path, atoms: List[str] | None = None, types: List[str] | None = None, **kwargs)

Calculates the S-curves used to check model prediction performance and writes them to an Excel file.

Parameters:
  • model_location – A directory containing the model files (.model).

  • validation_set_location – A directory containing validation or test set points. These points should NOT be in the training set.

  • atoms – A list of atom names (e.g. O1, H2, C3) for which to make S-curves. By default, S-curves are made for all atoms in the system.

  • types – A list of property types (e.g. iqa, q00) for which to make S-curves. By default, S-curves are made for all properties in the model files.

  • kwargs – Any keyword arguments accepted by write_to_excel(), which control how the S-curves Excel file looks. See write_to_excel().

calculate_s_curves(model_location: Path, validation_set_location: Path, output_location: Path, atoms: List[str] | None = None, types: List[str] | None = None, **kwargs)

Calculates the S-curves used to check model prediction performance and writes them to an Excel file.

Parameters:
  • model_location – A directory containing the model files (.model).

  • validation_set_location – A directory containing validation or test set points. These points should NOT be in the training set.

  • atoms – A list of atom names (e.g. O1, H2, C3) for which to make S-curves. By default, S-curves are made for all atoms in the system.

  • types – A list of property types (e.g. iqa, q00) for which to make S-curves. By default, S-curves are made for all properties in the model files.

  • kwargs – Any keyword arguments accepted by write_to_excel(), which control how the S-curves Excel file looks. See write_to_excel().