{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Calculating Features\n", "\n", "## `Atom` and `Atoms` class\n", "\n", "Calculating features is done through the `Atoms` class, which contains one molecular geometry. Each atom inside an `Atoms` instance is an `Atom` instance. These two classes are central to calculations relating to molecular geometries. Files containing a molecular geometry should have the `.atoms` attribute." ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "O1 0.00000000 0.00000000 0.00000000\n", "H2 2.10000000 2.30000000 0.00000000\n", "H3 5.00000000 0.00000000 4.30000000\n" ] } ], "source": [ "from ichor.core.atoms import Atoms, Atom\n", "\n", "# make some random geometry\n", "atoms_instance = Atoms([Atom(\"O\", 0.0, 0.0, 0.0),\n", " Atom(\"H\", 2.1, 2.3, 0.0),\n", " Atom(\"H\", 5.0, 0.0, 4.3)])\n", "\n", "print(atoms_instance)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Calculators\n", "\n", "Calculator functions take in an `Atom` or `Atoms` instance and calculate something with it. An example below calculated the atomic local frame (ALF) features.\n", "\n", "Features are calculated using feature calculator functions, which take in an `Atom` instance and return the calculated features. The calculators might need extra positional or keyword arguments. Currently, only an atomic local frame calculator is implemented directly in `ichor`, but there is nothing stopping a user from implementing their own calculators as needed. Below are examples of using feature calculator functions and writing custom ones." ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'H2': ALF(origin_idx=1, x_axis_idx=0, xy_plane_idx=2),\n", " 'H3': ALF(origin_idx=2, x_axis_idx=0, xy_plane_idx=1),\n", " 'O1': ALF(origin_idx=0, x_axis_idx=1, xy_plane_idx=2)}\n", "[ALF(origin_idx=0, x_axis_idx=1, xy_plane_idx=2),\n", " ALF(origin_idx=1, x_axis_idx=0, xy_plane_idx=2),\n", " ALF(origin_idx=2, x_axis_idx=0, xy_plane_idx=1)]\n" ] } ], "source": [ "from ichor.core.calculators import default_alf_calculator, default_feature_calculator\n", "from pprint import pprint\n", "\n", "# for the alf features, we need to define an ALF first\n", "alf_dict = atoms_instance.alf_dict(default_alf_calculator)\n", "pprint(alf_dict)\n", "\n", "alf_list = atoms_instance.alf_list(default_alf_calculator)\n", "pprint(alf_list)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "As seen above, we first calculate the ALF for every atom. There is the `alf_list` and `alf_dict` methods, which return either a list of a dictionary respectively. In both cases, the `ALF` class is used to contain information about the atom's ALF. This `ALF` class is simply a named tuple containing the origin index, x axis index, and xy plane index. Note that these are 0-indexed as Python is 0 indexed. However, do note that the actual atom names (eg. O1, H2, etc.) do start at 1. There is no atom `H0` for example." ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "array([[ 5.88551857, 12.46216712, 1.0341914 ],\n", " [ 5.88551857, 10.72159395, 1.61608526],\n", " [12.46216712, 10.72159395, 0.49131599]])\n", "array([[ 5.88551857, 12.46216712, 1.0341914 ],\n", " [ 5.88551857, 10.72159395, 1.61608526],\n", " [12.46216712, 10.72159395, 0.49131599]])\n" ] } ], "source": [ "# calculating features with given alf\n", "\n", "features = atoms_instance.features(default_feature_calculator, alf_dict)\n", "pprint(features)\n", "\n", "features = atoms_instance.features(default_feature_calculator, alf_list)\n", "pprint(features)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "The ALF features for the given `Atoms` instance are calculated as a numpy array of shape `natoms x nfeatures`. Since these features are per-atom, each atom has its own feature set. Now, we can also calculate the features for an individual atom like so" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "array([ 5.88551857, 12.46216712, 1.0341914 ])\n" ] } ], "source": [ "one_atom_features = atoms_instance[\"O1\"].features(default_feature_calculator, alf_dict)\n", "pprint(one_atom_features)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "This is done by directly calculating the ALF features only for the O1 atom." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Defining a custom feature calculator\n", "\n", "Let's say that you want to implement a custom calculator. All you need to do is implement a function that takes in an `Atom` or `Atoms` instance and does something with it. If an `Atom` is passed in, then the function should compute separate features for each atom. If an `Atoms` instance is passed in, then the function should compute one set of features for the whole system. Below are two examples of defining custom feature calculator functions.\n", "\n", "### Computing separate feature vectors for each atom" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0. 2.1 5. ]\n", "[0. 2.3 0. ]\n" ] } ], "source": [ "import numpy as np\n", "\n", "def x_coordinate_features(atom_instance: Atom):\n", " \"\"\"A calculator function that grabs only the x-coordinate if the atom is an oxygen\n", " or grabs the y-coordinate for any other atom type.\n", "\n", " :param atom_instance: an Atom instance to work with\n", " :return: An array containing the relevant coordinate for each atom\n", " \"\"\"\n", " if atom_instance.type == \"O\":\n", " return np.array([a.coordinates[0] for a in atom_instance.parent])\n", " return np.array([a.coordinates[1] for a in atom_instance.parent])\n", "\n", "new_set_of_features_o1 = atoms_instance[\"O1\"].features(x_coordinate_features)\n", "new_set_of_features_h2 = atoms_instance[\"H2\"].features(x_coordinate_features)\n", "\n", "print(new_set_of_features_o1)\n", "print(new_set_of_features_h2)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "To get the features for all atoms at once, the `Atoms` class is used instead, which just gets the features for all `Atom` instances held inside the `Atoms` class. Of course, much more complicated feature calculators can be built." ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0. 2.1 5. ]\n", " [0. 2.3 0. ]\n", " [0. 2.3 0. ]]\n" ] } ], "source": [ "new_set_of_features_all_atoms = atoms_instance.features(x_coordinate_features)\n", "\n", "print(new_set_of_features_all_atoms)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Coulomb Matrix Example - One feature vector for whole system\n", "\n", "Below is an example of how to use ichor to compute the coulomb matrix representation of molecules. Note that the `is_atomic=False` is used to indicate that the calculator is going to be obtaining a feature vector used to describe the whole system. In this case, each atom does not have its own set of features." ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[73.51669472, 9.75609756, 8.07915961],\n", " [ 9.75609756, 0.5 , 0.62764116],\n", " [ 8.07915961, 0.62764116, 0.5 ]])" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# define calculator function which is going to compute a coulomb matrix for a given geometry\n", "\n", "from ichor.core.common.constants import type2nuclear_charge\n", "from ichor.core.atoms import Atom, Atoms\n", "\n", "def coulomb_matrix_representation(atoms: Atoms):\n", "\n", " natoms = len(atoms)\n", " res = np.empty((natoms, natoms))\n", " for i, atm1 in enumerate(atoms):\n", " for j, atm2 in enumerate(atoms):\n", " if i == j:\n", " res[i, j] = 0.5 * type2nuclear_charge[atm1.type]**2.4\n", " else:\n", " dist = np.linalg.norm(atm2.coordinates - atm1.coordinates)\n", " res[i, j] = (type2nuclear_charge[atm1.type] * type2nuclear_charge[atm2.type]) / dist\n", "\n", " return res\n", "\n", "\n", "atoms = Atoms([Atom(\"O\", 0.0, 0.0, 0.0), Atom(\"H\", 0.82, 0.0, 0.0), Atom(\"H\", -0.54, 0.83, 0.0)])\n", "coulomb_matrix_descriptors = atoms.features(coulomb_matrix_representation, is_atomic=False)\n", "\n", "coulomb_matrix_descriptors" ] } ], "metadata": { "kernelspec": { "display_name": "ichor_docs", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }