{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# `PointsDirectory` - A class used to encapsulate all calculations for many geometries (of a dataset)\n", "\n", "The `ichor.core.files.PointsDirectory` class makes it easy to work with the thousands of files that are generated when running Gaussian, AIMAll, etc. calculations for many geometries.\n", "\n", "The general structure of a `PointsDirectory`-like directory is like so:\n", "\n", "```\n", ".\n", "|--- SYSTEM0001.pointdir\n", "|   |--- SYSTEM0001_atomicfiles\n", "|   |   |--- h2.int\n", "|   |   |--- h3.int\n", "|   |   |--- o1.int\n", "|   |--- SYSTEM0001.gjf\n", "|   |--- SYSTEM0001.wfn\n", "|--- SYSTEM0002.pointdir\n", "|   |--- SYSTEM0002_atomicfiles\n", "|   |   |--- h2.int\n", "|   |   |--- h3.int\n", "|   |   |--- o1.int\n", "|   |--- SYSTEM0002.gjf\n", "|   |--- SYSTEM0002.wfn\n", "...\n", "...\n", "...\n", "```\n", "\n", "Essentially, `PointsDirectory` is a class used to parse a directory that contains many sub-directories (each of which is parsed as a `PointDirectory` instance). Each sub-directory (e.g. `SYSTEM0001.pointdir`, `SYSTEM0002.pointdir`) contains all relevant calculations for **one** molecular geometry. Each of the *sub-directories* can be individually read in as a `ichor.core.files.PointDirectory` instance (note that there is no *s* in this case).\n", "\n", "This class makes it easy to access calculations for many geometries." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## `PointDirectory` structure\n", "\n", "The `PointDirectory` class encapsulates a directory containing all relevant calculations for **one** geometry. It subclasses `ichor.core.files.directory.AnnotatedDirectory`. This gives us the ability to define *class* variables, which are of specific file types. The `AnnotatedDirectory._parse` method then parses all files in the directory. 
The extension of each file determines its file type, and thus the class that is used to parse the file.\n", "\n", "The `PointDirectory.contents` class variable can be overridden to quickly add support for new file and directory types. This ensures that any new file or directory types in ichor are ready to be used with `PointDirectory`. The `contents` variable is a Python dictionary whose keys become available as attributes after parsing, and whose values are the Python classes that parse the relevant file or directory. For example, this is the current `contents` variable:\n", "\n", "```python\n", "contents = {\n", " \"xyz\": XYZ,\n", " \"gjf\": GJF,\n", " \"gaussian_output\": GaussianOutput,\n", " \"orca_input\": OrcaInput,\n", " \"orca_output\": OrcaOutput,\n", " \"aim\": Aim,\n", " \"wfn\": WFN,\n", " \"ints\": IntDirectory,\n", "}\n", "```\n", "\n", "where `XYZ` is the class that is going to read a `.xyz` file in the directory.\n", "\n" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Obtaining results from a `PointsDirectory`" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Obtaining total system energy\n", "\n", "The following code snippet can be used to quickly get the total system energy, for example from a Gaussian calculation:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WATER_MONOMER0000.pointdir -76.421710687455\n", "WATER_MONOMER0001.pointdir -76.429947804\n", "WATER_MONOMER0002.pointdir -76.430599107417\n", "WATER_MONOMER0003.pointdir -76.42948849797\n" ] } ], "source": [ "from ichor.core.files import PointsDirectory\n", "\n", "# PointsDirectory(\"path_to_directory_with_wfn_and_int_files\")\n", "points_dir = PointsDirectory(\"../../../example_files/example_points_directory/WATER_MONOMER.pointsdir\")\n", "\n", "for point_directory in 
points_dir:\n", "\n", " print(point_directory.name, point_directory.wfn.total_energy)" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Accessing IQA energy for a specific atom" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WATER_MONOMER0000.pointdir -75.446714709\n", "WATER_MONOMER0001.pointdir -75.453164031\n", "WATER_MONOMER0002.pointdir -75.453749708\n", "WATER_MONOMER0003.pointdir -75.453284702\n" ] } ], "source": [ "for point_directory in points_dir:\n", "\n", " print(point_directory.name, point_directory.ints[\"O1\"].iqa)\n", "\n", "# note that this is for A A'" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "### Accessing Multipole Moments" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "WATER_MONOMER0000.pointdir {'q00': -1.051921199, 'q10': -0.020042356378, 'q11c': 0.0018273275449, 'q11s': -0.20706556929, 'q20': -0.037266216811, 'q21c': -0.79780613831, 'q21s': 0.013146921148, 'q22c': -0.19595266005, 'q22s': 0.078488472227, 'q30': 0.043015515207, 'q31c': -0.053621704828, 'q31s': 0.21644282193, 'q32c': -0.029607236961, 'q32s': -0.89197505111, 'q33c': -0.053969314597, 'q33s': 0.16211677693, 'q40': -1.4545843935, 'q41c': 0.91783517331, 'q41s': 0.17650015949, 'q42c': -0.73112185714, 'q42s': -0.3293114897, 'q43c': 2.8344280941, 'q43s': -0.16267842746, 'q44c': -1.3853362266, 'q44s': 0.089771195512, 'q50': -0.24411738335, 'q51c': 0.48960856702, 'q51s': -1.5472642317, 'q52c': -0.040094542612, 'q52s': 0.98097072569, 'q53c': 0.72718022845, 'q53s': -1.1988409017, 'q54c': -0.47766441277, 'q54s': 2.0753064137, 'q55c': -0.29405113415, 'q55s': -1.6430303594}\n", "WATER_MONOMER0001.pointdir {'q00': -1.1248310833, 'q10': -0.15773618224, 'q11c': 0.081543820356, 'q11s': 0.12130191092, 'q20': -0.12606180318, 'q21c': 0.19139555969, 'q21s': 
-0.528487624, 'q22c': 0.54627966503, 'q22s': -0.078510757285, 'q30': -0.27889785637, 'q31c': 0.49213687845, 'q31s': -0.3074017347, 'q32c': 0.14764509944, 'q32s': -0.24727703379, 'q33c': -0.44944367966, 'q33s': -0.35435058603, 'q40': -0.48581728297, 'q41c': 1.5494543387, 'q41s': -0.23351555403, 'q42c': -0.86875233643, 'q42s': -1.853314432, 'q43c': -1.8063689406, 'q43s': -1.1998626773, 'q44c': 0.48732628951, 'q44s': 1.3828550511, 'q50': 0.4495412122, 'q51c': 0.31877779684, 'q51s': -0.024332972891, 'q52c': -1.476162947, 'q52s': -2.119742167, 'q53c': -1.5055751323, 'q53s': 0.18416072712, 'q54c': 0.45591672771, 'q54s': 2.1945711652, 'q55c': 0.03684630334, 'q55s': -1.3214124663}\n", "WATER_MONOMER0002.pointdir {'q00': -1.1395301182, 'q10': 0.1170316343, 'q11c': 0.12707113118, 'q11s': 0.14319062627, 'q20': 0.035370307734, 'q21c': -0.42415679932, 'q21s': 0.49791338765, 'q22c': 0.37603023988, 'q22s': 0.20445186908, 'q30': -0.15737553444, 'q31c': 0.45818443084, 'q31s': -0.35883958802, 'q32c': 0.3480880684, 'q32s': 0.20971032944, 'q33c': -0.29423125466, 'q33s': -0.34625176326, 'q40': 0.41096573978, 'q41c': 0.09307943713, 'q41s': 1.5410113059, 'q42c': -2.179742611, 'q42s': -1.5590824961, 'q43c': 0.35883654798, 'q43s': -0.88871603233, 'q44c': 0.43934126428, 'q44s': 1.7461938373, 'q50': 1.0996389779, 'q51c': -1.262699744, 'q51s': -1.3852909494, 'q52c': 1.560792331, 'q52s': 0.24419844412, 'q53c': -0.37471724152, 'q53s': 2.5132383079, 'q54c': -0.95964885508, 'q54s': -0.26895872429, 'q55c': 0.55394494551, 'q55s': -1.8680975711}\n", "WATER_MONOMER0003.pointdir {'q00': -1.1261722464, 'q10': 0.11200768361, 'q11c': 0.19713726623, 'q11s': -0.046368083336, 'q20': 0.25960435715, 'q21c': -0.159889789, 'q21s': -0.51517986412, 'q22c': 0.49258123822, 'q22s': 0.087179927499, 'q30': -0.4214583255, 'q31c': 0.024210442018, 'q31s': 0.44866570921, 'q32c': 0.31042203091, 'q32s': 0.23745156258, 'q33c': -0.37812439877, 'q33s': -0.13095025897, 'q40': 0.88553589917, 'q41c': 1.9512466748, 'q41s': 
-1.4646293031, 'q42c': -1.3890087525, 'q42s': -1.1724735472, 'q43c': -1.3146772524, 'q43s': -0.23121743344, 'q44c': 1.278074295, 'q44s': 0.73371421822, 'q50': 1.1218634504, 'q51c': -2.2221116073, 'q51s': 1.0854260589, 'q52c': -0.70009846634, 'q52s': 1.5789483953, 'q53c': 2.0156858127, 'q53s': 0.35708799616, 'q54c': 1.2709617017, 'q54s': -0.46441127444, 'q55c': -1.2724101051, 'q55s': -0.93923967921}\n" ] } ], "source": [ "for point_directory in points_dir:\n", "\n", " print(point_directory.name, point_directory.ints[\"O1\"].global_spherical_multipoles)\n", "\n", "# note these are not rotated" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Accessing all data from all files\n", "\n", "There is a very quick way to obtain all raw data from all calculations in a `PointsDirectory`. The `raw_data` property can be used to obtain the raw data. This returns a Python dictionary where the keys are the point names and the values are a nested Python dictionary containing the results from all the relevant calculations." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'WATER_MONOMER0000': {'gaussian_output': {'global_forces': {'O1': array([ 0.02953315, 0.0827204 , -0.02495305]),\n", " 'H2': array([ 0.00578961, -0.0242831 , -0.00842433]),\n", " 'H3': array([-0.03532276, -0.05843731, 0.03337739])},\n", " 'charge': 0,\n", " 'multiplicity': 1,\n", " 'molecular_dipole': MolecularDipole(x=0.1189, y=2.3866, z=0.0787),\n", " 'molecular_quadrupole': MolecularQuadrupole(xx=-6.5273, yy=-7.7674, zz=-6.2577, xy=0.0665, xz=-1.6318, yz=-0.0495),\n", " 'traceless_molecular_quadrupole': TracelessMolecularQuadrupole(xx=0.3235, yy=-0.9166, zz=0.5931, xy=0.0665, xz=-1.6318, yz=-0.0495),\n", " 'molecular_octupole': MolecularOctupole(xxx=0.5348, yyy=8.5805, zzz=0.2229, xyy=0.1794, xxy=3.083, xxz=0.0727, xzz=0.1817, yzz=3.0764, yyz=0.0143, xyz=0.0059),\n", " 'molecular_hexadecapole': MolecularHexadecapole(xxxx=-8.283, yyyy=-15.2596, zzzz=-8.1733, xxxy=-0.2961, xxxz=-0.1375, yyyx=-0.296, yyyz=0.056, zzzx=-0.1975, zzzy=0.0334, xxyy=-3.888, xxzz=-2.4818, yyzz=-3.8474, xxyz=-0.0404, yyxz=-0.2009, zzxy=-0.0859)},\n", " 'wfn': {'energy': -76.421710687455, 'virial_ratio': 2.01177209},\n", " 'ints': {'H2': {'iqa': -0.48880091691,\n", " 'integration_error': 1.8741005824e-05,\n", " 'q00': 0.55107276527,\n", " 'q10': -0.10046776424,\n", " 'q11c': 0.082404094273,\n", " 'q11s': -0.12293368007,\n", " 'q20': 0.0036460292977,\n", " 'q21c': 0.00029619225829,\n", " 'q21s': -0.0074877731368,\n", " 'q22c': 0.0074481701736,\n", " 'q22s': 0.0056874730269,\n", " 'q30': -0.05956450163,\n", " 'q31c': -0.039733101597,\n", " 'q31s': 0.041495598451,\n", " 'q32c': -0.028165432797,\n", " 'q32s': -0.11949807302,\n", " 'q33c': 0.058162645861,\n", " 'q33s': 0.018335018802,\n", " 'q40': -0.12002978326,\n", " 'q41c': 0.040444633273,\n", " 'q41s': -0.069505346729,\n", " 'q42c': -0.019427525486,\n", " 'q42s': -0.15727748843,\n", " 'q43c': 0.1932877549,\n", " 'q43s': 
0.062934026281,\n", " 'q44c': -0.057209327496,\n", " 'q44s': 0.059969789986,\n", " 'q50': -0.011914817313,\n", " 'q51c': 0.062065103525,\n", " 'q51s': -0.076563819972,\n", " 'q52c': 0.048593580786,\n", " 'q52s': 0.026776257085,\n", " 'q53c': 0.072780829944,\n", " 'q53s': 0.028736420893,\n", " 'q54c': -0.04279361242,\n", " 'q54s': 0.10697549721,\n", " 'q55c': -0.031070190556,\n", " 'q55s': -0.026095070495},\n", " 'O1': {'iqa': -75.446714709,\n", " 'integration_error': -2.8725236559e-05,\n", " 'q00': -1.051921199,\n", " 'q10': -0.020042356378,\n", " 'q11c': 0.0018273275449,\n", " 'q11s': -0.20706556929,\n", " 'q20': -0.037266216811,\n", " 'q21c': -0.79780613831,\n", " 'q21s': 0.013146921148,\n", " 'q22c': -0.19595266005,\n", " 'q22s': 0.078488472227,\n", " 'q30': 0.043015515207,\n", " 'q31c': -0.053621704828,\n", " 'q31s': 0.21644282193,\n", " 'q32c': -0.029607236961,\n", " 'q32s': -0.89197505111,\n", " 'q33c': -0.053969314597,\n", " 'q33s': 0.16211677693,\n", " 'q40': -1.4545843935,\n", " 'q41c': 0.91783517331,\n", " 'q41s': 0.17650015949,\n", " 'q42c': -0.73112185714,\n", " 'q42s': -0.3293114897,\n", " 'q43c': 2.8344280941,\n", " 'q43s': -0.16267842746,\n", " 'q44c': -1.3853362266,\n", " 'q44s': 0.089771195512,\n", " 'q50': -0.24411738335,\n", " 'q51c': 0.48960856702,\n", " 'q51s': -1.5472642317,\n", " 'q52c': -0.040094542612,\n", " 'q52s': 0.98097072569,\n", " 'q53c': 0.72718022845,\n", " 'q53s': -1.1988409017,\n", " 'q54c': -0.47766441277,\n", " 'q54s': 2.0753064137,\n", " 'q55c': -0.29405113415,\n", " 'q55s': -1.6430303594},\n", " 'H3': {'iqa': -0.48619221042,\n", " 'integration_error': 1.7274446651e-05,\n", " 'q00': 0.50085300856,\n", " 'q10': 0.085197944712,\n", " 'q11c': -0.087841616442,\n", " 'q11s': -0.12029023455,\n", " 'q20': 0.00023993899792,\n", " 'q21c': -0.023661428009,\n", " 'q21s': -0.021391961736,\n", " 'q22c': -1.9131859436e-05,\n", " 'q22s': 0.02151432195,\n", " 'q30': 0.087599091989,\n", " 'q31c': 0.031324229347,\n", " 'q31s': 
0.026427482381,\n", " 'q32c': 0.023804975755,\n", " 'q32s': -0.15674955727,\n", " 'q33c': -0.088059760394,\n", " 'q33s': 0.037288453803,\n", " 'q40': -0.07444947707,\n", " 'q41c': 0.05887992764,\n", " 'q41s': 0.081114651822,\n", " 'q42c': 0.0088380735081,\n", " 'q42s': 0.083371237428,\n", " 'q43c': 0.15535798009,\n", " 'q43s': -0.069909915593,\n", " 'q44c': -0.064026374739,\n", " 'q44s': -0.057177316644,\n", " 'q50': 0.0082484388953,\n", " 'q51c': 0.099441485367,\n", " 'q51s': 0.11360202978,\n", " 'q52c': -0.01832368283,\n", " 'q52s': -0.0010932889664,\n", " 'q53c': 0.15341606275,\n", " 'q53s': -0.10054868004,\n", " 'q54c': -0.17440062253,\n", " 'q54s': -0.0001046247894,\n", " 'q55c': 0.045251112263,\n", " 'q55s': 0.071037518757}}},\n", " 'WATER_MONOMER0001': {'gaussian_output': {'global_forces': {'O1': array([ 0.03848438, -0.02380376, 0.03412189]),\n", " 'H2': array([-0.03146119, 0.00036476, -0.00231774]),\n", " 'H3': array([-0.00702319, 0.02343899, -0.03180415])},\n", " 'charge': 0,\n", " 'multiplicity': 1,\n", " 'molecular_dipole': MolecularDipole(x=-0.8431, y=-1.323, z=1.7231),\n", " 'molecular_quadrupole': MolecularQuadrupole(xx=-5.1893, yy=-7.688, zz=-7.426, xy=-0.5634, xz=0.9119, yz=-0.3113),\n", " 'traceless_molecular_quadrupole': TracelessMolecularQuadrupole(xx=1.5784, yy=-0.9202, zz=-0.6582, xy=-0.5634, xz=0.9119, yz=-0.3113),\n", " 'molecular_octupole': MolecularOctupole(xxx=-2.948, yyy=-4.821, zzz=6.188, xyy=-0.9565, xxy=-1.5601, xxz=2.0371, xzz=-0.9166, yzz=-1.4649, yyz=2.004, xyz=-0.0833),\n", " 'molecular_hexadecapole': MolecularHexadecapole(xxxx=-7.6094, yyyy=-10.1144, zzzz=-11.4551, xxxy=-0.5969, xxxz=0.8455, yyyx=-0.6288, yyyz=1.3354, zzzx=0.9015, zzzy=1.1908, xxyy=-3.0317, xxzz=-3.1805, yyzz=-3.4563, xxyz=0.2984, yyxz=0.3128, zzxy=-0.2858)},\n", " 'wfn': {'energy': -76.429947804, 'virial_ratio': 2.00949207},\n", " 'ints': {'H2': {'iqa': -0.48887410664,\n", " 'integration_error': 9.5429453148e-06,\n", " 'q00': 0.57930929588,\n", " 'q10': 
-0.046102497364,\n", " 'q11c': 0.16558301986,\n", " 'q11s': 0.041594915198,\n", " 'q20': 0.010377317695,\n", " 'q21c': 0.018136684148,\n", " 'q21s': 0.006500049871,\n", " 'q22c': -0.02013508558,\n", " 'q22s': -0.015523513189,\n", " 'q30': -0.034744645637,\n", " 'q31c': 0.057390530241,\n", " 'q31s': 0.0065667570896,\n", " 'q32c': 0.039512071665,\n", " 'q32s': 0.020328858681,\n", " 'q33c': -0.076627866971,\n", " 'q33s': -0.050739679414,\n", " 'q40': 0.050244296949,\n", " 'q41c': 0.13100319212,\n", " 'q41s': 0.030144145474,\n", " 'q42c': -0.10131293908,\n", " 'q42s': -0.02287002106,\n", " 'q43c': -0.089114870694,\n", " 'q43s': -0.097196124547,\n", " 'q44c': 0.13556820311,\n", " 'q44s': 0.15270252132,\n", " 'q50': 0.099377074131,\n", " 'q51c': -0.0078247554538,\n", " 'q51s': 0.014794214546,\n", " 'q52c': -0.14821510785,\n", " 'q52s': -0.089844814373,\n", " 'q53c': 0.085049483662,\n", " 'q53s': 0.0066340841318,\n", " 'q54c': 0.055004363647,\n", " 'q54s': 0.19474995219,\n", " 'q55c': -0.090696179084,\n", " 'q55s': -0.19386856821},\n", " 'O1': {'iqa': -75.453164031,\n", " 'integration_error': -3.2412635167e-05,\n", " 'q00': -1.1248310833,\n", " 'q10': -0.15773618224,\n", " 'q11c': 0.081543820356,\n", " 'q11s': 0.12130191092,\n", " 'q20': -0.12606180318,\n", " 'q21c': 0.19139555969,\n", " 'q21s': -0.528487624,\n", " 'q22c': 0.54627966503,\n", " 'q22s': -0.078510757285,\n", " 'q30': -0.27889785637,\n", " 'q31c': 0.49213687845,\n", " 'q31s': -0.3074017347,\n", " 'q32c': 0.14764509944,\n", " 'q32s': -0.24727703379,\n", " 'q33c': -0.44944367966,\n", " 'q33s': -0.35435058603,\n", " 'q40': -0.48581728297,\n", " 'q41c': 1.5494543387,\n", " 'q41s': -0.23351555403,\n", " 'q42c': -0.86875233643,\n", " 'q42s': -1.853314432,\n", " 'q43c': -1.8063689406,\n", " 'q43s': -1.1998626773,\n", " 'q44c': 0.48732628951,\n", " 'q44s': 1.3828550511,\n", " 'q50': 0.4495412122,\n", " 'q51c': 0.31877779684,\n", " 'q51s': -0.024332972891,\n", " 'q52c': -1.476162947,\n", " 'q52s': -2.119742167,\n", " 
'q53c': -1.5055751323,\n", " 'q53s': 0.18416072712,\n", " 'q54c': 0.45591672771,\n", " 'q54s': 2.1945711652,\n", " 'q55c': 0.03684630334,\n", " 'q55s': -1.3214124663},\n", " 'H3': {'iqa': -0.48791314597,\n", " 'integration_error': 1.767282812e-05,\n", " 'q00': 0.54551717948,\n", " 'q10': -0.12912268454,\n", " 'q11c': -0.073220513388,\n", " 'q11s': 0.093216336946,\n", " 'q20': -0.0041810639947,\n", " 'q21c': 0.0054837257387,\n", " 'q21s': 0.0035806223466,\n", " 'q22c': 0.0055876412653,\n", " 'q22s': -0.0037182632383,\n", " 'q30': -0.031536097089,\n", " 'q31c': 0.074389974368,\n", " 'q31s': -0.088388188785,\n", " 'q32c': -0.0070097113295,\n", " 'q32s': -0.1115211214,\n", " 'q33c': -0.033297232489,\n", " 'q33s': -0.032924675413,\n", " 'q40': -0.13030060961,\n", " 'q41c': 0.038148510124,\n", " 'q41s': -0.073690881766,\n", " 'q42c': -0.049562233205,\n", " 'q42s': -0.19895480526,\n", " 'q43c': -0.10771772308,\n", " 'q43s': -0.084626662835,\n", " 'q44c': -0.043351191843,\n", " 'q44s': 0.0023716485062,\n", " 'q50': -0.053242517161,\n", " 'q51c': 0.0053036087089,\n", " 'q51s': -0.008766321614,\n", " 'q52c': -0.098989030278,\n", " 'q52s': -0.043646219093,\n", " 'q53c': -0.058542351859,\n", " 'q53s': 0.030550952424,\n", " 'q54c': 0.01341571848,\n", " 'q54s': 0.014472605634,\n", " 'q55c': 0.020911770864,\n", " 'q55s': -0.019132623387}}},\n", " 'WATER_MONOMER0002': {'gaussian_output': {'global_forces': {'O1': array([-0.01182343, -0.00326165, 0.00243345]),\n", " 'H2': array([-0.00938744, 0.00801946, 0.01598048]),\n", " 'H3': array([ 0.02121088, -0.00475781, -0.01841394])},\n", " 'charge': 0,\n", " 'multiplicity': 1,\n", " 'molecular_dipole': MolecularDipole(x=-1.3324, y=-1.5029, z=-1.2291),\n", " 'molecular_quadrupole': MolecularQuadrupole(xx=-6.0559, yy=-7.6792, zz=-6.5966, xy=-0.2517, xz=-1.3516, yz=0.355),\n", " 'traceless_molecular_quadrupole': TracelessMolecularQuadrupole(xx=0.7214, yy=-0.9019, zz=0.1806, xy=-0.2517, xz=-1.3516, yz=0.355),\n", " 'molecular_octupole': 
MolecularOctupole(xxx=-4.7196, yyy=-5.3872, zzz=-4.3733, xyy=-1.5173, xxy=-1.7003, xxz=-1.4362, xzz=-1.5791, yzz=-1.6989, yyz=-1.3608, xyz=0.0836),\n", " 'molecular_hexadecapole': MolecularHexadecapole(xxxx=-9.1389, yyyy=-10.4563, zzzz=-9.1524, xxxy=-0.9871, xxxz=-1.2163, yyyx=-1.1784, yyyz=-0.9508, zzzx=-1.1644, zzzy=-0.7955, xxyy=-3.2977, xxzz=-3.0954, yyzz=-3.1475, xxyz=-0.3143, yyxz=-0.3329, zzxy=-0.4511)},\n", " 'wfn': {'energy': -76.430599107417, 'virial_ratio': 2.00850663},\n", " 'ints': {'H2': {'iqa': -0.4884303933,\n", " 'integration_error': 8.4456970171e-06,\n", " 'q00': 0.56704064421,\n", " 'q10': -0.016788810577,\n", " 'q11c': 0.16678532637,\n", " 'q11s': 0.05926427077,\n", " 'q20': 0.012378760753,\n", " 'q21c': -0.0011590886116,\n", " 'q21s': 0.0002814248579,\n", " 'q22c': -0.013873040688,\n", " 'q22s': -0.015668008236,\n", " 'q30': -0.026844098476,\n", " 'q31c': 0.070432957219,\n", " 'q31s': 0.018973897274,\n", " 'q32c': 0.035650388172,\n", " 'q32s': 0.015621862978,\n", " 'q33c': -0.062642395464,\n", " 'q33s': -0.086278367162,\n", " 'q40': 0.087362375529,\n", " 'q41c': 0.076220558505,\n", " 'q41s': 0.01615660549,\n", " 'q42c': -0.11754244046,\n", " 'q42s': -0.086770182195,\n", " 'q43c': -0.070653482872,\n", " 'q43s': -0.0463683192,\n", " 'q44c': 0.053438266326,\n", " 'q44s': 0.21580446061,\n", " 'q50': 0.026789945801,\n", " 'q51c': -0.086115785343,\n", " 'q51s': -0.037958011591,\n", " 'q52c': -0.039664935648,\n", " 'q52s': -0.013453543961,\n", " 'q53c': 0.053511966654,\n", " 'q53s': 0.13667812966,\n", " 'q54c': 0.074049196913,\n", " 'q54s': 0.03171448802,\n", " 'q55c': 0.053650772738,\n", " 'q55s': -0.19420111212},\n", " 'O1': {'iqa': -75.453749708,\n", " 'integration_error': -3.8823037245e-05,\n", " 'q00': -1.1395301182,\n", " 'q10': 0.1170316343,\n", " 'q11c': 0.12707113118,\n", " 'q11s': 0.14319062627,\n", " 'q20': 0.035370307734,\n", " 'q21c': -0.42415679932,\n", " 'q21s': 0.49791338765,\n", " 'q22c': 0.37603023988,\n", " 'q22s': 
0.20445186908,\n", " 'q30': -0.15737553444,\n", " 'q31c': 0.45818443084,\n", " 'q31s': -0.35883958802,\n", " 'q32c': 0.3480880684,\n", " 'q32s': 0.20971032944,\n", " 'q33c': -0.29423125466,\n", " 'q33s': -0.34625176326,\n", " 'q40': 0.41096573978,\n", " 'q41c': 0.09307943713,\n", " 'q41s': 1.5410113059,\n", " 'q42c': -2.179742611,\n", " 'q42s': -1.5590824961,\n", " 'q43c': 0.35883654798,\n", " 'q43s': -0.88871603233,\n", " 'q44c': 0.43934126428,\n", " 'q44s': 1.7461938373,\n", " 'q50': 1.0996389779,\n", " 'q51c': -1.262699744,\n", " 'q51s': -1.3852909494,\n", " 'q52c': 1.560792331,\n", " 'q52s': 0.24419844412,\n", " 'q53c': -0.37471724152,\n", " 'q53s': 2.5132383079,\n", " 'q54c': -0.95964885508,\n", " 'q54s': -0.26895872429,\n", " 'q55c': 0.55394494551,\n", " 'q55s': -1.8680975711},\n", " 'H3': {'iqa': -0.48842559681,\n", " 'integration_error': 1.7432947248e-05,\n", " 'q00': 0.5724835067,\n", " 'q10': 0.1457603332,\n", " 'q11c': -0.028104713416,\n", " 'q11s': 0.09795249107,\n", " 'q20': -0.014236447809,\n", " 'q21c': 0.0020513538853,\n", " 'q21s': -0.024002553682,\n", " 'q22c': 0.0090167292986,\n", " 'q22s': 0.0018592788488,\n", " 'q30': -0.016278101055,\n", " 'q31c': 0.043378639139,\n", " 'q31s': -0.10104168631,\n", " 'q32c': 0.047377463605,\n", " 'q32s': 0.042668161752,\n", " 'q33c': -0.014828920829,\n", " 'q33s': 0.002301899411,\n", " 'q40': -0.070410778984,\n", " 'q41c': -0.074472064823,\n", " 'q41s': 0.18279123322,\n", " 'q42c': -0.1612131176,\n", " 'q42s': -0.12499918589,\n", " 'q43c': 0.06142026173,\n", " 'q43s': -0.037102021898,\n", " 'q44c': -0.0036973964853,\n", " 'q44s': 0.015807001231,\n", " 'q50': 0.16521543103,\n", " 'q51c': 0.025818622551,\n", " 'q51s': -0.070934786473,\n", " 'q52c': 0.20406825774,\n", " 'q52s': 0.12904411072,\n", " 'q53c': -0.064053836207,\n", " 'q53s': 0.092995258457,\n", " 'q54c': -0.019812495159,\n", " 'q54s': -0.014274832112,\n", " 'q55c': 0.013994317071,\n", " 'q55s': -0.0022039694548}}},\n", " 'WATER_MONOMER0003': 
{'gaussian_output': {'global_forces': {'O1': array([-0.01819327, 0.00455452, -0.01096517]),\n", " 'H2': array([-0.00188946, -0.00938049, 0.02132555]),\n", " 'H3': array([ 0.02008273, 0.00482597, -0.01036038])},\n", " 'charge': 0,\n", " 'multiplicity': 1,\n", " 'molecular_dipole': MolecularDipole(x=-2.0363, y=0.479, z=-1.157),\n", " 'molecular_quadrupole': MolecularQuadrupole(xx=-6.8842, yy=-7.4952, zz=-6.0587, xy=0.4805, xz=-1.0644, yz=-0.8024),\n", " 'traceless_molecular_quadrupole': TracelessMolecularQuadrupole(xx=-0.0715, yy=-0.6825, zz=0.754, xy=0.4805, xz=-1.0644, yz=-0.8024),\n", " 'molecular_octupole': MolecularOctupole(xxx=-7.1177, yyy=1.7805, zzz=-4.1497, xyy=-2.516, xxy=0.5301, xxz=-1.2734, xzz=-2.3914, yzz=0.5188, yyz=-1.4113, xyz=-0.0642),\n", " 'molecular_hexadecapole': MolecularHexadecapole(xxxx=-12.0314, yyyy=-8.1287, zzzz=-8.7818, xxxy=0.6728, xxxz=-1.5822, yyyx=0.6119, yyyz=0.2975, zzzx=-1.4365, zzzy=-0.0026, xxyy=-3.5051, xxzz=-3.5441, yyzz=-2.8023, xxyz=0.0089, yyxz=-0.4888, zzxy=0.1872)},\n", " 'wfn': {'energy': -76.42948849797, 'virial_ratio': 2.00883314},\n", " 'ints': {'H2': {'iqa': -0.4881027439,\n", " 'integration_error': 9.4522550832e-06,\n", " 'q00': 0.56326128376,\n", " 'q10': -0.038053718191,\n", " 'q11c': 0.17406452668,\n", " 'q11s': 0.019124181097,\n", " 'q20': 0.012081611559,\n", " 'q21c': 0.0051369484026,\n", " 'q21s': -0.00074088786746,\n", " 'q22c': -0.019407551417,\n", " 'q22s': -0.0027908684718,\n", " 'q30': -0.046075383748,\n", " 'q31c': 0.050014857932,\n", " 'q31s': 0.0094560487176,\n", " 'q32c': 0.058475130664,\n", " 'q32s': 0.024077433258,\n", " 'q33c': -0.097026056963,\n", " 'q33s': -0.03579136459,\n", " 'q40': 0.04154079856,\n", " 'q41c': 0.13187846251,\n", " 'q41s': 0.021412542719,\n", " 'q42c': -0.084391243749,\n", " 'q42s': -0.0200831628,\n", " 'q43c': -0.11848751779,\n", " 'q43s': -0.074415491028,\n", " 'q44c': 0.19465272929,\n", " 'q44s': 0.088507549312,\n", " 'q50': 0.058580091185,\n", " 'q51c': -0.042565949802,\n", 
" 'q51s': -0.0043775245542,\n", " 'q52c': -0.089474143135,\n", " 'q52s': -0.010408877679,\n", " 'q53c': 0.079998556562,\n", " 'q53s': 0.0045329002427,\n", " 'q54c': 0.07873109218,\n", " 'q54s': 0.091471346749,\n", " 'q55c': -0.17860562422,\n", " 'q55s': -0.077365814166},\n", " 'O1': {'iqa': -75.453284702,\n", " 'integration_error': -2.3962230701e-05,\n", " 'q00': -1.1261722464,\n", " 'q10': 0.11200768361,\n", " 'q11c': 0.19713726623,\n", " 'q11s': -0.046368083336,\n", " 'q20': 0.25960435715,\n", " 'q21c': -0.159889789,\n", " 'q21s': -0.51517986412,\n", " 'q22c': 0.49258123822,\n", " 'q22s': 0.087179927499,\n", " 'q30': -0.4214583255,\n", " 'q31c': 0.024210442018,\n", " 'q31s': 0.44866570921,\n", " 'q32c': 0.31042203091,\n", " 'q32s': 0.23745156258,\n", " 'q33c': -0.37812439877,\n", " 'q33s': -0.13095025897,\n", " 'q40': 0.88553589917,\n", " 'q41c': 1.9512466748,\n", " 'q41s': -1.4646293031,\n", " 'q42c': -1.3890087525,\n", " 'q42s': -1.1724735472,\n", " 'q43c': -1.3146772524,\n", " 'q43s': -0.23121743344,\n", " 'q44c': 1.278074295,\n", " 'q44s': 0.73371421822,\n", " 'q50': 1.1218634504,\n", " 'q51c': -2.2221116073,\n", " 'q51s': 1.0854260589,\n", " 'q52c': -0.70009846634,\n", " 'q52s': 1.5789483953,\n", " 'q53c': 2.0156858127,\n", " 'q53s': 0.35708799616,\n", " 'q54c': 1.2709617017,\n", " 'q54s': -0.46441127444,\n", " 'q55c': -1.2724101051,\n", " 'q55s': -0.93923967921},\n", " 'H3': {'iqa': -0.4880976683,\n", " 'integration_error': 1.85987186e-05,\n", " 'q00': 0.56290675093,\n", " 'q10': 0.16001384105,\n", " 'q11c': 0.04069948876,\n", " 'q11s': -0.069618583234,\n", " 'q20': -0.014716885498,\n", " 'q21c': -0.011767701802,\n", " 'q21s': 0.012518520553,\n", " 'q22c': 0.0019972173909,\n", " 'q22s': 0.0051377000688,\n", " 'q30': -0.065290261963,\n", " 'q31c': -0.043467453029,\n", " 'q31s': 0.1070757614,\n", " 'q32c': 0.024745592711,\n", " 'q32s': 0.036347573943,\n", " 'q33c': 0.000609416581,\n", " 'q33s': 0.00049392468072,\n", " 'q40': 0.051960788182,\n", " 'q41c': 
0.096729266414,\n", " 'q41s': -0.23825631964,\n", " 'q42c': -0.083947282683,\n", " 'q42s': -0.12455375179,\n", " 'q43c': -0.029034203906,\n", " 'q43s': -0.010805163255,\n", " 'q44c': 0.0047750927805,\n", " 'q44s': -0.003890919399,\n", " 'q50': 0.052841082773,\n", " 'q51c': -0.068227481331,\n", " 'q51s': 0.17406623703,\n", " 'q52c': 0.082049793886,\n", " 'q52s': 0.13832399386,\n", " 'q53c': 0.060236017556,\n", " 'q53s': 0.055688530781,\n", " 'q54c': 0.011137162575,\n", " 'q54s': 0.0079359384347,\n", " 'q55c': -0.0097729922349,\n", " 'q55s': 0.010888114243}}}}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_raw_data = points_dir.raw_data\n", "\n", "all_raw_data" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Converting to SQLite3 database\n", "\n", "Reading thousands of files every time is very time consuming (especially on hard drives), so it is much more efficient to read the data once and store it in a database. `ichor` has SQLite3 support implemented, meaning a `PointsDirectory` can be readily converted to an SQLite3 database. **NOTE: ONLY RAW DATA FROM CALCULATIONS IS STORED IN THE DATABASE. NO POSTPROCESSING IS DONE. ANY POSTPROCESSING MUST BE DONE AT A LATER STEP (e.g. 
rotating multipole moments).**\n", "\n", "Code snippet to produce the database:\n", "\n", "```python\n", "\n", "from ichor.core.files import PointsDirectory\n", "\n", "pd = PointsDirectory(\"points_directory_path\")\n", "pd.write_to_sqlite3_database()\n", "```\n", "\n", "**Note 1: It takes a while to read all files, so this should be submitted as a job on a compute node.**\n", "\n", "**Note 2: If the dataset is large and split into many `PointsDirectory`-like directories, then you can do**\n", "\n", "```python\n", "from ichor.core.files import PointsDirectory\n", "from pathlib import Path\n", "\n", "parent_dir = Path(\"parent_dir\")\n", "\n", "# each entry d is one PointsDirectory-like directory inside parent_dir\n", "for d in parent_dir.iterdir():\n", "\n", " pd = PointsDirectory(d)\n", " pd.write_to_sqlite3_database(\"large_database.db\")\n", "```\n", "\n", "where all the information will be stored into one database." ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## SQLite Database Schema Diagram\n", "\n", "The following is what the schema diagram currently looks like for the tables. The image was made with DbVisualizer. Note that **all** fields might not be populated in the database. That depends on the raw data that is present in the `PointsDirectory`. 
For example, if only Gaussian calculations are run, then the AIMAll-related data will be missing from the database.\n", "\n", "Below is a diagram of the SQLite3 database, made with DbVisualizer:\n", "![alt text](../../../example_files/sql_database_schema.svg \"SQLite3 Schema\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Converting to JSON database\n", "\n", "Very similarly, the `PointsDirectory` instance can be converted to a JSON database by\n", "\n", "```python\n", "\n", "from ichor.core.files import PointsDirectory\n", "\n", "pd = PointsDirectory(\"points_directory_path\")\n", "pd.write_to_json_database()\n", "```" ] }, { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "## Generating CSV files with Features from SQLite3 Database\n", "\n", "CSV files can be readily made from a `PointsDirectory` instance or a database. CSV files containing (ALF) features and relevant outputs can be generated from an SQLite3 database like so:\n", "\n", "```python\n", "from ichor.core.database.sql.query_database import (\n", " get_alf_from_first_db_geometry,\n", " write_processed_data_for_atoms_parallel,\n", " write_processed_data_for_atoms\n", ")\n", "\n", "db_path = \"DATABASE_PATH\"\n", "\n", "# note that you can also define an ALF manually\n", "# or get it from some other molecular geometry\n", "# that contains the same atom sequencing as in the database\n", "alf = get_alf_from_first_db_geometry(db_path)\n", "\n", "# note that this will write files out in parallel\n", "# use write_processed_data_for_atoms for serial\n", "\n", "write_processed_data_for_atoms_parallel(\n", " db_path,\n", " alf,\n", " ncores=4,\n", " calc_multipoles=True, # rotates multipoles using C matrix\n", " calc_forces=False, # calculates ALF forces using Wilson B matrix\n", ")\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "ichor_docs", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, 
"file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }