PointsDirectory - A class used to encapsulate all calculations for many geometries (of a dataset)

The ichor.core.files.PointsDirectory class makes it easy to work with the thousands of files that are generated when running Gaussian, AIMAll, and similar calculations for many geometries.

The general structure of a PointsDirectory-like directory is as follows:

.
|--- SYSTEM0001.pointdir
|   |--- SYSTEM0001_atomicfiles
|   |   |--- h2.int
|   |   |--- h3.int
|   |   |--- o1.int
|   |--- SYSTEM0001.gjf
|   |--- SYSTEM0001.wfn
|--- SYSTEM0002.pointdir
|   |--- SYSTEM0002_atomicfiles
|   |   |--- h2.int
|   |   |--- h3.int
|   |   |--- o1.int
|   |--- SYSTEM0002.gjf
|   |--- SYSTEM0002.wfn
...
...
...

Essentially, PointsDirectory is a class that is used to parse a directory containing many sub-directories (each of which is an instance of PointDirectory). Each sub-directory (e.g. SYSTEM0001.pointdir, SYSTEM0002.pointdir) contains all relevant calculations for one molecular geometry. Each of the sub-directories can be read in individually as an ichor.core.files.PointDirectory instance (note that there is no s in this case).

This makes it straightforward to access calculations for many geometries at once.
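The layout described above can be explored with the standard library alone. The sketch below builds a small throwaway directory with the same shape and lists each `.pointdir` sub-directory together with the files it holds. The helper name `list_point_directories` and the temporary files are illustrative only and are not part of ichor.

```python
import tempfile
from pathlib import Path


def list_point_directories(root):
    """Map each *.pointdir sub-directory name to the sorted relative paths of its files."""
    root = Path(root)
    layout = {}
    for point_dir in sorted(root.glob("*.pointdir")):
        if point_dir.is_dir():
            layout[point_dir.name] = sorted(
                p.relative_to(point_dir).as_posix()
                for p in point_dir.rglob("*")
                if p.is_file()
            )
    return layout


# Build a throwaway directory with the same shape as the tree shown above.
with tempfile.TemporaryDirectory() as tmp:
    for i in (1, 2):
        name = f"SYSTEM{i:04d}"
        atomic_files = Path(tmp) / f"{name}.pointdir" / f"{name}_atomicfiles"
        atomic_files.mkdir(parents=True)
        for atom in ("o1", "h2", "h3"):
            (atomic_files / f"{atom}.int").touch()
        for extension in (".gjf", ".wfn"):
            (atomic_files.parent / f"{name}{extension}").touch()

    layout = list_point_directories(tmp)

for point_name, files in layout.items():
    print(point_name, files)
```

Each key of `layout` corresponds to one geometry, which is the role a PointDirectory instance plays inside a PointsDirectory.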

PointDirectory structure

The PointDirectory class encapsulates a directory containing all relevant calculations for one geometry. It subclasses ichor.core.files.directory.AnnotatedDirectory, which gives us the ability to define class variables of specific file types. The AnnotatedDirectory._parse method then parses all files in the directory. The extension of each file determines its file type, and thus the class that is going to be used to parse it.

The PointDirectory.contents class variable can be overwritten to quickly add support for new file and directory types. This ensures that any new file or directory types in ichor are ready to be used with PointDirectory. The contents variable is a Python dictionary whose keys become available as attributes after parsing, and whose values are the Python classes that parse the relevant file or directory. For example, this is the current contents variable:

contents = {
    "xyz": XYZ,
    "gjf": GJF,
    "gaussian_output": GaussianOutput,
    "orca_input": OrcaInput,
    "orca_output": OrcaOutput,
    "aim": Aim,
    "wfn": WFN,
    "ints": IntDirectory,
}

where XYZ is the class that is going to read a .xyz file in the directory.
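To make the idea concrete, here is a minimal, self-contained sketch of extension-based dispatch. It is not ichor's implementation: the classes `FakeXYZ` and `FakeWFN`, their `filetype` attribute and the function `parse_directory` are hypothetical stand-ins that only mimic how a `contents`-style dictionary could map attribute names to parser classes.

```python
import tempfile
from pathlib import Path


class FakeFile:
    """Hypothetical stand-in for a file-parsing class."""

    filetype = ""  # extension handled by the subclass

    def __init__(self, path):
        self.path = Path(path)


class FakeXYZ(FakeFile):
    filetype = ".xyz"


class FakeWFN(FakeFile):
    filetype = ".wfn"


# attribute name -> class used to parse the matching file
contents = {"xyz": FakeXYZ, "wfn": FakeWFN}


def parse_directory(directory, contents):
    """Return {attribute name: parser instance} for every file with a known extension."""
    parsed = {}
    for path in sorted(Path(directory).iterdir()):
        for attribute, parser_class in contents.items():
            if path.is_file() and path.suffix == parser_class.filetype:
                parsed[attribute] = parser_class(path)
    return parsed


with tempfile.TemporaryDirectory() as tmp:
    for filename in ("SYSTEM0001.xyz", "SYSTEM0001.wfn", "notes.txt"):
        (Path(tmp) / filename).touch()
    parsed = parse_directory(tmp, contents)

print({attribute: type(obj).__name__ for attribute, obj in parsed.items()})
```

In this sketch, supporting a new file type amounts to adding one more key and class to the dictionary; files with unknown extensions (such as `notes.txt`) are simply ignored.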

Obtaining results from a PointsDirectory

Obtaining total system energy

The following code snippet can be used to quickly get the total system energy, for example from a Gaussian calculation:

[5]:
from ichor.core.files import PointsDirectory

# PointsDirectory("path_to_directory_with_wfn_and_int_files")
points_dir = PointsDirectory("../../../example_files/example_points_directory/WATER_MONOMER.pointsdir")

for point_directory in points_dir:

    print(point_directory.name, point_directory.wfn.total_energy)
WATER_MONOMER0000.pointdir -76.421710687455
WATER_MONOMER0001.pointdir -76.429947804
WATER_MONOMER0002.pointdir -76.430599107417
WATER_MONOMER0003.pointdir -76.42948849797
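Once the energies are collected, simple comparisons across geometries follow directly. The sketch below copies the four values printed above into a plain dictionary (so it runs without ichor) and reports each energy relative to the lowest one. It assumes the energies are in Hartree, as is usual for Gaussian wavefunction files, and uses the conversion 1 Hartree ≈ 2625.4996 kJ/mol.

```python
HARTREE_TO_KJ_PER_MOL = 2625.4996  # assumed unit conversion; energies taken to be in Hartree

# values copied from the output above
energies = {
    "WATER_MONOMER0000": -76.421710687455,
    "WATER_MONOMER0001": -76.429947804,
    "WATER_MONOMER0002": -76.430599107417,
    "WATER_MONOMER0003": -76.42948849797,
}

# name of the geometry with the lowest total energy
lowest_point = min(energies, key=energies.get)

# energy of every geometry relative to the lowest one, in kJ/mol
relative_kj_per_mol = {
    name: (energy - energies[lowest_point]) * HARTREE_TO_KJ_PER_MOL
    for name, energy in energies.items()
}

print("lowest-energy geometry:", lowest_point)
for name, value in relative_kj_per_mol.items():
    print(f"{name}: {value:.2f} kJ/mol")
```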

Accessing IQA energy for a specific atom

[6]:
for point_directory in points_dir:

    print(point_directory.name, point_directory.ints["O1"].iqa)

# note that this is for A A'
WATER_MONOMER0000.pointdir -75.446714709
WATER_MONOMER0001.pointdir -75.453164031
WATER_MONOMER0002.pointdir -75.453749708
WATER_MONOMER0003.pointdir -75.453284702

Accessing Multipole Moments

[7]:
for point_directory in points_dir:

    print(point_directory.name, point_directory.ints["O1"].global_spherical_multipoles)

# note these are not rotated
WATER_MONOMER0000.pointdir {'q00': -1.051921199, 'q10': -0.020042356378, 'q11c': 0.0018273275449, 'q11s': -0.20706556929, 'q20': -0.037266216811, 'q21c': -0.79780613831, 'q21s': 0.013146921148, 'q22c': -0.19595266005, 'q22s': 0.078488472227, 'q30': 0.043015515207, 'q31c': -0.053621704828, 'q31s': 0.21644282193, 'q32c': -0.029607236961, 'q32s': -0.89197505111, 'q33c': -0.053969314597, 'q33s': 0.16211677693, 'q40': -1.4545843935, 'q41c': 0.91783517331, 'q41s': 0.17650015949, 'q42c': -0.73112185714, 'q42s': -0.3293114897, 'q43c': 2.8344280941, 'q43s': -0.16267842746, 'q44c': -1.3853362266, 'q44s': 0.089771195512, 'q50': -0.24411738335, 'q51c': 0.48960856702, 'q51s': -1.5472642317, 'q52c': -0.040094542612, 'q52s': 0.98097072569, 'q53c': 0.72718022845, 'q53s': -1.1988409017, 'q54c': -0.47766441277, 'q54s': 2.0753064137, 'q55c': -0.29405113415, 'q55s': -1.6430303594}
WATER_MONOMER0001.pointdir {'q00': -1.1248310833, 'q10': -0.15773618224, 'q11c': 0.081543820356, 'q11s': 0.12130191092, 'q20': -0.12606180318, 'q21c': 0.19139555969, 'q21s': -0.528487624, 'q22c': 0.54627966503, 'q22s': -0.078510757285, 'q30': -0.27889785637, 'q31c': 0.49213687845, 'q31s': -0.3074017347, 'q32c': 0.14764509944, 'q32s': -0.24727703379, 'q33c': -0.44944367966, 'q33s': -0.35435058603, 'q40': -0.48581728297, 'q41c': 1.5494543387, 'q41s': -0.23351555403, 'q42c': -0.86875233643, 'q42s': -1.853314432, 'q43c': -1.8063689406, 'q43s': -1.1998626773, 'q44c': 0.48732628951, 'q44s': 1.3828550511, 'q50': 0.4495412122, 'q51c': 0.31877779684, 'q51s': -0.024332972891, 'q52c': -1.476162947, 'q52s': -2.119742167, 'q53c': -1.5055751323, 'q53s': 0.18416072712, 'q54c': 0.45591672771, 'q54s': 2.1945711652, 'q55c': 0.03684630334, 'q55s': -1.3214124663}
WATER_MONOMER0002.pointdir {'q00': -1.1395301182, 'q10': 0.1170316343, 'q11c': 0.12707113118, 'q11s': 0.14319062627, 'q20': 0.035370307734, 'q21c': -0.42415679932, 'q21s': 0.49791338765, 'q22c': 0.37603023988, 'q22s': 0.20445186908, 'q30': -0.15737553444, 'q31c': 0.45818443084, 'q31s': -0.35883958802, 'q32c': 0.3480880684, 'q32s': 0.20971032944, 'q33c': -0.29423125466, 'q33s': -0.34625176326, 'q40': 0.41096573978, 'q41c': 0.09307943713, 'q41s': 1.5410113059, 'q42c': -2.179742611, 'q42s': -1.5590824961, 'q43c': 0.35883654798, 'q43s': -0.88871603233, 'q44c': 0.43934126428, 'q44s': 1.7461938373, 'q50': 1.0996389779, 'q51c': -1.262699744, 'q51s': -1.3852909494, 'q52c': 1.560792331, 'q52s': 0.24419844412, 'q53c': -0.37471724152, 'q53s': 2.5132383079, 'q54c': -0.95964885508, 'q54s': -0.26895872429, 'q55c': 0.55394494551, 'q55s': -1.8680975711}
WATER_MONOMER0003.pointdir {'q00': -1.1261722464, 'q10': 0.11200768361, 'q11c': 0.19713726623, 'q11s': -0.046368083336, 'q20': 0.25960435715, 'q21c': -0.159889789, 'q21s': -0.51517986412, 'q22c': 0.49258123822, 'q22s': 0.087179927499, 'q30': -0.4214583255, 'q31c': 0.024210442018, 'q31s': 0.44866570921, 'q32c': 0.31042203091, 'q32s': 0.23745156258, 'q33c': -0.37812439877, 'q33s': -0.13095025897, 'q40': 0.88553589917, 'q41c': 1.9512466748, 'q41s': -1.4646293031, 'q42c': -1.3890087525, 'q42s': -1.1724735472, 'q43c': -1.3146772524, 'q43s': -0.23121743344, 'q44c': 1.278074295, 'q44s': 0.73371421822, 'q50': 1.1218634504, 'q51c': -2.2221116073, 'q51s': 1.0854260589, 'q52c': -0.70009846634, 'q52s': 1.5789483953, 'q53c': 2.0156858127, 'q53s': 0.35708799616, 'q54c': 1.2709617017, 'q54s': -0.46441127444, 'q55c': -1.2724101051, 'q55s': -0.93923967921}
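Because these moments are expressed in the global frame, individual components change when the molecule is reoriented. One quantity that does not depend on orientation is the magnitude of the rank-1 (dipole) moment, since `q10`, `q11c` and `q11s` transform like the z, x and y components of a vector. The sketch below computes it from the `O1` values printed above for `WATER_MONOMER0000`. The helper name `dipole_magnitude` is illustrative and not part of ichor.

```python
import math


def dipole_magnitude(multipoles):
    """Return the norm of the rank-1 spherical multipole moment (q10, q11c, q11s)."""
    return math.sqrt(
        multipoles["q10"] ** 2 + multipoles["q11c"] ** 2 + multipoles["q11s"] ** 2
    )


# O1 rank-0 and rank-1 moments of WATER_MONOMER0000, copied from the output above
o1_multipoles = {
    "q00": -1.051921199,
    "q10": -0.020042356378,
    "q11c": 0.0018273275449,
    "q11s": -0.20706556929,
}

magnitude = dipole_magnitude(o1_multipoles)
print(f"|q1| = {magnitude:.4f}")
```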

Accessing all data from all files

There is a very quick way to obtain all raw data from all calculations in a PointsDirectory. The raw_data property can be used to obtain the raw data. This returns a Python dictionary where the keys are the point names and the values are a nested Python dictionary containing the results from all the relevant calculations.

[8]:
all_raw_data = points_dir.raw_data

all_raw_data
[8]:
{'WATER_MONOMER0000': {'gaussian_output': {'global_forces': {'O1': array([ 0.02953315,  0.0827204 , -0.02495305]),
    'H2': array([ 0.00578961, -0.0242831 , -0.00842433]),
    'H3': array([-0.03532276, -0.05843731,  0.03337739])},
   'charge': 0,
   'multiplicity': 1,
   'molecular_dipole': MolecularDipole(x=0.1189, y=2.3866, z=0.0787),
   'molecular_quadrupole': MolecularQuadrupole(xx=-6.5273, yy=-7.7674, zz=-6.2577, xy=0.0665, xz=-1.6318, yz=-0.0495),
   'traceless_molecular_quadrupole': TracelessMolecularQuadrupole(xx=0.3235, yy=-0.9166, zz=0.5931, xy=0.0665, xz=-1.6318, yz=-0.0495),
   'molecular_octupole': MolecularOctupole(xxx=0.5348, yyy=8.5805, zzz=0.2229, xyy=0.1794, xxy=3.083, xxz=0.0727, xzz=0.1817, yzz=3.0764, yyz=0.0143, xyz=0.0059),
   'molecular_hexadecapole': MolecularHexadecapole(xxxx=-8.283, yyyy=-15.2596, zzzz=-8.1733, xxxy=-0.2961, xxxz=-0.1375, yyyx=-0.296, yyyz=0.056, zzzx=-0.1975, zzzy=0.0334, xxyy=-3.888, xxzz=-2.4818, yyzz=-3.8474, xxyz=-0.0404, yyxz=-0.2009, zzxy=-0.0859)},
  'wfn': {'energy': -76.421710687455, 'virial_ratio': 2.01177209},
  'ints': {'H2': {'iqa': -0.48880091691,
    'integration_error': 1.8741005824e-05,
    'q00': 0.55107276527,
    'q10': -0.10046776424,
    'q11c': 0.082404094273,
    'q11s': -0.12293368007,
    'q20': 0.0036460292977,
    'q21c': 0.00029619225829,
    'q21s': -0.0074877731368,
    'q22c': 0.0074481701736,
    'q22s': 0.0056874730269,
    'q30': -0.05956450163,
    'q31c': -0.039733101597,
    'q31s': 0.041495598451,
    'q32c': -0.028165432797,
    'q32s': -0.11949807302,
    'q33c': 0.058162645861,
    'q33s': 0.018335018802,
    'q40': -0.12002978326,
    'q41c': 0.040444633273,
    'q41s': -0.069505346729,
    'q42c': -0.019427525486,
    'q42s': -0.15727748843,
    'q43c': 0.1932877549,
    'q43s': 0.062934026281,
    'q44c': -0.057209327496,
    'q44s': 0.059969789986,
    'q50': -0.011914817313,
    'q51c': 0.062065103525,
    'q51s': -0.076563819972,
    'q52c': 0.048593580786,
    'q52s': 0.026776257085,
    'q53c': 0.072780829944,
    'q53s': 0.028736420893,
    'q54c': -0.04279361242,
    'q54s': 0.10697549721,
    'q55c': -0.031070190556,
    'q55s': -0.026095070495},
   'O1': {'iqa': -75.446714709,
    'integration_error': -2.8725236559e-05,
    'q00': -1.051921199,
    'q10': -0.020042356378,
    'q11c': 0.0018273275449,
    'q11s': -0.20706556929,
    'q20': -0.037266216811,
    'q21c': -0.79780613831,
    'q21s': 0.013146921148,
    'q22c': -0.19595266005,
    'q22s': 0.078488472227,
    'q30': 0.043015515207,
    'q31c': -0.053621704828,
    'q31s': 0.21644282193,
    'q32c': -0.029607236961,
    'q32s': -0.89197505111,
    'q33c': -0.053969314597,
    'q33s': 0.16211677693,
    'q40': -1.4545843935,
    'q41c': 0.91783517331,
    'q41s': 0.17650015949,
    'q42c': -0.73112185714,
    'q42s': -0.3293114897,
    'q43c': 2.8344280941,
    'q43s': -0.16267842746,
    'q44c': -1.3853362266,
    'q44s': 0.089771195512,
    'q50': -0.24411738335,
    'q51c': 0.48960856702,
    'q51s': -1.5472642317,
    'q52c': -0.040094542612,
    'q52s': 0.98097072569,
    'q53c': 0.72718022845,
    'q53s': -1.1988409017,
    'q54c': -0.47766441277,
    'q54s': 2.0753064137,
    'q55c': -0.29405113415,
    'q55s': -1.6430303594},
   'H3': {'iqa': -0.48619221042,
    'integration_error': 1.7274446651e-05,
    'q00': 0.50085300856,
    'q10': 0.085197944712,
    'q11c': -0.087841616442,
    'q11s': -0.12029023455,
    'q20': 0.00023993899792,
    'q21c': -0.023661428009,
    'q21s': -0.021391961736,
    'q22c': -1.9131859436e-05,
    'q22s': 0.02151432195,
    'q30': 0.087599091989,
    'q31c': 0.031324229347,
    'q31s': 0.026427482381,
    'q32c': 0.023804975755,
    'q32s': -0.15674955727,
    'q33c': -0.088059760394,
    'q33s': 0.037288453803,
    'q40': -0.07444947707,
    'q41c': 0.05887992764,
    'q41s': 0.081114651822,
    'q42c': 0.0088380735081,
    'q42s': 0.083371237428,
    'q43c': 0.15535798009,
    'q43s': -0.069909915593,
    'q44c': -0.064026374739,
    'q44s': -0.057177316644,
    'q50': 0.0082484388953,
    'q51c': 0.099441485367,
    'q51s': 0.11360202978,
    'q52c': -0.01832368283,
    'q52s': -0.0010932889664,
    'q53c': 0.15341606275,
    'q53s': -0.10054868004,
    'q54c': -0.17440062253,
    'q54s': -0.0001046247894,
    'q55c': 0.045251112263,
    'q55s': 0.071037518757}}},
 'WATER_MONOMER0001': {'gaussian_output': {'global_forces': {'O1': array([ 0.03848438, -0.02380376,  0.03412189]),
    'H2': array([-0.03146119,  0.00036476, -0.00231774]),
    'H3': array([-0.00702319,  0.02343899, -0.03180415])},
   'charge': 0,
   'multiplicity': 1,
   'molecular_dipole': MolecularDipole(x=-0.8431, y=-1.323, z=1.7231),
   'molecular_quadrupole': MolecularQuadrupole(xx=-5.1893, yy=-7.688, zz=-7.426, xy=-0.5634, xz=0.9119, yz=-0.3113),
   'traceless_molecular_quadrupole': TracelessMolecularQuadrupole(xx=1.5784, yy=-0.9202, zz=-0.6582, xy=-0.5634, xz=0.9119, yz=-0.3113),
   'molecular_octupole': MolecularOctupole(xxx=-2.948, yyy=-4.821, zzz=6.188, xyy=-0.9565, xxy=-1.5601, xxz=2.0371, xzz=-0.9166, yzz=-1.4649, yyz=2.004, xyz=-0.0833),
   'molecular_hexadecapole': MolecularHexadecapole(xxxx=-7.6094, yyyy=-10.1144, zzzz=-11.4551, xxxy=-0.5969, xxxz=0.8455, yyyx=-0.6288, yyyz=1.3354, zzzx=0.9015, zzzy=1.1908, xxyy=-3.0317, xxzz=-3.1805, yyzz=-3.4563, xxyz=0.2984, yyxz=0.3128, zzxy=-0.2858)},
  'wfn': {'energy': -76.429947804, 'virial_ratio': 2.00949207},
  'ints': {'H2': {'iqa': -0.48887410664,
    'integration_error': 9.5429453148e-06,
    'q00': 0.57930929588,
    'q10': -0.046102497364,
    'q11c': 0.16558301986,
    'q11s': 0.041594915198,
    'q20': 0.010377317695,
    'q21c': 0.018136684148,
    'q21s': 0.006500049871,
    'q22c': -0.02013508558,
    'q22s': -0.015523513189,
    'q30': -0.034744645637,
    'q31c': 0.057390530241,
    'q31s': 0.0065667570896,
    'q32c': 0.039512071665,
    'q32s': 0.020328858681,
    'q33c': -0.076627866971,
    'q33s': -0.050739679414,
    'q40': 0.050244296949,
    'q41c': 0.13100319212,
    'q41s': 0.030144145474,
    'q42c': -0.10131293908,
    'q42s': -0.02287002106,
    'q43c': -0.089114870694,
    'q43s': -0.097196124547,
    'q44c': 0.13556820311,
    'q44s': 0.15270252132,
    'q50': 0.099377074131,
    'q51c': -0.0078247554538,
    'q51s': 0.014794214546,
    'q52c': -0.14821510785,
    'q52s': -0.089844814373,
    'q53c': 0.085049483662,
    'q53s': 0.0066340841318,
    'q54c': 0.055004363647,
    'q54s': 0.19474995219,
    'q55c': -0.090696179084,
    'q55s': -0.19386856821},
   'O1': {'iqa': -75.453164031,
    'integration_error': -3.2412635167e-05,
    'q00': -1.1248310833,
    'q10': -0.15773618224,
    'q11c': 0.081543820356,
    'q11s': 0.12130191092,
    'q20': -0.12606180318,
    'q21c': 0.19139555969,
    'q21s': -0.528487624,
    'q22c': 0.54627966503,
    'q22s': -0.078510757285,
    'q30': -0.27889785637,
    'q31c': 0.49213687845,
    'q31s': -0.3074017347,
    'q32c': 0.14764509944,
    'q32s': -0.24727703379,
    'q33c': -0.44944367966,
    'q33s': -0.35435058603,
    'q40': -0.48581728297,
    'q41c': 1.5494543387,
    'q41s': -0.23351555403,
    'q42c': -0.86875233643,
    'q42s': -1.853314432,
    'q43c': -1.8063689406,
    'q43s': -1.1998626773,
    'q44c': 0.48732628951,
    'q44s': 1.3828550511,
    'q50': 0.4495412122,
    'q51c': 0.31877779684,
    'q51s': -0.024332972891,
    'q52c': -1.476162947,
    'q52s': -2.119742167,
    'q53c': -1.5055751323,
    'q53s': 0.18416072712,
    'q54c': 0.45591672771,
    'q54s': 2.1945711652,
    'q55c': 0.03684630334,
    'q55s': -1.3214124663},
   'H3': {'iqa': -0.48791314597,
    'integration_error': 1.767282812e-05,
    'q00': 0.54551717948,
    'q10': -0.12912268454,
    'q11c': -0.073220513388,
    'q11s': 0.093216336946,
    'q20': -0.0041810639947,
    'q21c': 0.0054837257387,
    'q21s': 0.0035806223466,
    'q22c': 0.0055876412653,
    'q22s': -0.0037182632383,
    'q30': -0.031536097089,
    'q31c': 0.074389974368,
    'q31s': -0.088388188785,
    'q32c': -0.0070097113295,
    'q32s': -0.1115211214,
    'q33c': -0.033297232489,
    'q33s': -0.032924675413,
    'q40': -0.13030060961,
    'q41c': 0.038148510124,
    'q41s': -0.073690881766,
    'q42c': -0.049562233205,
    'q42s': -0.19895480526,
    'q43c': -0.10771772308,
    'q43s': -0.084626662835,
    'q44c': -0.043351191843,
    'q44s': 0.0023716485062,
    'q50': -0.053242517161,
    'q51c': 0.0053036087089,
    'q51s': -0.008766321614,
    'q52c': -0.098989030278,
    'q52s': -0.043646219093,
    'q53c': -0.058542351859,
    'q53s': 0.030550952424,
    'q54c': 0.01341571848,
    'q54s': 0.014472605634,
    'q55c': 0.020911770864,
    'q55s': -0.019132623387}}},
 'WATER_MONOMER0002': {'gaussian_output': {'global_forces': {'O1': array([-0.01182343, -0.00326165,  0.00243345]),
    'H2': array([-0.00938744,  0.00801946,  0.01598048]),
    'H3': array([ 0.02121088, -0.00475781, -0.01841394])},
   'charge': 0,
   'multiplicity': 1,
   'molecular_dipole': MolecularDipole(x=-1.3324, y=-1.5029, z=-1.2291),
   'molecular_quadrupole': MolecularQuadrupole(xx=-6.0559, yy=-7.6792, zz=-6.5966, xy=-0.2517, xz=-1.3516, yz=0.355),
   'traceless_molecular_quadrupole': TracelessMolecularQuadrupole(xx=0.7214, yy=-0.9019, zz=0.1806, xy=-0.2517, xz=-1.3516, yz=0.355),
   'molecular_octupole': MolecularOctupole(xxx=-4.7196, yyy=-5.3872, zzz=-4.3733, xyy=-1.5173, xxy=-1.7003, xxz=-1.4362, xzz=-1.5791, yzz=-1.6989, yyz=-1.3608, xyz=0.0836),
   'molecular_hexadecapole': MolecularHexadecapole(xxxx=-9.1389, yyyy=-10.4563, zzzz=-9.1524, xxxy=-0.9871, xxxz=-1.2163, yyyx=-1.1784, yyyz=-0.9508, zzzx=-1.1644, zzzy=-0.7955, xxyy=-3.2977, xxzz=-3.0954, yyzz=-3.1475, xxyz=-0.3143, yyxz=-0.3329, zzxy=-0.4511)},
  'wfn': {'energy': -76.430599107417, 'virial_ratio': 2.00850663},
  'ints': {'H2': {'iqa': -0.4884303933,
    'integration_error': 8.4456970171e-06,
    'q00': 0.56704064421,
    'q10': -0.016788810577,
    'q11c': 0.16678532637,
    'q11s': 0.05926427077,
    'q20': 0.012378760753,
    'q21c': -0.0011590886116,
    'q21s': 0.0002814248579,
    'q22c': -0.013873040688,
    'q22s': -0.015668008236,
    'q30': -0.026844098476,
    'q31c': 0.070432957219,
    'q31s': 0.018973897274,
    'q32c': 0.035650388172,
    'q32s': 0.015621862978,
    'q33c': -0.062642395464,
    'q33s': -0.086278367162,
    'q40': 0.087362375529,
    'q41c': 0.076220558505,
    'q41s': 0.01615660549,
    'q42c': -0.11754244046,
    'q42s': -0.086770182195,
    'q43c': -0.070653482872,
    'q43s': -0.0463683192,
    'q44c': 0.053438266326,
    'q44s': 0.21580446061,
    'q50': 0.026789945801,
    'q51c': -0.086115785343,
    'q51s': -0.037958011591,
    'q52c': -0.039664935648,
    'q52s': -0.013453543961,
    'q53c': 0.053511966654,
    'q53s': 0.13667812966,
    'q54c': 0.074049196913,
    'q54s': 0.03171448802,
    'q55c': 0.053650772738,
    'q55s': -0.19420111212},
   'O1': {'iqa': -75.453749708,
    'integration_error': -3.8823037245e-05,
    'q00': -1.1395301182,
    'q10': 0.1170316343,
    'q11c': 0.12707113118,
    'q11s': 0.14319062627,
    'q20': 0.035370307734,
    'q21c': -0.42415679932,
    'q21s': 0.49791338765,
    'q22c': 0.37603023988,
    'q22s': 0.20445186908,
    'q30': -0.15737553444,
    'q31c': 0.45818443084,
    'q31s': -0.35883958802,
    'q32c': 0.3480880684,
    'q32s': 0.20971032944,
    'q33c': -0.29423125466,
    'q33s': -0.34625176326,
    'q40': 0.41096573978,
    'q41c': 0.09307943713,
    'q41s': 1.5410113059,
    'q42c': -2.179742611,
    'q42s': -1.5590824961,
    'q43c': 0.35883654798,
    'q43s': -0.88871603233,
    'q44c': 0.43934126428,
    'q44s': 1.7461938373,
    'q50': 1.0996389779,
    'q51c': -1.262699744,
    'q51s': -1.3852909494,
    'q52c': 1.560792331,
    'q52s': 0.24419844412,
    'q53c': -0.37471724152,
    'q53s': 2.5132383079,
    'q54c': -0.95964885508,
    'q54s': -0.26895872429,
    'q55c': 0.55394494551,
    'q55s': -1.8680975711},
   'H3': {'iqa': -0.48842559681,
    'integration_error': 1.7432947248e-05,
    'q00': 0.5724835067,
    'q10': 0.1457603332,
    'q11c': -0.028104713416,
    'q11s': 0.09795249107,
    'q20': -0.014236447809,
    'q21c': 0.0020513538853,
    'q21s': -0.024002553682,
    'q22c': 0.0090167292986,
    'q22s': 0.0018592788488,
    'q30': -0.016278101055,
    'q31c': 0.043378639139,
    'q31s': -0.10104168631,
    'q32c': 0.047377463605,
    'q32s': 0.042668161752,
    'q33c': -0.014828920829,
    'q33s': 0.002301899411,
    'q40': -0.070410778984,
    'q41c': -0.074472064823,
    'q41s': 0.18279123322,
    'q42c': -0.1612131176,
    'q42s': -0.12499918589,
    'q43c': 0.06142026173,
    'q43s': -0.037102021898,
    'q44c': -0.0036973964853,
    'q44s': 0.015807001231,
    'q50': 0.16521543103,
    'q51c': 0.025818622551,
    'q51s': -0.070934786473,
    'q52c': 0.20406825774,
    'q52s': 0.12904411072,
    'q53c': -0.064053836207,
    'q53s': 0.092995258457,
    'q54c': -0.019812495159,
    'q54s': -0.014274832112,
    'q55c': 0.013994317071,
    'q55s': -0.0022039694548}}},
 'WATER_MONOMER0003': {'gaussian_output': {'global_forces': {'O1': array([-0.01819327,  0.00455452, -0.01096517]),
    'H2': array([-0.00188946, -0.00938049,  0.02132555]),
    'H3': array([ 0.02008273,  0.00482597, -0.01036038])},
   'charge': 0,
   'multiplicity': 1,
   'molecular_dipole': MolecularDipole(x=-2.0363, y=0.479, z=-1.157),
   'molecular_quadrupole': MolecularQuadrupole(xx=-6.8842, yy=-7.4952, zz=-6.0587, xy=0.4805, xz=-1.0644, yz=-0.8024),
   'traceless_molecular_quadrupole': TracelessMolecularQuadrupole(xx=-0.0715, yy=-0.6825, zz=0.754, xy=0.4805, xz=-1.0644, yz=-0.8024),
   'molecular_octupole': MolecularOctupole(xxx=-7.1177, yyy=1.7805, zzz=-4.1497, xyy=-2.516, xxy=0.5301, xxz=-1.2734, xzz=-2.3914, yzz=0.5188, yyz=-1.4113, xyz=-0.0642),
   'molecular_hexadecapole': MolecularHexadecapole(xxxx=-12.0314, yyyy=-8.1287, zzzz=-8.7818, xxxy=0.6728, xxxz=-1.5822, yyyx=0.6119, yyyz=0.2975, zzzx=-1.4365, zzzy=-0.0026, xxyy=-3.5051, xxzz=-3.5441, yyzz=-2.8023, xxyz=0.0089, yyxz=-0.4888, zzxy=0.1872)},
  'wfn': {'energy': -76.42948849797, 'virial_ratio': 2.00883314},
  'ints': {'H2': {'iqa': -0.4881027439,
    'integration_error': 9.4522550832e-06,
    'q00': 0.56326128376,
    'q10': -0.038053718191,
    'q11c': 0.17406452668,
    'q11s': 0.019124181097,
    'q20': 0.012081611559,
    'q21c': 0.0051369484026,
    'q21s': -0.00074088786746,
    'q22c': -0.019407551417,
    'q22s': -0.0027908684718,
    'q30': -0.046075383748,
    'q31c': 0.050014857932,
    'q31s': 0.0094560487176,
    'q32c': 0.058475130664,
    'q32s': 0.024077433258,
    'q33c': -0.097026056963,
    'q33s': -0.03579136459,
    'q40': 0.04154079856,
    'q41c': 0.13187846251,
    'q41s': 0.021412542719,
    'q42c': -0.084391243749,
    'q42s': -0.0200831628,
    'q43c': -0.11848751779,
    'q43s': -0.074415491028,
    'q44c': 0.19465272929,
    'q44s': 0.088507549312,
    'q50': 0.058580091185,
    'q51c': -0.042565949802,
    'q51s': -0.0043775245542,
    'q52c': -0.089474143135,
    'q52s': -0.010408877679,
    'q53c': 0.079998556562,
    'q53s': 0.0045329002427,
    'q54c': 0.07873109218,
    'q54s': 0.091471346749,
    'q55c': -0.17860562422,
    'q55s': -0.077365814166},
   'O1': {'iqa': -75.453284702,
    'integration_error': -2.3962230701e-05,
    'q00': -1.1261722464,
    'q10': 0.11200768361,
    'q11c': 0.19713726623,
    'q11s': -0.046368083336,
    'q20': 0.25960435715,
    'q21c': -0.159889789,
    'q21s': -0.51517986412,
    'q22c': 0.49258123822,
    'q22s': 0.087179927499,
    'q30': -0.4214583255,
    'q31c': 0.024210442018,
    'q31s': 0.44866570921,
    'q32c': 0.31042203091,
    'q32s': 0.23745156258,
    'q33c': -0.37812439877,
    'q33s': -0.13095025897,
    'q40': 0.88553589917,
    'q41c': 1.9512466748,
    'q41s': -1.4646293031,
    'q42c': -1.3890087525,
    'q42s': -1.1724735472,
    'q43c': -1.3146772524,
    'q43s': -0.23121743344,
    'q44c': 1.278074295,
    'q44s': 0.73371421822,
    'q50': 1.1218634504,
    'q51c': -2.2221116073,
    'q51s': 1.0854260589,
    'q52c': -0.70009846634,
    'q52s': 1.5789483953,
    'q53c': 2.0156858127,
    'q53s': 0.35708799616,
    'q54c': 1.2709617017,
    'q54s': -0.46441127444,
    'q55c': -1.2724101051,
    'q55s': -0.93923967921},
   'H3': {'iqa': -0.4880976683,
    'integration_error': 1.85987186e-05,
    'q00': 0.56290675093,
    'q10': 0.16001384105,
    'q11c': 0.04069948876,
    'q11s': -0.069618583234,
    'q20': -0.014716885498,
    'q21c': -0.011767701802,
    'q21s': 0.012518520553,
    'q22c': 0.0019972173909,
    'q22s': 0.0051377000688,
    'q30': -0.065290261963,
    'q31c': -0.043467453029,
    'q31s': 0.1070757614,
    'q32c': 0.024745592711,
    'q32s': 0.036347573943,
    'q33c': 0.000609416581,
    'q33s': 0.00049392468072,
    'q40': 0.051960788182,
    'q41c': 0.096729266414,
    'q41s': -0.23825631964,
    'q42c': -0.083947282683,
    'q42s': -0.12455375179,
    'q43c': -0.029034203906,
    'q43s': -0.010805163255,
    'q44c': 0.0047750927805,
    'q44s': -0.003890919399,
    'q50': 0.052841082773,
    'q51c': -0.068227481331,
    'q51s': 0.17406623703,
    'q52c': 0.082049793886,
    'q52s': 0.13832399386,
    'q53c': 0.060236017556,
    'q53s': 0.055688530781,
    'q54c': 0.011137162575,
    'q54s': 0.0079359384347,
    'q55c': -0.0097729922349,
    'q55s': 0.010888114243}}}}
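Since `raw_data` is an ordinary nested dictionary, standard Python tools are enough to reshape it. The sketch below uses a trimmed-down dictionary with the same nesting as the output above (only the `wfn` energy and the `iqa` values of two points are kept) and flattens it into one row per point and atom. The helper `iqa_rows` is a hypothetical name used for illustration.

```python
# trimmed copy of the structure shown above: point -> calculation -> results
raw_data_sample = {
    "WATER_MONOMER0000": {
        "wfn": {"energy": -76.421710687455},
        "ints": {
            "H2": {"iqa": -0.48880091691},
            "O1": {"iqa": -75.446714709},
            "H3": {"iqa": -0.48619221042},
        },
    },
    "WATER_MONOMER0001": {
        "wfn": {"energy": -76.429947804},
        "ints": {
            "H2": {"iqa": -0.48887410664},
            "O1": {"iqa": -75.453164031},
            "H3": {"iqa": -0.48791314597},
        },
    },
}


def iqa_rows(raw_data):
    """Flatten the nested dictionary into (point name, atom name, IQA energy) rows."""
    rows = []
    for point_name, calculations in raw_data.items():
        for atom_name, atom_data in calculations.get("ints", {}).items():
            rows.append((point_name, atom_name, atom_data["iqa"]))
    return rows


rows = iqa_rows(raw_data_sample)
for row in rows:
    print(row)

# Difference between the summed atomic IQA energies and the wfn energy of each point.
recovery_errors = {
    point_name: sum(atom["iqa"] for atom in calculations["ints"].values())
    - calculations["wfn"]["energy"]
    for point_name, calculations in raw_data_sample.items()
}
print(recovery_errors)
```

For the two points used here, the atomic IQA energies add up to the wavefunction energy to within a few microhartree, which is a quick sanity check on the data.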

Converting to SQLite3 database

Reading thousands of files every time is very time-consuming (especially on hard drives), so it is much more efficient to read the data once and store it in a database. ichor has SQLite3 support implemented, meaning a PointsDirectory can be readily converted to an SQLite3 database. NOTE: ONLY RAW DATA FROM CALCULATIONS IS STORED IN THE DATABASE. NO POSTPROCESSING IS DONE. ANY POSTPROCESSING MUST BE DONE AT A LATER STEP (e.g. rotating multipole moments).

Code snippet to produce the database:

from ichor.core.files import PointsDirectory

pd = PointsDirectory("points_directory_path")
pd.write_to_sqlite3_database()

Note 1: It takes a while to read all files, so this should be submitted to a compute node.

Note 2: If the dataset is large and split into many PointsDirectory-like directories, you can loop over them:

from ichor.core.files import PointsDirectory
from pathlib import Path

parent_dir = Path("parent_dir")

# write every PointsDirectory-like directory into the same database
for d in parent_dir.iterdir():

    pd = PointsDirectory(d)
    pd.write_to_sqlite3_database("large_database.db")

where all the information will be stored into one database.

SQLite Database Schema Diagram

The following schema diagram shows what the database tables currently look like. The image was made with DbVisualizer. Note that not all fields may be populated in the database; that depends on the raw data present in the PointsDirectory. For example, if only Gaussian calculations are run, the AIMAll-related data will be missing from the database.

[Figure: schema diagram of the SQLite3 database, made with DbVisualizer]
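To check which tables and columns a given database actually contains, the standard-library `sqlite3` module is sufficient. The sketch below runs against an in-memory database with a made-up `points` table so that it is self-contained. The table and column names are placeholders and do not reflect ichor's real schema. With a real file, pass its path to `sqlite3.connect` instead of `":memory:"`.

```python
import sqlite3


def describe_database(connection):
    """Return {table name: [column names]} for every table in the database."""
    tables = [
        row[0]
        for row in connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name
        columns = connection.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [column[1] for column in columns]
    return schema


connection = sqlite3.connect(":memory:")
# placeholder table, NOT the schema ichor writes
connection.execute(
    "CREATE TABLE points (id INTEGER PRIMARY KEY, name TEXT, wfn_energy REAL)"
)
connection.execute(
    "INSERT INTO points (name, wfn_energy) VALUES (?, ?)",
    ("WATER_MONOMER0000", -76.421710687455),
)

schema = describe_database(connection)
row_count = connection.execute("SELECT COUNT(*) FROM points").fetchone()[0]
connection.close()

print(schema)
print("rows in points:", row_count)
```

Empty tables or missing columns in the result would indicate that the corresponding calculations were not present when the database was written.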

Converting to JSON database

Very similarly, a PointsDirectory instance can be converted to a JSON database:

from ichor.core.files import PointsDirectory

pd = PointsDirectory("points_directory_path")
pd.write_to_json_database()

Generating CSV files with Features from SQLite3 Database

CSV files can be readily made from a PointsDirectory instance or a database. CSV files containing (ALF) features and relevant outputs can be generated from an SQLite3 database like so:

from ichor.core.database.sql.query_database import (
    get_alf_from_first_db_geometry,
    write_processed_data_for_atoms_parallel,
    write_processed_data_for_atoms
)

db_path = "DATABASE_PATH"

# note that you can also define an ALF manually as well
# or get it from some other molecular geometry
# that contains the same atom sequencing as in the database
alf = get_alf_from_first_db_geometry(db_path)

# note that this will write files out in parallel
# use write_processed_data_for_atoms for serial

write_processed_data_for_atoms_parallel(
    db_path,
    alf,
    ncores=4,
    calc_multipoles=True, # rotates multipoles using C matrix
    calc_forces=False, # calculates ALF forces using Wilson B matrix
)