PointsDirectory - A class used to encapsulate all calculations for many geometries (of a dataset)
The ichor.core.files.PointsDirectory class makes it easy to work with the thousands of files that are generated when running Gaussian, AIMAll, etc. calculations for many geometries.
The general structure of a PointsDirectory-like directory is like so:
.
|--- SYSTEM0001.pointdir
| |--- SYSTEM0001_atomicfiles
| | |--- h2.int
| | |--- h3.int
| | |--- o1.int
| |--- SYSTEM0001.gjf
| |--- SYSTEM0001.wfn
|--- SYSTEM0002.pointdir
| |--- SYSTEM0002_atomicfiles
| | |--- h2.int
| | |--- h3.int
| | |--- o1.int
| |--- SYSTEM0002.gjf
| |--- SYSTEM0002.wfn
...
...
...
Essentially, PointsDirectory is a class that parses a directory containing many sub-directories, each of which is an instance of PointDirectory. Each sub-directory (e.g. SYSTEM0001.pointdir, SYSTEM0002.pointdir) contains all relevant calculations for one molecular geometry. Each sub-directory can also be read in individually as an ichor.core.files.PointDirectory instance (note that there is no s in this case).
This class makes it straightforward to access calculations for many geometries at once.
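To make the layout concrete, the following is a minimal sketch that builds an empty copy of the directory tree shown above in a temporary directory and then lists which file extensions each .pointdir sub-directory holds. It uses only the Python standard library and does not call ichor; the helper names build_example_layout and summarise_layout are made up for this illustration.

```python
import tempfile
from pathlib import Path


def build_example_layout(root: Path, system: str = "SYSTEM", n_points: int = 2) -> None:
    """Create empty files that mimic the PointsDirectory-like layout."""
    for i in range(1, n_points + 1):
        name = f"{system}{i:04d}"
        point_dir = root / f"{name}.pointdir"
        atomic_dir = point_dir / f"{name}_atomicfiles"
        atomic_dir.mkdir(parents=True)
        for atom in ("o1", "h2", "h3"):
            (atomic_dir / f"{atom}.int").touch()
        (point_dir / f"{name}.gjf").touch()
        (point_dir / f"{name}.wfn").touch()


def summarise_layout(root: Path) -> dict:
    """Map each *.pointdir name to the sorted file suffixes found inside it."""
    summary = {}
    for point_dir in sorted(root.glob("*.pointdir")):
        suffixes = sorted({p.suffix for p in point_dir.rglob("*") if p.is_file()})
        summary[point_dir.name] = suffixes
    return summary


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_example_layout(root)
    layout = summarise_layout(root)

for name, suffixes in layout.items():
    print(name, suffixes)
```

Each key of layout corresponds to one sub-directory, i.e. to one molecular geometry.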
PointDirectory structure
The PointDirectory class encapsulates a directory containing all relevant calculations for one geometry. It subclasses ichor.core.files.directory.AnnotatedDirectory, which gives us the ability to define class variables of specific file types. The AnnotatedDirectory._parse method then parses all files in the directory. The extension of each file determines its file type, and thus the class that is used to parse it.
The PointDirectory.contents class variable can be overridden to quickly add support for new file and directory types. This ensures that any new file or directory types in ichor are ready to be used with PointDirectory. The contents variable is a Python dictionary whose keys become available as attributes after parsing, and whose values are the Python classes that parse the relevant file or directory. For example, this is the current contents variable:
contents = {
    "xyz": XYZ,
    "gjf": GJF,
    "gaussian_output": GaussianOutput,
    "orca_input": OrcaInput,
    "orca_output": OrcaOutput,
    "aim": Aim,
    "wfn": WFN,
    "ints": IntDirectory,
}
where XYZ is the class that is going to read a .xyz file in the directory.
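The general idea of mapping attribute names to parser classes can be sketched with the standard library alone. The example below is not ichor's implementation: the FakeXYZ and FakeWFN classes, their suffix attribute, and the parse_directory function are all hypothetical stand-ins, used only to show how a contents-style dictionary could drive the choice of parser from a file extension.

```python
import tempfile
from pathlib import Path


class FakeXYZ:
    """Hypothetical stand-in for a parser class (not ichor's XYZ)."""

    suffix = ".xyz"

    def __init__(self, path: Path):
        self.path = path


class FakeWFN:
    """Hypothetical stand-in for a parser class (not ichor's WFN)."""

    suffix = ".wfn"

    def __init__(self, path: Path):
        self.path = path


# Keys become attribute names, values are the classes used for parsing.
contents = {"xyz": FakeXYZ, "wfn": FakeWFN}


def parse_directory(directory: Path, contents: dict) -> dict:
    """Return {attribute_name: parser_instance} for files with a matching suffix."""
    parsed = {}
    for attr_name, parser_cls in contents.items():
        for path in sorted(directory.iterdir()):
            if path.is_file() and path.suffix == parser_cls.suffix:
                parsed[attr_name] = parser_cls(path)
                break  # keep the first match only in this sketch
    return parsed


with tempfile.TemporaryDirectory() as tmp:
    directory = Path(tmp)
    for filename in ("geom.xyz", "geom.wfn", "notes.txt"):
        (directory / filename).touch()
    parsed = parse_directory(directory, contents)
    parsed_types = {name: type(obj).__name__ for name, obj in parsed.items()}

print(parsed_types)
```

Files whose extension does not appear in the dictionary (notes.txt here) are simply ignored, which is why adding a new entry is enough to support a new file type in a design like this.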
Obtaining results from a PointsDirectory
Obtaining total system energy
The following code snippet can be used to quickly get the total system energy, for example from a Gaussian calculation:
[5]:
from ichor.core.files import PointsDirectory
# PointsDirectory("path_to_directory_with_wfn_and_int_files")
points_dir = PointsDirectory("../../../example_files/example_points_directory/WATER_MONOMER.pointsdir")
for point_directory in points_dir:
    print(point_directory.name, point_directory.wfn.total_energy)
WATER_MONOMER0000.pointdir -76.421710687455
WATER_MONOMER0001.pointdir -76.429947804
WATER_MONOMER0002.pointdir -76.430599107417
WATER_MONOMER0003.pointdir -76.42948849797
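Total energies such as these are often easier to compare as relative energies. The sketch below copies the four values printed above (in Hartree) into a plain dictionary and converts the differences from the lowest-energy geometry to kJ/mol. It does not use ichor, and the variable names are illustrative only.

```python
HARTREE_TO_KJ_PER_MOL = 2625.4996  # approximate conversion factor

# Values copied from the output above (Hartree).
energies = {
    "WATER_MONOMER0000.pointdir": -76.421710687455,
    "WATER_MONOMER0001.pointdir": -76.429947804,
    "WATER_MONOMER0002.pointdir": -76.430599107417,
    "WATER_MONOMER0003.pointdir": -76.42948849797,
}

# Geometry with the lowest total energy serves as the reference.
lowest_name = min(energies, key=energies.get)

relative_kj_mol = {
    name: (energy - energies[lowest_name]) * HARTREE_TO_KJ_PER_MOL
    for name, energy in energies.items()
}

for name, value in relative_kj_mol.items():
    print(f"{name}: {value:.2f} kJ/mol")
```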
Accessing IQA energy for a specific atom
[6]:
for point_directory in points_dir:
    print(point_directory.name, point_directory.ints["O1"].iqa)
    # note that this is for A A'
WATER_MONOMER0000.pointdir -75.446714709
WATER_MONOMER0001.pointdir -75.453164031
WATER_MONOMER0002.pointdir -75.453749708
WATER_MONOMER0003.pointdir -75.453284702
Accessing Multipole Moments
[7]:
for point_directory in points_dir:
    print(point_directory.name, point_directory.ints["O1"].global_spherical_multipoles)
    # note these are not rotated
WATER_MONOMER0000.pointdir {'q00': -1.051921199, 'q10': -0.020042356378, 'q11c': 0.0018273275449, 'q11s': -0.20706556929, 'q20': -0.037266216811, 'q21c': -0.79780613831, 'q21s': 0.013146921148, 'q22c': -0.19595266005, 'q22s': 0.078488472227, 'q30': 0.043015515207, 'q31c': -0.053621704828, 'q31s': 0.21644282193, 'q32c': -0.029607236961, 'q32s': -0.89197505111, 'q33c': -0.053969314597, 'q33s': 0.16211677693, 'q40': -1.4545843935, 'q41c': 0.91783517331, 'q41s': 0.17650015949, 'q42c': -0.73112185714, 'q42s': -0.3293114897, 'q43c': 2.8344280941, 'q43s': -0.16267842746, 'q44c': -1.3853362266, 'q44s': 0.089771195512, 'q50': -0.24411738335, 'q51c': 0.48960856702, 'q51s': -1.5472642317, 'q52c': -0.040094542612, 'q52s': 0.98097072569, 'q53c': 0.72718022845, 'q53s': -1.1988409017, 'q54c': -0.47766441277, 'q54s': 2.0753064137, 'q55c': -0.29405113415, 'q55s': -1.6430303594}
WATER_MONOMER0001.pointdir {'q00': -1.1248310833, 'q10': -0.15773618224, 'q11c': 0.081543820356, 'q11s': 0.12130191092, 'q20': -0.12606180318, 'q21c': 0.19139555969, 'q21s': -0.528487624, 'q22c': 0.54627966503, 'q22s': -0.078510757285, 'q30': -0.27889785637, 'q31c': 0.49213687845, 'q31s': -0.3074017347, 'q32c': 0.14764509944, 'q32s': -0.24727703379, 'q33c': -0.44944367966, 'q33s': -0.35435058603, 'q40': -0.48581728297, 'q41c': 1.5494543387, 'q41s': -0.23351555403, 'q42c': -0.86875233643, 'q42s': -1.853314432, 'q43c': -1.8063689406, 'q43s': -1.1998626773, 'q44c': 0.48732628951, 'q44s': 1.3828550511, 'q50': 0.4495412122, 'q51c': 0.31877779684, 'q51s': -0.024332972891, 'q52c': -1.476162947, 'q52s': -2.119742167, 'q53c': -1.5055751323, 'q53s': 0.18416072712, 'q54c': 0.45591672771, 'q54s': 2.1945711652, 'q55c': 0.03684630334, 'q55s': -1.3214124663}
WATER_MONOMER0002.pointdir {'q00': -1.1395301182, 'q10': 0.1170316343, 'q11c': 0.12707113118, 'q11s': 0.14319062627, 'q20': 0.035370307734, 'q21c': -0.42415679932, 'q21s': 0.49791338765, 'q22c': 0.37603023988, 'q22s': 0.20445186908, 'q30': -0.15737553444, 'q31c': 0.45818443084, 'q31s': -0.35883958802, 'q32c': 0.3480880684, 'q32s': 0.20971032944, 'q33c': -0.29423125466, 'q33s': -0.34625176326, 'q40': 0.41096573978, 'q41c': 0.09307943713, 'q41s': 1.5410113059, 'q42c': -2.179742611, 'q42s': -1.5590824961, 'q43c': 0.35883654798, 'q43s': -0.88871603233, 'q44c': 0.43934126428, 'q44s': 1.7461938373, 'q50': 1.0996389779, 'q51c': -1.262699744, 'q51s': -1.3852909494, 'q52c': 1.560792331, 'q52s': 0.24419844412, 'q53c': -0.37471724152, 'q53s': 2.5132383079, 'q54c': -0.95964885508, 'q54s': -0.26895872429, 'q55c': 0.55394494551, 'q55s': -1.8680975711}
WATER_MONOMER0003.pointdir {'q00': -1.1261722464, 'q10': 0.11200768361, 'q11c': 0.19713726623, 'q11s': -0.046368083336, 'q20': 0.25960435715, 'q21c': -0.159889789, 'q21s': -0.51517986412, 'q22c': 0.49258123822, 'q22s': 0.087179927499, 'q30': -0.4214583255, 'q31c': 0.024210442018, 'q31s': 0.44866570921, 'q32c': 0.31042203091, 'q32s': 0.23745156258, 'q33c': -0.37812439877, 'q33s': -0.13095025897, 'q40': 0.88553589917, 'q41c': 1.9512466748, 'q41s': -1.4646293031, 'q42c': -1.3890087525, 'q42s': -1.1724735472, 'q43c': -1.3146772524, 'q43s': -0.23121743344, 'q44c': 1.278074295, 'q44s': 0.73371421822, 'q50': 1.1218634504, 'q51c': -2.2221116073, 'q51s': 1.0854260589, 'q52c': -0.70009846634, 'q52s': 1.5789483953, 'q53c': 2.0156858127, 'q53s': 0.35708799616, 'q54c': 1.2709617017, 'q54s': -0.46441127444, 'q55c': -1.2724101051, 'q55s': -0.93923967921}
Accessing all data from all files
There is a very quick way to obtain all raw data from all calculations in a PointsDirectory. The raw_data property can be used to obtain the raw data. This returns a Python dictionary where the keys are the point names and the values are a nested Python dictionary containing the results from all the relevant calculations.
[8]:
all_raw_data = points_dir.raw_data
all_raw_data
[8]:
{'WATER_MONOMER0000': {'gaussian_output': {'global_forces': {'O1': array([ 0.02953315, 0.0827204 , -0.02495305]),
'H2': array([ 0.00578961, -0.0242831 , -0.00842433]),
'H3': array([-0.03532276, -0.05843731, 0.03337739])},
'charge': 0,
'multiplicity': 1,
'molecular_dipole': MolecularDipole(x=0.1189, y=2.3866, z=0.0787),
'molecular_quadrupole': MolecularQuadrupole(xx=-6.5273, yy=-7.7674, zz=-6.2577, xy=0.0665, xz=-1.6318, yz=-0.0495),
'traceless_molecular_quadrupole': TracelessMolecularQuadrupole(xx=0.3235, yy=-0.9166, zz=0.5931, xy=0.0665, xz=-1.6318, yz=-0.0495),
'molecular_octupole': MolecularOctupole(xxx=0.5348, yyy=8.5805, zzz=0.2229, xyy=0.1794, xxy=3.083, xxz=0.0727, xzz=0.1817, yzz=3.0764, yyz=0.0143, xyz=0.0059),
'molecular_hexadecapole': MolecularHexadecapole(xxxx=-8.283, yyyy=-15.2596, zzzz=-8.1733, xxxy=-0.2961, xxxz=-0.1375, yyyx=-0.296, yyyz=0.056, zzzx=-0.1975, zzzy=0.0334, xxyy=-3.888, xxzz=-2.4818, yyzz=-3.8474, xxyz=-0.0404, yyxz=-0.2009, zzxy=-0.0859)},
'wfn': {'energy': -76.421710687455, 'virial_ratio': 2.01177209},
'ints': {'H2': {'iqa': -0.48880091691,
'integration_error': 1.8741005824e-05,
'q00': 0.55107276527,
'q10': -0.10046776424,
'q11c': 0.082404094273,
'q11s': -0.12293368007,
'q20': 0.0036460292977,
'q21c': 0.00029619225829,
'q21s': -0.0074877731368,
'q22c': 0.0074481701736,
'q22s': 0.0056874730269,
'q30': -0.05956450163,
'q31c': -0.039733101597,
'q31s': 0.041495598451,
'q32c': -0.028165432797,
'q32s': -0.11949807302,
'q33c': 0.058162645861,
'q33s': 0.018335018802,
'q40': -0.12002978326,
'q41c': 0.040444633273,
'q41s': -0.069505346729,
'q42c': -0.019427525486,
'q42s': -0.15727748843,
'q43c': 0.1932877549,
'q43s': 0.062934026281,
'q44c': -0.057209327496,
'q44s': 0.059969789986,
'q50': -0.011914817313,
'q51c': 0.062065103525,
'q51s': -0.076563819972,
'q52c': 0.048593580786,
'q52s': 0.026776257085,
'q53c': 0.072780829944,
'q53s': 0.028736420893,
'q54c': -0.04279361242,
'q54s': 0.10697549721,
'q55c': -0.031070190556,
'q55s': -0.026095070495},
'O1': {'iqa': -75.446714709,
'integration_error': -2.8725236559e-05,
'q00': -1.051921199,
'q10': -0.020042356378,
'q11c': 0.0018273275449,
'q11s': -0.20706556929,
'q20': -0.037266216811,
'q21c': -0.79780613831,
'q21s': 0.013146921148,
'q22c': -0.19595266005,
'q22s': 0.078488472227,
'q30': 0.043015515207,
'q31c': -0.053621704828,
'q31s': 0.21644282193,
'q32c': -0.029607236961,
'q32s': -0.89197505111,
'q33c': -0.053969314597,
'q33s': 0.16211677693,
'q40': -1.4545843935,
'q41c': 0.91783517331,
'q41s': 0.17650015949,
'q42c': -0.73112185714,
'q42s': -0.3293114897,
'q43c': 2.8344280941,
'q43s': -0.16267842746,
'q44c': -1.3853362266,
'q44s': 0.089771195512,
'q50': -0.24411738335,
'q51c': 0.48960856702,
'q51s': -1.5472642317,
'q52c': -0.040094542612,
'q52s': 0.98097072569,
'q53c': 0.72718022845,
'q53s': -1.1988409017,
'q54c': -0.47766441277,
'q54s': 2.0753064137,
'q55c': -0.29405113415,
'q55s': -1.6430303594},
'H3': {'iqa': -0.48619221042,
'integration_error': 1.7274446651e-05,
'q00': 0.50085300856,
'q10': 0.085197944712,
'q11c': -0.087841616442,
'q11s': -0.12029023455,
'q20': 0.00023993899792,
'q21c': -0.023661428009,
'q21s': -0.021391961736,
'q22c': -1.9131859436e-05,
'q22s': 0.02151432195,
'q30': 0.087599091989,
'q31c': 0.031324229347,
'q31s': 0.026427482381,
'q32c': 0.023804975755,
'q32s': -0.15674955727,
'q33c': -0.088059760394,
'q33s': 0.037288453803,
'q40': -0.07444947707,
'q41c': 0.05887992764,
'q41s': 0.081114651822,
'q42c': 0.0088380735081,
'q42s': 0.083371237428,
'q43c': 0.15535798009,
'q43s': -0.069909915593,
'q44c': -0.064026374739,
'q44s': -0.057177316644,
'q50': 0.0082484388953,
'q51c': 0.099441485367,
'q51s': 0.11360202978,
'q52c': -0.01832368283,
'q52s': -0.0010932889664,
'q53c': 0.15341606275,
'q53s': -0.10054868004,
'q54c': -0.17440062253,
'q54s': -0.0001046247894,
'q55c': 0.045251112263,
'q55s': 0.071037518757}}},
'WATER_MONOMER0001': {'gaussian_output': {'global_forces': {'O1': array([ 0.03848438, -0.02380376, 0.03412189]),
'H2': array([-0.03146119, 0.00036476, -0.00231774]),
'H3': array([-0.00702319, 0.02343899, -0.03180415])},
'charge': 0,
'multiplicity': 1,
'molecular_dipole': MolecularDipole(x=-0.8431, y=-1.323, z=1.7231),
'molecular_quadrupole': MolecularQuadrupole(xx=-5.1893, yy=-7.688, zz=-7.426, xy=-0.5634, xz=0.9119, yz=-0.3113),
'traceless_molecular_quadrupole': TracelessMolecularQuadrupole(xx=1.5784, yy=-0.9202, zz=-0.6582, xy=-0.5634, xz=0.9119, yz=-0.3113),
'molecular_octupole': MolecularOctupole(xxx=-2.948, yyy=-4.821, zzz=6.188, xyy=-0.9565, xxy=-1.5601, xxz=2.0371, xzz=-0.9166, yzz=-1.4649, yyz=2.004, xyz=-0.0833),
'molecular_hexadecapole': MolecularHexadecapole(xxxx=-7.6094, yyyy=-10.1144, zzzz=-11.4551, xxxy=-0.5969, xxxz=0.8455, yyyx=-0.6288, yyyz=1.3354, zzzx=0.9015, zzzy=1.1908, xxyy=-3.0317, xxzz=-3.1805, yyzz=-3.4563, xxyz=0.2984, yyxz=0.3128, zzxy=-0.2858)},
'wfn': {'energy': -76.429947804, 'virial_ratio': 2.00949207},
'ints': {'H2': {'iqa': -0.48887410664,
'integration_error': 9.5429453148e-06,
'q00': 0.57930929588,
'q10': -0.046102497364,
'q11c': 0.16558301986,
'q11s': 0.041594915198,
'q20': 0.010377317695,
'q21c': 0.018136684148,
'q21s': 0.006500049871,
'q22c': -0.02013508558,
'q22s': -0.015523513189,
'q30': -0.034744645637,
'q31c': 0.057390530241,
'q31s': 0.0065667570896,
'q32c': 0.039512071665,
'q32s': 0.020328858681,
'q33c': -0.076627866971,
'q33s': -0.050739679414,
'q40': 0.050244296949,
'q41c': 0.13100319212,
'q41s': 0.030144145474,
'q42c': -0.10131293908,
'q42s': -0.02287002106,
'q43c': -0.089114870694,
'q43s': -0.097196124547,
'q44c': 0.13556820311,
'q44s': 0.15270252132,
'q50': 0.099377074131,
'q51c': -0.0078247554538,
'q51s': 0.014794214546,
'q52c': -0.14821510785,
'q52s': -0.089844814373,
'q53c': 0.085049483662,
'q53s': 0.0066340841318,
'q54c': 0.055004363647,
'q54s': 0.19474995219,
'q55c': -0.090696179084,
'q55s': -0.19386856821},
'O1': {'iqa': -75.453164031,
'integration_error': -3.2412635167e-05,
'q00': -1.1248310833,
'q10': -0.15773618224,
'q11c': 0.081543820356,
'q11s': 0.12130191092,
'q20': -0.12606180318,
'q21c': 0.19139555969,
'q21s': -0.528487624,
'q22c': 0.54627966503,
'q22s': -0.078510757285,
'q30': -0.27889785637,
'q31c': 0.49213687845,
'q31s': -0.3074017347,
'q32c': 0.14764509944,
'q32s': -0.24727703379,
'q33c': -0.44944367966,
'q33s': -0.35435058603,
'q40': -0.48581728297,
'q41c': 1.5494543387,
'q41s': -0.23351555403,
'q42c': -0.86875233643,
'q42s': -1.853314432,
'q43c': -1.8063689406,
'q43s': -1.1998626773,
'q44c': 0.48732628951,
'q44s': 1.3828550511,
'q50': 0.4495412122,
'q51c': 0.31877779684,
'q51s': -0.024332972891,
'q52c': -1.476162947,
'q52s': -2.119742167,
'q53c': -1.5055751323,
'q53s': 0.18416072712,
'q54c': 0.45591672771,
'q54s': 2.1945711652,
'q55c': 0.03684630334,
'q55s': -1.3214124663},
'H3': {'iqa': -0.48791314597,
'integration_error': 1.767282812e-05,
'q00': 0.54551717948,
'q10': -0.12912268454,
'q11c': -0.073220513388,
'q11s': 0.093216336946,
'q20': -0.0041810639947,
'q21c': 0.0054837257387,
'q21s': 0.0035806223466,
'q22c': 0.0055876412653,
'q22s': -0.0037182632383,
'q30': -0.031536097089,
'q31c': 0.074389974368,
'q31s': -0.088388188785,
'q32c': -0.0070097113295,
'q32s': -0.1115211214,
'q33c': -0.033297232489,
'q33s': -0.032924675413,
'q40': -0.13030060961,
'q41c': 0.038148510124,
'q41s': -0.073690881766,
'q42c': -0.049562233205,
'q42s': -0.19895480526,
'q43c': -0.10771772308,
'q43s': -0.084626662835,
'q44c': -0.043351191843,
'q44s': 0.0023716485062,
'q50': -0.053242517161,
'q51c': 0.0053036087089,
'q51s': -0.008766321614,
'q52c': -0.098989030278,
'q52s': -0.043646219093,
'q53c': -0.058542351859,
'q53s': 0.030550952424,
'q54c': 0.01341571848,
'q54s': 0.014472605634,
'q55c': 0.020911770864,
'q55s': -0.019132623387}}},
'WATER_MONOMER0002': {'gaussian_output': {'global_forces': {'O1': array([-0.01182343, -0.00326165, 0.00243345]),
'H2': array([-0.00938744, 0.00801946, 0.01598048]),
'H3': array([ 0.02121088, -0.00475781, -0.01841394])},
'charge': 0,
'multiplicity': 1,
'molecular_dipole': MolecularDipole(x=-1.3324, y=-1.5029, z=-1.2291),
'molecular_quadrupole': MolecularQuadrupole(xx=-6.0559, yy=-7.6792, zz=-6.5966, xy=-0.2517, xz=-1.3516, yz=0.355),
'traceless_molecular_quadrupole': TracelessMolecularQuadrupole(xx=0.7214, yy=-0.9019, zz=0.1806, xy=-0.2517, xz=-1.3516, yz=0.355),
'molecular_octupole': MolecularOctupole(xxx=-4.7196, yyy=-5.3872, zzz=-4.3733, xyy=-1.5173, xxy=-1.7003, xxz=-1.4362, xzz=-1.5791, yzz=-1.6989, yyz=-1.3608, xyz=0.0836),
'molecular_hexadecapole': MolecularHexadecapole(xxxx=-9.1389, yyyy=-10.4563, zzzz=-9.1524, xxxy=-0.9871, xxxz=-1.2163, yyyx=-1.1784, yyyz=-0.9508, zzzx=-1.1644, zzzy=-0.7955, xxyy=-3.2977, xxzz=-3.0954, yyzz=-3.1475, xxyz=-0.3143, yyxz=-0.3329, zzxy=-0.4511)},
'wfn': {'energy': -76.430599107417, 'virial_ratio': 2.00850663},
'ints': {'H2': {'iqa': -0.4884303933,
'integration_error': 8.4456970171e-06,
'q00': 0.56704064421,
'q10': -0.016788810577,
'q11c': 0.16678532637,
'q11s': 0.05926427077,
'q20': 0.012378760753,
'q21c': -0.0011590886116,
'q21s': 0.0002814248579,
'q22c': -0.013873040688,
'q22s': -0.015668008236,
'q30': -0.026844098476,
'q31c': 0.070432957219,
'q31s': 0.018973897274,
'q32c': 0.035650388172,
'q32s': 0.015621862978,
'q33c': -0.062642395464,
'q33s': -0.086278367162,
'q40': 0.087362375529,
'q41c': 0.076220558505,
'q41s': 0.01615660549,
'q42c': -0.11754244046,
'q42s': -0.086770182195,
'q43c': -0.070653482872,
'q43s': -0.0463683192,
'q44c': 0.053438266326,
'q44s': 0.21580446061,
'q50': 0.026789945801,
'q51c': -0.086115785343,
'q51s': -0.037958011591,
'q52c': -0.039664935648,
'q52s': -0.013453543961,
'q53c': 0.053511966654,
'q53s': 0.13667812966,
'q54c': 0.074049196913,
'q54s': 0.03171448802,
'q55c': 0.053650772738,
'q55s': -0.19420111212},
'O1': {'iqa': -75.453749708,
'integration_error': -3.8823037245e-05,
'q00': -1.1395301182,
'q10': 0.1170316343,
'q11c': 0.12707113118,
'q11s': 0.14319062627,
'q20': 0.035370307734,
'q21c': -0.42415679932,
'q21s': 0.49791338765,
'q22c': 0.37603023988,
'q22s': 0.20445186908,
'q30': -0.15737553444,
'q31c': 0.45818443084,
'q31s': -0.35883958802,
'q32c': 0.3480880684,
'q32s': 0.20971032944,
'q33c': -0.29423125466,
'q33s': -0.34625176326,
'q40': 0.41096573978,
'q41c': 0.09307943713,
'q41s': 1.5410113059,
'q42c': -2.179742611,
'q42s': -1.5590824961,
'q43c': 0.35883654798,
'q43s': -0.88871603233,
'q44c': 0.43934126428,
'q44s': 1.7461938373,
'q50': 1.0996389779,
'q51c': -1.262699744,
'q51s': -1.3852909494,
'q52c': 1.560792331,
'q52s': 0.24419844412,
'q53c': -0.37471724152,
'q53s': 2.5132383079,
'q54c': -0.95964885508,
'q54s': -0.26895872429,
'q55c': 0.55394494551,
'q55s': -1.8680975711},
'H3': {'iqa': -0.48842559681,
'integration_error': 1.7432947248e-05,
'q00': 0.5724835067,
'q10': 0.1457603332,
'q11c': -0.028104713416,
'q11s': 0.09795249107,
'q20': -0.014236447809,
'q21c': 0.0020513538853,
'q21s': -0.024002553682,
'q22c': 0.0090167292986,
'q22s': 0.0018592788488,
'q30': -0.016278101055,
'q31c': 0.043378639139,
'q31s': -0.10104168631,
'q32c': 0.047377463605,
'q32s': 0.042668161752,
'q33c': -0.014828920829,
'q33s': 0.002301899411,
'q40': -0.070410778984,
'q41c': -0.074472064823,
'q41s': 0.18279123322,
'q42c': -0.1612131176,
'q42s': -0.12499918589,
'q43c': 0.06142026173,
'q43s': -0.037102021898,
'q44c': -0.0036973964853,
'q44s': 0.015807001231,
'q50': 0.16521543103,
'q51c': 0.025818622551,
'q51s': -0.070934786473,
'q52c': 0.20406825774,
'q52s': 0.12904411072,
'q53c': -0.064053836207,
'q53s': 0.092995258457,
'q54c': -0.019812495159,
'q54s': -0.014274832112,
'q55c': 0.013994317071,
'q55s': -0.0022039694548}}},
'WATER_MONOMER0003': {'gaussian_output': {'global_forces': {'O1': array([-0.01819327, 0.00455452, -0.01096517]),
'H2': array([-0.00188946, -0.00938049, 0.02132555]),
'H3': array([ 0.02008273, 0.00482597, -0.01036038])},
'charge': 0,
'multiplicity': 1,
'molecular_dipole': MolecularDipole(x=-2.0363, y=0.479, z=-1.157),
'molecular_quadrupole': MolecularQuadrupole(xx=-6.8842, yy=-7.4952, zz=-6.0587, xy=0.4805, xz=-1.0644, yz=-0.8024),
'traceless_molecular_quadrupole': TracelessMolecularQuadrupole(xx=-0.0715, yy=-0.6825, zz=0.754, xy=0.4805, xz=-1.0644, yz=-0.8024),
'molecular_octupole': MolecularOctupole(xxx=-7.1177, yyy=1.7805, zzz=-4.1497, xyy=-2.516, xxy=0.5301, xxz=-1.2734, xzz=-2.3914, yzz=0.5188, yyz=-1.4113, xyz=-0.0642),
'molecular_hexadecapole': MolecularHexadecapole(xxxx=-12.0314, yyyy=-8.1287, zzzz=-8.7818, xxxy=0.6728, xxxz=-1.5822, yyyx=0.6119, yyyz=0.2975, zzzx=-1.4365, zzzy=-0.0026, xxyy=-3.5051, xxzz=-3.5441, yyzz=-2.8023, xxyz=0.0089, yyxz=-0.4888, zzxy=0.1872)},
'wfn': {'energy': -76.42948849797, 'virial_ratio': 2.00883314},
'ints': {'H2': {'iqa': -0.4881027439,
'integration_error': 9.4522550832e-06,
'q00': 0.56326128376,
'q10': -0.038053718191,
'q11c': 0.17406452668,
'q11s': 0.019124181097,
'q20': 0.012081611559,
'q21c': 0.0051369484026,
'q21s': -0.00074088786746,
'q22c': -0.019407551417,
'q22s': -0.0027908684718,
'q30': -0.046075383748,
'q31c': 0.050014857932,
'q31s': 0.0094560487176,
'q32c': 0.058475130664,
'q32s': 0.024077433258,
'q33c': -0.097026056963,
'q33s': -0.03579136459,
'q40': 0.04154079856,
'q41c': 0.13187846251,
'q41s': 0.021412542719,
'q42c': -0.084391243749,
'q42s': -0.0200831628,
'q43c': -0.11848751779,
'q43s': -0.074415491028,
'q44c': 0.19465272929,
'q44s': 0.088507549312,
'q50': 0.058580091185,
'q51c': -0.042565949802,
'q51s': -0.0043775245542,
'q52c': -0.089474143135,
'q52s': -0.010408877679,
'q53c': 0.079998556562,
'q53s': 0.0045329002427,
'q54c': 0.07873109218,
'q54s': 0.091471346749,
'q55c': -0.17860562422,
'q55s': -0.077365814166},
'O1': {'iqa': -75.453284702,
'integration_error': -2.3962230701e-05,
'q00': -1.1261722464,
'q10': 0.11200768361,
'q11c': 0.19713726623,
'q11s': -0.046368083336,
'q20': 0.25960435715,
'q21c': -0.159889789,
'q21s': -0.51517986412,
'q22c': 0.49258123822,
'q22s': 0.087179927499,
'q30': -0.4214583255,
'q31c': 0.024210442018,
'q31s': 0.44866570921,
'q32c': 0.31042203091,
'q32s': 0.23745156258,
'q33c': -0.37812439877,
'q33s': -0.13095025897,
'q40': 0.88553589917,
'q41c': 1.9512466748,
'q41s': -1.4646293031,
'q42c': -1.3890087525,
'q42s': -1.1724735472,
'q43c': -1.3146772524,
'q43s': -0.23121743344,
'q44c': 1.278074295,
'q44s': 0.73371421822,
'q50': 1.1218634504,
'q51c': -2.2221116073,
'q51s': 1.0854260589,
'q52c': -0.70009846634,
'q52s': 1.5789483953,
'q53c': 2.0156858127,
'q53s': 0.35708799616,
'q54c': 1.2709617017,
'q54s': -0.46441127444,
'q55c': -1.2724101051,
'q55s': -0.93923967921},
'H3': {'iqa': -0.4880976683,
'integration_error': 1.85987186e-05,
'q00': 0.56290675093,
'q10': 0.16001384105,
'q11c': 0.04069948876,
'q11s': -0.069618583234,
'q20': -0.014716885498,
'q21c': -0.011767701802,
'q21s': 0.012518520553,
'q22c': 0.0019972173909,
'q22s': 0.0051377000688,
'q30': -0.065290261963,
'q31c': -0.043467453029,
'q31s': 0.1070757614,
'q32c': 0.024745592711,
'q32s': 0.036347573943,
'q33c': 0.000609416581,
'q33s': 0.00049392468072,
'q40': 0.051960788182,
'q41c': 0.096729266414,
'q41s': -0.23825631964,
'q42c': -0.083947282683,
'q42s': -0.12455375179,
'q43c': -0.029034203906,
'q43s': -0.010805163255,
'q44c': 0.0047750927805,
'q44s': -0.003890919399,
'q50': 0.052841082773,
'q51c': -0.068227481331,
'q51s': 0.17406623703,
'q52c': 0.082049793886,
'q52s': 0.13832399386,
'q53c': 0.060236017556,
'q53s': 0.055688530781,
'q54c': 0.011137162575,
'q54s': 0.0079359384347,
'q55c': -0.0097729922349,
'q55s': 0.010888114243}}}}
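A quick sanity check on data like this is to sum the atomic IQA energies of a point and compare the result with the total energy in the wfn entry, and to sum the atomic charges (q00) and compare with the molecular charge. The sketch below copies a few numbers for WATER_MONOMER0000 from the output above into a plain dictionary, so it runs without ichor; the name raw_data_excerpt is illustrative, and only the keys needed for the check are included.

```python
# Excerpt of the nested raw data shown above (values copied verbatim).
raw_data_excerpt = {
    "WATER_MONOMER0000": {
        "gaussian_output": {"charge": 0},
        "wfn": {"energy": -76.421710687455},
        "ints": {
            "O1": {"iqa": -75.446714709, "q00": -1.051921199},
            "H2": {"iqa": -0.48880091691, "q00": 0.55107276527},
            "H3": {"iqa": -0.48619221042, "q00": 0.50085300856},
        },
    }
}

checks = {}
for point_name, point_data in raw_data_excerpt.items():
    atoms = point_data["ints"]
    iqa_sum = sum(atom["iqa"] for atom in atoms.values())
    charge_sum = sum(atom["q00"] for atom in atoms.values())
    checks[point_name] = {
        # Difference between summed atomic energies and the total energy.
        "energy_gap": iqa_sum - point_data["wfn"]["energy"],
        # Difference between summed atomic charges and the molecular charge.
        "charge_gap": charge_sum - point_data["gaussian_output"]["charge"],
    }

for point_name, gaps in checks.items():
    print(point_name, gaps)
```

Both gaps should be small (on the order of the integration errors) if the atomic integrations are reliable.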
Converting to SQLite3 database
Reading thousands of files every time is very time consuming (especially on hard drives), so it is much more efficient to read the data once and store it in a database. ichor has SQLite3 support implemented, meaning a PointsDirectory can be readily converted to an SQLite3 database. NOTE: ONLY RAW DATA FROM CALCULATIONS IS STORED IN THE DATABASE. NO POSTPROCESSING IS DONE. ANY POSTPROCESSING MUST BE DONE AT A LATER STEP (e.g. rotating multipole moments).
Code snippet to produce the database:
from ichor.core.files import PointsDirectory
pd = PointsDirectory("points_directory_path")
pd.write_to_sqlite3_database()
Note 1: It takes a while to read all files, so this should be run on a compute node.
Note 2: If the dataset is large and split into many PointsDirectory-like directories, then you can do
from ichor.core.files import PointsDirectory
from pathlib import Path
parent_dir = Path("parent_dir")
for d in parent_dir.iterdir():
    # read each PointsDirectory-like directory in turn
    pd = PointsDirectory(d)
    # write every directory into the same database file
    pd.write_to_sqlite3_database("large_database.db")
where all the information will be stored into one database.
SQLite Database Schema Diagram
The following shows what the schema diagram currently looks like for the database. The image was made with DbVisualizer. Note that not all fields will necessarily be populated in the database; that depends on the raw data that is present in the PointsDirectory. For example, if only Gaussian calculations are run, then the AIMAll-related data will be missing from the database.
Below is a diagram of the SQLite3 Database, made with DbVisualizer
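Because some tables may be empty depending on which calculations were run, it can be useful to check what a database actually contains. The sketch below uses Python's built-in sqlite3 module to list every table and its row count. To stay self-contained it runs against an in-memory database with two made-up tables (points and aimall_results); these names are placeholders and are not ichor's schema. For a real database, pass its path to sqlite3.connect instead.

```python
import sqlite3


def table_row_counts(connection: sqlite3.Connection) -> dict:
    """Return {table_name: row_count} for every user table in the database."""
    cursor = connection.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    )
    counts = {}
    for (name,) in cursor.fetchall():
        # Names come from sqlite_master itself, so quoting them is sufficient.
        row = connection.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()
        counts[name] = row[0]
    return counts


# Stand-in database with hypothetical tables (not ichor's schema).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, name TEXT)")
connection.execute("CREATE TABLE aimall_results (id INTEGER PRIMARY KEY, iqa REAL)")
connection.execute("INSERT INTO points (name) VALUES ('WATER_MONOMER0000')")
connection.commit()

counts = table_row_counts(connection)
connection.close()

for table, n_rows in counts.items():
    print(f"{table}: {n_rows} rows")
```

A table with zero rows indicates that the corresponding raw data was not present when the database was written.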
Converting to JSON database
Very similarly, a PointsDirectory instance can be converted to a JSON database:
from ichor.core.files import PointsDirectory
pd = PointsDirectory("points_directory_path")
pd.write_to_json_database()
Generating CSV files with Features from SQLite3 Database
CSV files can be readily made from a PointsDirectory instance or a database. CSV files containing (ALF) features and relevant outputs can be generated from an SQLite3 database like so:
from ichor.core.database.sql.query_database import (
    get_alf_from_first_db_geometry,
    write_processed_data_for_atoms_parallel,
    write_processed_data_for_atoms,
)
db_path = "DATABASE_PATH"
# note that you can also define an ALF manually
# or get it from some other molecular geometry
# that contains the same atom sequencing as in the database
alf = get_alf_from_first_db_geometry(db_path)
# note that this will write files out in parallel
# use write_processed_data_for_atoms for serial
write_processed_data_for_atoms_parallel(
    db_path,
    alf,
    ncores=4,
    calc_multipoles=True,  # rotates multipoles using C matrix
    calc_forces=False,  # calculates ALF forces using Wilson B matrix
)