One part of my updates to my Python script has been to manage what the script does in the case of incomplete output files. I have developed ways of circumventing a couple of issues, which include missing or different nstates values and incomplete files.
To get the number of states, I use a function to find the route line, and then I use a regular expression to extract the nstates value. The function that I use to find the route line can be seen in a previous post.
I use a simple regular expression to find the nstates value from the route line:
nstatesRegex = re.compile(r"nstates=(\d*)")
An example of a route line, which the
find_parameters function extracts, is as follows: # td=(nstates=6) cam-b3lyp/6-31+g(d) geom=connectivity
nstatesRegex to get the nstates value from that line, but sometimes the nstates value does not appear in the route line. My script does not work properly if the nstates is not specified in the route line of an output file. For now, my code just prints a message specifying any files that do not include an nstates value in the route line.
If an nstates value is different from the other nstates values, I need to compensate for missing values in the list that contains all the relevant data. I first loop through all the files to get the
largest_nstates value. In the second loop of all the files in which I append the relevant data to the
master_results list, I do the following after I append the energy values:
for x in xrange(nstates, largest_nstates):
The empty spaces account for empty cells, where the output file expects energy values. I extend my
master_results list with those three lists. The empty spaces help each of these lists be the same length as the largest lists I find in the directory, so that the lists are a uniform length and can be iterated through more easily.
Sometimes a user may have incomplete output files in the directory. An easy way to verify that an output file is complete is to check if it has a job time. This method is imperfect, since I have recently encountered an error in Gaussian itself of an incomplete file that nonetheless included its job time up until it crashed. Checking the job time will at least ensure that all files have run to completion, whether or not they failed in Gaussian. On the latest version, my script fails when it tries to read an incomplete file, but it will print a message that tells the user which files did not have job times, so that the user knows which files to remove from the directory.
There are several updates that I can make on this script to help it deal with incomplete files. I could have it skip incomplete files (such as those without job times) instead of just printing a message about which files are incomplete, so that the script still reads all the complete files. Although I have made progress in dealing with files that have different numbers of states, I could set the default to 3 states (if that is indeed the default), when nstates is not specified in the route line. I could also have my script check for Gaussian error messages and inform the user of the error message and the pertinent file. These updates are meant to help the user more quickly identify problems and spend less time looking at the script and various output files to determine what went wrong in a failed run.