I am in the process of writing a script in Python to read Gaussian output files for single-point energy calculations and produce a text file of the energy values (ground state and six excited states) that can be opened in Excel. This will save me some time, since I am currently testing different combinations of functionals and basis sets with TDDFT in Gaussian, and I need to compare the results from many calculations to determine which combination is best. I am referencing a script written by Kristine Vorwerk. However, while Kristine’s script parses files with cclib and extracts the descriptions for energy values in addition to lambda max values, mine uses regular expressions and extracts only the energy values themselves, in an order I have predetermined. Like Kristine’s script, my script also extracts the method and basis set names and the job CPU time.
Another element of Kristine’s script that I have incorporated into my own is a brief interface on the command line that asks for the file path of the folder that contains the .out or .log files I want to read. The script will read every file of that type in that folder and create a text file in that same folder with all the results of interest. Unfortunately, testing the script on the command line can be a time-consuming and confusing process, since the command line itself does not show any error messages. To debug as I write, I am running my script through PyCharm, so I can see exactly where my code fails.
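As a rough illustration of the folder-scanning step described above, here is a minimal sketch. The function names (find_output_files, write_results), the output file name energies.txt, and the choice of tab-separated columns are assumptions made for this example, not details taken from Kristine’s script or my own.

```python
from pathlib import Path

# Extensions of the Gaussian output files to read.
OUTPUT_SUFFIXES = {".out", ".log"}


def find_output_files(folder):
    """Return the .out and .log files directly inside folder, sorted by name."""
    folder = Path(folder)
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in OUTPUT_SUFFIXES
    )


def write_results(folder, rows, filename="energies.txt"):
    """Write rows as tab-separated text in the same folder (hypothetical file name)."""
    out_path = Path(folder) / filename
    with open(out_path, "w") as handle:
        for row in rows:
            # Tab-separated columns are assumed here so Excel splits them on import.
            handle.write("\t".join(str(value) for value in row) + "\n")
    return out_path
```

On the command line, the folder could be requested with something like find_output_files(input("Folder path: ").strip()), and each file returned would then be parsed in turn.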
One of the biggest challenges of adapting this script from Kristine’s is that, as a beginner programmer, I am unsure which functions require cclib and which do not. As I move toward a finished script, I will rewrite many of her definitions and functions in a syntax that I am sure will work without cclib. Most of the adaptations rely on my knowledge of regular expressions. In particular, I am interested in ways that I could simplify parts of my code using loops. Since cclib uses simple functions to parse files, I expect regular expressions to take more code to do the same job. However, I am finding that parts of my code look redundant and could probably be shortened using additional loops. For example, since I am looking for the same types of values for six excited states, my code has blocks such as:
    mo1 = energyState1Regex.search(line)
    if mo1 is not None:
        splitted = line.split()
        EEs1.extend([" ", splitted[4]])
        absEEs1.extend([" ", float(splitted[4]) + groundStateEV])
    mo2 = energyState2Regex.search(line)
    if mo2 is not None:
        splitted = line.split()
        EEs2.extend([" ", splitted[4]])
        absEEs2.extend([" ", float(splitted[4]) + groundStateEV])
    mo3 = energyState3Regex.search(line)
    if mo3 is not None:
        splitted = line.split()
        EEs3.extend([" ", splitted[4]])
        absEEs3.extend([" ", float(splitted[4]) + groundStateEV])
….
etc. within a loop, scanning each line for the regular expressions which indicate the different excited states. I could probably shorten this code with another loop, but I am still thinking about how to do that. I cannot use a loop to change one character in a variable name (e.g. mo1, mo2, etc.), so I may need to change the way my regular expressions search for the data I want. Kristine found a simple solution for that problem a while ago, but I believe that was for a list that could be indexed. Nonetheless, I may look at her latest script that uses regular expressions to see if she has found any ways to simplify the code I am writing, and whether that script may be a better reference for me to use.
Quick thoughts:
Title: python script is more specifically for excited-state energy calculations at a single (point) geometry
Testing combinations of basis sets AND functionals
Put a link in to cclib’s webpage (and when you need to know if a function is cclib or not, you should be able to find a reference to it (or not) in their documentation)
Put in a link to PyCharm, too
We can look at ways to refactor your code. Almost certainly you will want to make some lists of variables, rather than a bunch of individual variables. Lists are easy to loop over with python. For example, all of the strings you want to match that are basically all the same except for 1, 2, 3, 4, 5, 6 can all be put in one list. Same for the results–you don’t need a separate variable or list for EEs1, EEs2, etc. You can make a list or a list of lists if you need to.
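A minimal sketch of that idea, to show the shape of it rather than a finished solution. The pattern "Excited State N:" and the sample line in the comment are assumptions about what the Gaussian output looks like, and the names state_regexes and collect_excitations are made up for this example. The fifth whitespace-separated field (index 4) is taken as the energy, the same as in the blocks above.

```python
import re

N_STATES = 6

# Assumed shape of an excited-state line (illustrative, not real output):
#  Excited State   1:      Singlet-A      3.1000 eV  399.95 nm  f=0.0100
# One compiled pattern per state, stored in a list instead of
# energyState1Regex, energyState2Regex, ...
state_regexes = [
    re.compile(rf"Excited State\s+{n}:") for n in range(1, N_STATES + 1)
]


def collect_excitations(lines, ground_state_ev):
    """Return (ees, abs_ees): one result list per excited state."""
    # Lists of lists replace EEs1..EEs6 and absEEs1..absEEs6.
    ees = [[] for _ in range(N_STATES)]
    abs_ees = [[] for _ in range(N_STATES)]
    for line in lines:
        for i, regex in enumerate(state_regexes):
            if regex.search(line) is not None:
                energy = line.split()[4]
                # Same " " spacer entries as the original extend() calls.
                ees[i].extend([" ", energy])
                abs_ees[i].extend([" ", float(energy) + ground_state_ev])
    return ees, abs_ees
```

With this layout, ees[0] plays the role of EEs1, ees[1] of EEs2, and so on, so going from six states to some other number only means changing N_STATES.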