Category Archives: programming

A More In-Depth Look at the hbond Action Command in CPPTRAJ

As I have now accrued much more experience with CPPTRAJ and AMBER16 in general, I can now confidently explain more in-depth the hbond Action Command. While I still won’t be talking about every parameter of the hbond command (seeing as they’ve been explained for the most part in a previous blog post), I will be explaining a few of the other parameters that I did not mention in a previous blog post. As a review, the syntax is as follows:

hbond [<dsname>] [out <filename>] [<mask>] [angle <acut>] [dist <dcut>]

[donormask <dmask> [donorhmask <dhmask>]] [avgout <filename>]

[solvout <filename>] [series [uvseries <filename>]]

As I’ve already explained what some of these parameters do in a previous post, I will not explain what they do, rather, I will be explaining the parameters that you, the reader, have yet to see in this series of blog posts.

  • [angle <acut>] refers to the angle cutoff for hydrogen bond formation, meaning whatever the angle is set to, any hydrogen bond that forms at an angle smaller than what was specified, will not be counted as an actual hydrogen bond (CPPTRAJ ignores it). If this is not specified, then the default angle is 135 degrees. One can disable the cutoff by specifying the angle as -1.
  • [dist <dcut>] refers to the distance cutoff for hydrogen bond formation, meaning whatever the distance is set to, a hydrogen bond that forms with a bond length greater than the set distance will not be counted as an actual hydrogen bond (CPPTRAJ ignores it). If this is not specified, then the default distance is 3.0 Angstroms. One cannot disable this cutoff.
  • [avgout <filename>] refers to the name of a file that will be outputted containing information about the average bond length, the average bond angle, and the number of hydrogen bonds that formed throughout the entire production run for residues in which hydrogen bond formation happened.

After learning about these parameters, I utilized them in my next production run, which yielded much more interesting results then the first time I used it. When I first used the hbond command, my results yielded nothing but 1’s and 0’s. That was because of the limiting default cutoffs. After disabling the angle cutoff and setting the distance cutoff to 3.5Å, my new data yielded much higher numbers. However, just because my data showed much better numbers, that doesn’t necessarily mean that many of those bonds can happen in real life. The reason those defaults are what they are is because those numbers are most likely the average lengths and angles in which hydrogen bonds actually form. Below are pictures of excerpts of the raw data, the average data, and a graph of the raw data of one of my production runs that utilized these new parameters. All of this data was obtained by doing a production run on the molecule, Melittin, and then by doing a trajectory run on the tryptophan of said Melittin. The exact syntax I used for the trajectory can be seen in the previous blog post (here is the link: http://williamkennerly.com/blog/a-mistake-with-the-hbond-action-command/)

 

 

 

 

1 Comment

Filed under chemistry, computing, programming

Python Error-Handling: Follow-up to Temporary Files

This post is a follow-up to this previous blog post.

On the new Windows 10 computers at Skidmore, I ran into the issue when running python in which temporary files disrupt the execution of my scripts. An example of a suspected temporary file is “C:\Users\jgerard\Desktop\Important2\~$LYP-D_root1.gjf.out”. However, when I select “View Hidden Files” in my directory, no such file appears. As a last resort, I have my script search for temporary files:

if "$" in file_name:

and break if True. This adjustment allowed the successful run of the script. However, it is less than elegant, and the script could run into issues if any of my output files were to include a “$” in their file name.

Leave a Comment

Filed under programming

Python Error Handling: Temporary Files and Skipping Failed Jobs

In my scripts that analyze all Gaussian output files in a directory, I use glob to get all the files into a list. The relevant script is as follows:

import glob

#get a folder to search through
folder_location = get_folder_location()

#gets filepaths for everything in the specified folder ending with .LOG or .out
search_str_log = folder_location + "*.LOG"
search_str_out = folder_location + "*.out"

file_paths_log = glob.glob(search_str_log)
file_paths_out = glob.glob(search_str_out)

file_paths = file_paths_log + file_paths_out

I have been editing and testing output files within the directory, and my script has mistakenly read their temporary Word files as .out files. I needed to make those temporary files visible (see this link for instructions on how to do that). Then I deleted the temporary files, so they would not be read as extra files.

 

I wanted my script to skip files that had in error termination in Gaussian, so I needed to search the output files for the string “Error termination”. Since I was reading output files with .readlines(), as a list of lines (strings), I decided to recombine the list of strings into one string as follows:

#reads the file as a list of its lines of text
content_lines = current_file.readlines()

combined = '\t'.join(content_lines)

if "Error termination" in combined:
    print file_name + " skipped because of error termination
    in Gaussian."

else:
    #analyze file

Another option would have been to use .read() in addition to .readlines() for a different variable content, but that would set the cursor to the end of the file, and I would need to write current_file.seek(0) to reset the cursor to the beginning of the file. (See this post for more information about this issue.) Using the .join() command was a simpler way of getting the text file as a string, without having to set the cursor back.

2 Comments

Filed under programming

Python: Error Handling

One part of my updates to my Python script has been to manage what the script does in the case of incomplete output files. I have developed ways of circumventing a couple of issues, which include missing or different nstates values and incomplete files.

To get the number of states, I use a function to find the route line, and then I use a regular expression to extract the nstates value. The function that I use to find the route line can be seen in a previous post.

I use a simple regular expression to find the nstates value from the route line: nstatesRegex = re.compile(r"nstates=(\d*)")

An example of a route line, which the find_parameters function extracts, is as follows: # td=(nstates=6) cam-b3lyp/6-31+g(d) geom=connectivity

I use nstatesRegex to get the nstates value from that line, but sometimes the nstates value does not appear in the route line. My script does not work properly if the nstates is not specified in the route line of an output file. For now, my code just prints a message specifying any files that do not include an nstates value in the route line.

If an nstates value is different from the other nstates values, I need to compensate for missing values in the list that contains all the relevant data. I first loop through all the files to get the largest_nstates value. In the second loop of all the files in which I append the relevant data to the master_results list, I do the following after I append the energy values:

for x in xrange(nstates, largest_nstates):
      EE_lst.append(" ")
      abs_EE_lst.append(" ")
      osc_lst.append(" ")

The empty spaces account for empty cells, where the output file expects energy values. I extend my master_results list with those three lists. The empty spaces help each of these lists be the same length as the largest lists I find in the directory, so that the lists are a uniform length and can be iterated through more easily.

Sometimes a user may have incomplete output files in the directory. An easy way to verify that an output file is complete is to check if it has a job time. This method is imperfect, since I have recently encountered an error in Gaussian itself of an incomplete file that nonetheless included its job time up until it crashed. Checking the job time will at least ensure that all files have run to completion, whether or not they failed in Gaussian. On the latest version, my script fails when it tries to read an incomplete file, but it will print a message that tells the user which files did not have job times, so that the user knows which files to remove from the directory.

There are several updates that I can make on this script to help it deal with incomplete files. I could have it skip incomplete files (such as those without job times) instead of just printing a message about which files are incomplete, so that the script still reads all the complete files. Although I have made progress in dealing with files that have different numbers of states, I could set the default to 3 states (if that is indeed the default), when nstates is not specified in the route line. I could also have my script check for Gaussian error messages and inform the user of the error message and the pertinent file. These updates are meant to help the user more quickly identify problems and spend less time looking at the script and various output files to determine what went wrong in a failed run.

1 Comment

Filed under programming