Category Archives: chemistry

A More In-Depth Look at the hbond Action Command in CPPTRAJ

As I have now accrued much more experience with CPPTRAJ and AMBER16 in general, I can now confidently explain more in-depth the hbond Action Command. While I still won’t be talking about every parameter of the hbond command (seeing as they’ve been explained for the most part in a previous blog post), I will be explaining a few of the other parameters that I did not mention in a previous blog post. As a review, the syntax is as follows:

hbond [<dsname>] [out <filename>] [<mask>] [angle <acut>] [dist <dcut>]

[donormask <dmask> [donorhmask <dhmask>]] [avgout <filename>]

[solvout <filename>] [series [uvseries <filename>]]

As I’ve already explained what some of these parameters do in a previous post, I will not explain what they do, rather, I will be explaining the parameters that you, the reader, have yet to see in this series of blog posts.

  • [angle <acut>] refers to the angle cutoff for hydrogen bond formation, meaning whatever the angle is set to, any hydrogen bond that forms at an angle smaller than what was specified, will not be counted as an actual hydrogen bond (CPPTRAJ ignores it). If this is not specified, then the default angle is 135 degrees. One can disable the cutoff by specifying the angle as -1.
  • [dist <dcut>] refers to the distance cutoff for hydrogen bond formation, meaning whatever the distance is set to, a hydrogen bond that forms with a bond length greater than the set distance will not be counted as an actual hydrogen bond (CPPTRAJ ignores it). If this is not specified, then the default distance is 3.0 Angstroms. One cannot disable this cutoff.
  • [avgout <filename>] refers to the name of a file that will be outputted containing information about the average bond length, the average bond angle, and the number of hydrogen bonds that formed throughout the entire production run for residues in which hydrogen bond formation happened.

After learning about these parameters, I utilized them in my next production run, which yielded much more interesting results then the first time I used it. When I first used the hbond command, my results yielded nothing but 1’s and 0’s. That was because of the limiting default cutoffs. After disabling the angle cutoff and setting the distance cutoff to 3.5Å, my new data yielded much higher numbers. However, just because my data showed much better numbers, that doesn’t necessarily mean that many of those bonds can happen in real life. The reason those defaults are what they are is because those numbers are most likely the average lengths and angles in which hydrogen bonds actually form. Below are pictures of excerpts of the raw data, the average data, and a graph of the raw data of one of my production runs that utilized these new parameters. All of this data was obtained by doing a production run on the molecule, Melittin, and then by doing a trajectory run on the tryptophan of said Melittin. The exact syntax I used for the trajectory can be seen in the previous blog post (here is the link: http://williamkennerly.com/blog/a-mistake-with-the-hbond-action-command/)

 

 

 

 

1 Comment

Filed under chemistry, computing, programming

A Mistake with the hbond Action Command

In a previous blog post, I talked about how the hbond action command in CPPTRAJ is capable of outputting a file that shows whether or not a hydrogen bond formed across a series of time steps as indicated by a 1 or a 0 (for clarification, click on this link: http://williamkennerly.com/blog/helpful-cpptraj-commands-part-2-action-commands/). I recently found out that this is not entirely true. When I first used CPPTRAJ’s hbond command, the data I obtained only showed 0’s and 1’s in the solute-solvent hydrogen bond column. At the time, I had made the assumption that these 0’s and 1’s simply indicated whether or not hydrogen bonds formed at a certain time step, 0 meaning no hydrogen bonds were formed and 1 meaning one or more hydrogen bonds were formed. The reason my data yielded strange results was because of two things: a default angle cutoff for bond formation set at 135 degrees, and a default distance cutoff set at 3Å. These cutoffs meant that if a bond was to form at an angle smaller than 135 degrees or at a distance greater than 3Å, then CPPTRAJ does not count it and eliminates the data for that bond entirely.

After changing the cutoffs (I eliminated the angle cutoff by specifying the angle to be -1, and changed the distance cutoff to be 3.5Å), I obtained very different data. The following is the exact syntax that I used in my .traj file to get my output:

parm mlt_wat.prmtop

trajin mdcrd/prod.mdcrd

hbond trpHBond :19 out trp.HBvTime2.dat angle -1 dist 3.5 image solventdonor     :WAT solventacceptor :WAT@O solvout trp.HBAvg2.dat

run

quit

This .traj file uses a .mdcrd file that contains the final result of a production run of a whole Melittin molecule; however, by specifying the residue number (:19) I only extracted information about the hydrogen bonds formed with the respective tryptophan atoms that form them (the two Nitrogens and the Oxygen), which when plotted results in the graph seen at the bottom of this post. The problem with this obtained data is that it is most likely not reliable. While looking at the average angles of the hydrogen bonds, it was discovered that the angles were very acute ranging from 62° to 106°. They also had slightly longer bond lengths then the original cutoff of 3Å, the lengths ranging from 3 to 3.2Å. Therefore, it can be assumed that these bonds are unlikely to happen in nature.

1 Comment

Filed under chemistry, computing

Helpful CPPTRAJ Commands Part 2: Action Commands

The next three commands that I found to be helpful are known as Action Commands. Unlike the Topology Commands from the previous post, these Action Commands allow for a file to be created that contains the output. These commands can be used both in CPPTRAJ’s interactive mode, and in .ptraj files that can be called as inputs when starting CPPTRAJ.

 

distance [<name>] mask1 mask2 [out <filename>]

This command outputs a file that gives the distance from the center of mass of mask1 to the center of mass of mask2 at each time frame (mask1 and mask2 can either be residues or atoms). The output file will show two columns, one indicating the time frame number, and the other indicating the distance between the specified masks in Angstroms. The following visual shows a sample output of the command being called on the CA and N specified atoms (Carbon and Nitrogen respectively).

The next visual is a graph of the first 500 distance values in Angstroms in a 5000 time frame production run.

This action command can utilize other key modifiers to get slightly different outputs; however, the ones listed above are the main modifiers that are necessary to get decent data. When running CPPTRAJ on my production data on Melittin, the modifiers listed above were the only ones that I used.

Example Usage: distance sample_name1 :19@CA :19@N out sample_data1.dat 

 

hbond [<name>] [out <filename>] [<mask>] [donormask <dmask> [donorhmask <dhmask>]] [solvout <filename>] [series [uvseries <filename>]]

This action command outputs a file that shows where there was hydrogen bonding in the molecule (or specified residue or atom). Some of these modifiers are not intuitively obvious and have yet to be seen in previous action commands, so below is the list of the modifiers that we have yet to identify in this post above and what they do.

  • [donormask <dmask> [donorhmask <dhmask]] refers to a specified residue or number of atoms that will be used as solute donor heavy atoms and a specified residue or number of atoms that will be used as solute donor hydrogen atoms respectively. The second mask should only be specified if the first mask is and the two masks should have a 1 to 1 correspondence between the two masks (in my case, one mask was specified as :WAT and the other was specified as :WAT@O, which represents the water box my Melittin was being simulated in).
  • [solvout <filename>] refers to the name of a file that will be outputted containing the averaged information of the solute-solvent hydrogen bonds in the specified [<mask>]. The output file will show the average distances and average angles of the hydrogen bonds formed between the acceptor and donor atoms (both of which are shown in the output file), along with the number of times a hydrogen bond formed and the fraction of the total number of hydrogen bonds that were specified by the [<mask>].
  • [series [uvseries <filename>]] refers to the name of a file that will be outputted containing the solute-solvent hydrogen bond time data in terms of whether a hydrogen bond was formed or not (as specified by a 1 meaning a hydrogen bond was formed, and a 0 meaning a hydrogen bond was not formed).

Normally, I would include sample data; however, the outputted data yielded very odd results that may or may not be completely useless, so I would much rather not give false data until I’m able to completely figure out how this command works. There are many other modifiers for this action command, but they turned out to be useless in my case.

Example Usage: hbond sample_name2 out sample_data2.dat solventdonor :WAT solventacceptor :WAT@O solvout sample_data3.dat series uvseries sample_data4.dat

 

rms/rmsd [<name>] <mask> [out <filename>]

This action command outputs a file that contains the RMS Deviation values of a specified <mask> at each time frame. The following visual shows a small part of a sample output of the RMS Deviation values of the backbone of Tryptophan in Melittin.

The next visual is a histogram of the 5000 RMS Deviation values that were outputted from using this command on the backbone of the Tryptophan in Melittin. The y-axis represents the fraction of RMS Deviation values that made up the data, while the x-axis represents the RMS Deviation values themselves.

Example Usage: rms sample_rms :19@C,CA,N out sample_rms.dat

Leave a Comment

Filed under chemistry, computing

Helpful CPPTRAJ Commands Part 1: Topology Commands

CPPTRAJ has a variety of commands for analyzing MD Simulation outputs, but because there are so many commands it would be very difficult to describe them all in detail in a single blogpost. As such, this first post will consist of the descriptions of three Topology Commands that I found to be useful for my research of the Tryptophan residue in Melittin. These Topology Commands print out Molecular Topology related information in CPPTRAJ’s interactive mode. For all intents and purposes, the visuals used in this blogpost will strictly be regarding Melittin.

Note: [<mask>] indicates a single residue/atom or range of residues/atoms

 

atominfo [<mask>]

This command prints out general information on each atom specified by the given [<mask>] modifier. The information is outputted into 12 columns. The following visual is an example output of the command being called on the 19th residue of a Melittin .mdcrd file.

  • #Atom refers to the atom’s index as given by Amber16
  • Name refers to the atom’s name identifier as given by Amber16
  • #Res refers to the residue number in which the atom is located
  • 2nd Name refers to the shorthand name of the residue in which the atom is located
  • #Mol refers to the atom’s molecule number
  • Type refers to the type of atom in the residue (i.e. alpha, beta, etc)
  • Charge refers to the electron charge of the atom
  • Mass refers to the mass of the atom
  • GBradius refers to the generalized Born radius of the atom
  • E1 refers to the element symbol
  • rVDW refers to the Van der Waal’s force radius of the atom
  • eVDW refers to the epsilon Van der Waal’s force of the atom

Example Usage: atominfo :19

 

bondinfo [<mask>]

This command prints out general information in the form of 6 columns about each bond between each atom as specified by the [<mask>] modifier. The following visual is an example output of the command being called on a specific carbon atom in the 19th residue of a Melittin .mdcrd file.

  • Bond refers to the bond index as specified by Amber16
  • Kb refers to the bond force constant
  • Req refers to the bond equilibrium value in Angstroms
  • atom names refers to the names of the bonded atoms as specified by Amber16
  • (numbers) refers to the atom indexes as specified by Amber16 as well as the types of atoms that are bonded together

Example Usage: bondinfo :19@CA

 

resinfo [<mask>]

This command prints out general information in 7 columns about a single residue or range of residues as specified by the given [<mask>] modifier. The following visual is an example output of this command being called on the range of residues that make up Melittin. The reason I had to indicate all the residues was because the .mdcrd file that I used had Melittin simulated in a water box. If the command was called without specifying the [<mask>] modifier, the output will include the residue info for all the water molecules in the water box as well. This goes for the atominfo and bondinfo commands too.

  • #Res refers to the residue index as specified by Amber16
  • Name refers to the shorthand name for each residue
  • First refers to the atom index of the first atom in the residue
  • Last refers to the atom index of the last atom in the residue
  • Natom refers to the number of atoms in the residue
  • #Orig refers to the original residue number in the original PDB file of which the date comes from
  • #Mol refers to the molecule number

One can also add the modifier, [short], to display the residues in the FASTA code sequence form. For Melittin, the sequence would look like this: GIGAVLKVLTTGLPALISWIKRKRQQ.

Example Usage: resinfo :1-26

 

Leave a Comment

Filed under chemistry, computing

Molecular Dynamics Simulation Process and AMBER

I learned the Molecular Dynamics Simulation process by dealing with the protein Influenza Neuraminidase 3BEQ. Molecular Dynamics Simulation is a multi-step process.

First, I had to visit the protein data bank to download the protein Influenza Neuraminidase and put in a directory on comet. Second, I had to setup the protein in tleap in AMBER. While I set up the protein in tleap, I had to add Ions to the protein and solvate the system with a water box. Third, you have to save the protein in a coordinate and topology file. Fourth,  I needed to run energy minimizations using sander in AMBER. Fifth, I then needed to run equilibration. Sixth, I had to run a molecular dynamics run. Seventh, I had to use cpptraj to run a trajectory analysis. Lastly, I used VMD to view the simulation.

The process is tedious and allows a lot of place for error and typos, which could be annoying. Aside for the errors that could take place, the molecular dynamics simulation process is not too bad to handle with guidance.

 

2 Comments

Filed under chemistry

How to Visualize Molecular Orbitals on GaussView

Open GaussView and click File->Open…, then open a checkpoint (.chk) files. After the GaussView image appears like below, click from the toolbar Results->Surfaces/Contours..

image (4)image (1)

A new window will open. Select the Cube Actions dropdown and click New Cube. Make sure that the top “Type” dropdown is set to Molecular Orbital. Here, you can choose what type of molecular orbital you would like to illustrate (under the HOMO dropdown). For example, the options include HOMO, LUMO, Occupied, Virtual, or you can choose your own combination or molecular orbitals by number. Once you choose, click OK. Then, click the Surface Actions dropdown and click New Surface. This will add the MO’s to your GaussView image.

Below are representations of the HOMO and LUMO molecular orbitals of one conformation of tryptophan.

image (2)image (3)

Leave a Comment

Filed under chemistry

Python Script for Excited State Energy Calculations at Single-Point Geometries

I am in the process of writing a script in python to read Gaussian output files for single-point energy calculations and produce a text file of the energy values (ground state and six excited states) that can be opened in Excel. This will save me some time, since I am currently testing different combinations of functionals and basis sets with TDDFT in Gaussian, and I need to compare the results from many calculations to determine which combination is best. I am referencing a script written by Kristine Vorwerk. However, while Kristine’s script parses files with cclib and extracts the descriptions for energy values in addition to lambda max values, mine uses regular expressions and extracts only the energy values by themselves in an order I have predetermined. Like Kristine’s script, my script also extracts the method and basis set names and the job CPU time.

Another element of Kristine’s script that I have incorporated into my own is a brief interface on the command line that asks for the file path of the folder that contains the .out or .log files I want to read. The script will read every file of that type in that folder and create a text file in that same folder with all the results of interest. Unfortunately, testing the script on the command line can be a time-consuming and confusing process, since the command line itself does not show any error messages. To debug as I write, I am running my script through PyCharm, so I can see exactly where my code fails.

One of the biggest challenges of adapting this script from Kristine’s, and as a beginner programmer, I am unsure which functions requires cclib and which do not. As I move toward a finished script, I will rewrite many of her definitions and functions in a syntax that I am sure will work without cclib. Most of the adaptations rely of my knowledge of regular expressions. In particular, I am interested in ways that I could simplify parts of my code using loops. Since cclib uses simple functions to parse files, regular expressions should take more code to do the same job. However, I am finding that parts of my code look redundant and could probably be shortened using additional loops. For example, since I am looking for the same types of values for six excited states, my code has blocks such as:

mo1 = energyState1Regex.search(line)
if mo1 is not None:
    splitted = line.split()
    EEs1.extend(["  ", splitted[4]])
    absEEs1.extend(["   ", float(splitted[4]) + groundStateEV])

mo2 = energyState2Regex.search(line)
if mo2 is not None:
    splitted = line.split()
    EEs2.extend(["  ", splitted[4]])
    absEEs2.extend(["   ", float(splitted[4]) + groundStateEV])

mo3 = energyState3Regex.search(line)
if mo3 is not None:
    splitted = line.split()
    EEs3.extend(["  ", splitted[4]])
    absEEs3.extend(["   ", float(splitted[4]) + groundStateEV])

….

etc. within a loop, scanning each line for the regular expressions which indicate the different excited states. I could probably shorten this code with another loop, but I am still thinking about how to do that. I cannot use a loop to change one character in a variable name (e.g. mo1, mo2, etc.), so I may need to change the way my regular expressions search for the data I want. Kristine found a simple solution for that problem a while ago, but I believe that was for a list that could be indexed. Nonetheless, I may look at her latest script that uses regular expressions to see if she has found any ways to simplify the code I am writing, and whether that script may be a better reference for me to use.

 

cclib webpage

PyCharm webpage

2 Comments

Filed under chemistry, computing

Trp Geometry Optimization Calculations with Gaussian

Optimization calculations are not too difficult. However, challenges can arise as can many different results. I did an optimization on three different conformations of tryptophan that I created by changing one of many dihedral angles in the molecule (Two C’s in pentagon across from N, the next Carbon in substituent chain, and the next Carbon which was bonded to NH2).

image

I used angles of 0, 135, and 180 degrees. Using the “broom”/”sweep” button on GaussView, I was able to return each configuration to a more energetically favorable conformation near those angles (21.99, -158.0, 41.5, respectively). However, they were not all the same since these angles are quite far from one another. When I performed an optimization calculation, each gave very similar energies (within 2 kJ/mol; in fact the first and third configuration were the same up to the fourth decimal place), but ranged in their change in energy from the “swept” conformation.

Here comes the issue. One calculation would not run (B3LYP/6-31G). It failed three times, each 15 minute long trials. The error message overall read “Error with lnk1e” followed by another line of error (which I do not remember). This suggested that the error arose because the “redundant internal coordinates” of this optimization were not working. So, one method to fix this issue would be to run the calculation starting from the current coordinates in the checkpoint file to redefine the internal coordinates and essentially bypass the error. Yet, in one of the output images, one C-C bond was essentially broken which shows that in its attempt to optimize the energy of tryptophan, GaussView felt that these two atoms needed to be further apart (about 1.75 Angstroms) than a normal C-C bond (1.54 A). This is not a scenario that should be repeated so the method of starting with new coordinates from the checkpoint file would not be effective. Therefore, instead we decided to use Cartesian coordinates which required adding “opt=cartesian” into the keywords of the route line.

Although a cartesian calculation takes much longer, I was able to find the proper energy of the optimized molecule.

 

1 Comment

Filed under chemistry

What you need to learn to do Computational Chemistry: everything Chris Cramer knows!

Doing computational chemistry requires (1) the technical know-how to use special software and computers and (2) the knowledge to understand what the software is telling you.

On that second point, there’s a lot of math and physics behind the theories of chemistry—and its great stuff! But before you dive into studying a lot of formal math and physics, its good to know what you’re working towards. Chris Cramer (a professor at U. Minnesota) is a computational chemist who has written a computational chemistry textbook and put a bunch of free video lectures up on Youtube that explain the concepts behind computational chemistry at a level that most students can understand. If you’ve taken general chemistry, an organic chemistry course, and a couple semester of calculus (multivariable calculus is great, but not absolutely necessary), these videos are great background and inspiration.  As you move through the video series (49 total — a full course!) the material gets rich, but if you follow along you should understand the spirit of it.

1 Comment

Filed under chemistry