Running the NewtonX Tests

First you need to install NewtonX.  To do this, you request a download link from the NewtonX site, follow the link, and download the binary file recommended (it will be called something like “NX-1.4.0-3-source+binary.tgz”).  Then copy the file to comet, and untar and unzip it by typing tar -zxf NX-1.4.0-3-source+binary.tgz.  This corresponds to section 9.1 in the documentation.  Do not go further and do 9.2; 9.2 is only for if you want to install the program manually.

After installing NewtonX, I spent a while trying to get the built in tests to run.  Eventually I came up with this procedure:

First we have to set up the variables so NewtonX can find Gaussian.  We do this through the following lines of code:

export NX=/home/kvorwerk/NX-1.4.0-2/bin
export g09root=/share/apps/compute/gaussian/09.E.01/g09
source $g09root/bsd/g09.profile
export g09root=/share/apps/compute/gaussian/09.E.01

The last line is necessary to avoid an error by which NewtonX adds an extra /g09/ into the path where it searches for gaussian.  Also note the line setting the NX variable would need to be modified to match the path where you installed NewtonX.

Next we make a directory to run the tests in.  This can be anywhere.  Then we select the tests to run by typing:

$NX/inp-testnx.pl

This shows a list of tests.  We want to run the gaussian09 one and the NewtonX ones; that is, 18, 19, 24, and 25.  To actually run the tests we type:

$NX/test-nx.pl >test.log

This creates a bunch of subdirectories and a file called test.log.  The test.log file contains basic output, and more output is in the sub directories.  When I ran the tests, the test.log file told me that:

18 finished, but with a message saying:

Files /home/kvorwerk/nx_test/TEST_NX/MD-SICH4-G09-TDDFT-NAD/RESULTS/dyn.out and   /home/kvorwerk/NX-1.4.0-2/bin/../test-nx/STANDARD-RESULTS/MD-SICH4-G0    9-TDDFT-NAD/RESULTS/dyn.out differ.  It seems to be important differences. Something may be wrong in the installation.

The files it refers to are easy to find.  However, I have been unable to find the file or script which compared them.

19 finished normally

24 finished with a message saying:

Files /home/kvorwerk/nx_test/TEST_NX/MD-ANALYTICAL-MODEL1/RESULTS/dyn.out and   /home/kvorwerk/NX-1.4.0-2/bin/../test-nx/STANDARD-RESULTS/MD-ANALYTIC    AL-MODEL1/RESULTS/dyn.out differ.  It seems to be only minor numerical differences. Probably installation is OK.

I do not think this error is worth worrying about.

25 finished normally.

The issue with test 18 may be due to us using a more recent version of gaussian (09.E) than the developers used to create the test.  Hopefully we will be able to find the comparisson they use and learn if this is indeed the case.

-Kristine

1 Comment

Filed under computing

Python Error-Handling: Follow-up to Temporary Files

This post is a follow-up to this previous blog post.

On the new Windows 10 computers at Skidmore, I ran into the issue when running python in which temporary files disrupt the execution of my scripts. An example of a suspected temporary file is “C:\Users\jgerard\Desktop\Important2\~$LYP-D_root1.gjf.out”. However, when I select “View Hidden Files” in my directory, no such file appears. As a last resort, I have my script search for temporary files:

if "$" in file_name:

and break if True. This adjustment allowed the successful run of the script. However, it is less than elegant, and the script could run into issues if any of my output files were to include a “$” in their file name.

Leave a Comment

Filed under programming

A Look at Various Methods of Computationally Computing Absorbance Spectra

Yesterday was my senior seminar presentation.  I chose a paper entitled Computing the Absorption and Emission Spectra of 5-Methylcytidine in Different Solvents, which examined the absorbance spectra — and therefore also the vertical excitation energies — of a molecule called 5-methylcytidine in different solvents.  The authors (Martinez-Fernandez et. al.) use a range of solvation models and a mix of QM/MM/MD methods to find their spectra, and it is a fairly interesting look at different ways to computationally solve the problem.

 

In the paper, the absorption and emission spectra of 5-methylcytidine in water, acetonitrile, and tetrahydrofuran are studied both experimentally and computationally.  This molecule is chosen because it is the nucleoside associated with 5-methylcytosine, a derivative of the C base linked to UV caused mutations in DNA1,2.  The nucleoside is used instead of 5-methylcytosine itself because the sugar ring affects the absorbance and thus must be included for the results to be accurate to those of living systems.

When studying 5-methylcytidine, the paper seeks to answer two main questions: how the molecule’s spectra can be most accurately modelled computationally, and what insights into the solvent’s effects on the spectra can be found.  To answer these questions, the results of three computational methods were compared to experimental spectra.  In the first, the molecule was modelled using a static quantum mechanical approach, with the solvent being treated using a PCM model.  In the second, a static mixed quantum mechanical/molecular modelling approach is used, where the solvent-molecule interaction is initially simulated and optimized to a low energy point using MM, and then the absorption/emission energies are found using QM.  In the third, a molecular dynamics simulation is run, from which a sample of the different conformations of molecule and solvent are chosen.  The absorption/emission energies are then found at each of these points using MM single point calculations.  In each case, the absorption spectra obtained were stick spectra.  Over each peak, a Gaussian was fit to simulate vibrational effects and provide complete spectra.  Note for the emission spectra, only the first two methods were used as the third was computationally infeasible.

The computationally obtained absorption spectra all successfully predicted the three main peaks shown in the experimental spectra at 278 nm, 242 nm, 200 nm, though in some cases the peaks were blue shifted due to the lack of vibrational effects in the calculations.  Of the three computational methods, the third was most successful at predicting the experimental results, as it closely matched the experimental and included some absorbance between the two higher wavelength bands which the other models missed.  This was thought to likely be due to the increased accuracy of this method at representing the solvent.  The spectra also mirrored the solvent shifts seen in the experimental data, with acetonitrile and tetrahydrofuran having for the most part similar effects on the absorbance.  Their spectra relative to that of water were red shifted for the band at 278 nm and of higher intensity but similar energy for the other bands.

The computational emission spectra were consistent with those found experimentally for water.  The spectra of the other two solvents were similar to each other both experimentally and computationally.  For both solvents, the shift in the spectra from that of water were overestimated using both computational methods.  It was concluded that the main errors in both absorption and emission spectra were due to the lack of accounting for vibrational effects.

The differences in the absorption spectra for the solvents were concluded to be due to the hydrogen bonding ability of the solvent.  The HOMO and LUMO of the molecule in acetonitrile and tetrahydrofuran were very similar in shape and energy, despite the difference in polarity3 between the solvents.  This accounted for their similar absorbance, and suggested that solvent polarity is not the main factor in the solvent effect.  In water, hydrogen bonding dramatically changed the shapes of the HOMO and LUMO orbitals, thus changing their energies and the wavelengths which could be absorbed.  Therefore, hydrogen bonding was concluded to be the main cause of the solvent effects observed.

 

The paper can be found at:

Matrinez-Fernandez, L.; Pepino, A.; Segarra-Marti, J.; Banyasz, A.; Garavelli, M, Importa, R. Computing the Absorption and Emission Spectra of 5-Methylcytidine in Different Solvents: A Test-Case for Different Solvation Models.  Journal of Chemical Theory and Computation.  http://pubs.acs.org/doi/abs/10.1021/acs.jctc.6b00518?journalCode=jctcce

 

-Kristine

1 Comment

Filed under Paper Review

Python Error Handling: Temporary Files and Skipping Failed Jobs

In my scripts that analyze all Gaussian output files in a directory, I use glob to get all the files into a list. The relevant script is as follows:

import glob

#get a folder to search through
folder_location = get_folder_location()

#gets filepaths for everything in the specified folder ending with .LOG or .out
search_str_log = folder_location + "*.LOG"
search_str_out = folder_location + "*.out"

file_paths_log = glob.glob(search_str_log)
file_paths_out = glob.glob(search_str_out)

file_paths = file_paths_log + file_paths_out

I have been editing and testing output files within the directory, and my script has mistakenly read their temporary Word files as .out files. I needed to make those temporary files visible (see this link for instructions on how to do that). Then I deleted the temporary files, so they would not be read as extra files.

 

I wanted my script to skip files that had in error termination in Gaussian, so I needed to search the output files for the string “Error termination”. Since I was reading output files with .readlines(), as a list of lines (strings), I decided to recombine the list of strings into one string as follows:

#reads the file as a list of its lines of text
content_lines = current_file.readlines()

combined = '\t'.join(content_lines)

if "Error termination" in combined:
    print file_name + " skipped because of error termination
    in Gaussian."

else:
    #analyze file

Another option would have been to use .read() in addition to .readlines() for a different variable content, but that would set the cursor to the end of the file, and I would need to write current_file.seek(0) to reset the cursor to the beginning of the file. (See this post for more information about this issue.) Using the .join() command was a simpler way of getting the text file as a string, without having to set the cursor back.

2 Comments

Filed under programming

MERCURY conference 2016 — a great time for students to experience computational chemistry!

Alexandra Cassell presenting her poster at MERCURY 2016

Alexandra Cassell presenting her poster at MERCURY 2016

Last week  Alex, Justin and I traveled (along with the Navea group) to the MERCURY conference at Bucknell University,
a conference devoted to computational chemistry done by undergraduates.  Its a a great opportunity to see the wide diversity of work that qualifies as ‘computational chemistry’, everything from really detailed studies of small molecules using high-level theory, to simulations of huge biomolecules with an aim towards discovering better drugs.   Not to mention mechanistic organic chemistry, applied materials research, and lots more.     Six speakers from research institutions and companies also come in to talk about their careers and research programs.   Its a not-to-miss conference every year!

Justin Gerard presenting his poster at MERCURY 2016

Justin Gerard presenting his poster at MERCURY 2016

Justin and Alex both presented results from their summer research on excited-state time-dependensity density functional theory calculations of indole and tryptophan (respectively).  Both spoke for a quick 1-minute abstract talk in front of the entire audience of 100+ attendees and then presented their poster for an hour during the poster session, speaking with other students, faculty at undergraduate colleges, and the six invited speakers.    Justin and Alex have posted trip reports.    Here are the vital pictures  — they both did a great job!

 

Leave a Comment

Filed under conference report, Group News

MERCURY: Guest Speakers and Poster Presentations

This past weekend, our research group attended talks by guest speakers, presented our abstracts, and fielded questions by our posters at the 2016 MERCURY Conference at Bucknell University.

There were six guest speakers: Chris Cramer, Steven Wheeler, Jeffrey Evanseck, Chris Wilmer, Richard Pastor, and Kate Holloway.

Cramer gave a crash course in quantum mechanics and computational chemistry. He referred to the psi operator in the Schrödinger equation as an oracle, and he said it is important to ask the oracle something you know before you ask it something you don’t know, i.e. to validate a computational method before using it to predict new information. He stated that the total energy of a molecule can be determined exclusively from its electron density. However, he pointed out self-interaction error as the biggest error in DFT, in which the calculation averages the position of one electron, so that in the calculation, the election repels itself (impossible). This error needs to be corrected for in the calculation. After the crash course, he discussed his research with MOFs (Metal Organic Frameworks), porous materials on the nano scale that can store and separate liquids or gases.

Wheeler delineated important goals and advice for computational chemists. He proposed that you only really learn how quantum chemistry methods work by programming them yourself, that you should learn a programming language (especially Python), and that you should take a course in linear algebra, if possible. Wheeler described three main types of functionals: semi-empirical (approximate/Hartree-Fock theory for 1000s of atoms), ab initio (MO-based methods for 10s to 100s of atoms, depending on the level of theory), and DFT (for 20 to 200 atoms). He asserted that choosing the appropriate level of theory for calculations is a primary focus of computational chemists. He mentioned his trials to find an appropriate functional with DFT for his research. Because traditional DFT functionals completely fail to capture dispersion interactions, he decided on B97-D, as a fast, accurate functional, though it had problems with long-range interactions. His research group studies the pi-pi* stacking of substituted benzene rings.

Evanseck introduced molecular dynamics. He stated three reasons for computation: to interpret experimental data, to extend (fill gaps) in experimental data, and to make predictions to guide the experiment. He made an analogy to the Pople Diagram (see below), with force field sophistication instead of level of theory. He then broke down the pros and cons on molecular dynamics methods vs. quantum methods. In addition, he pointed out that it is much easier to tweak the existing potential energy functions than to write new ones.

introduction-to-electron-correlation-32-638

Wilmer has become successful with a start-up called NuMat Technologies Inc., which develops new MOFs with programs that general hypothetical MOFs and calculates their ability to store and separate natural gases for motor vehicles and CO2 capture. He advanced with the help of some business majors by participating in business plan competitions.

Pastor informed us of the tremendous difficulty and computing time of membrane simulations, specifically looking at natural antibiotics in epithelial layers. He said it was appropriate for early grad students to be inexperienced and lazy (to learn but not disrupt the work of the professors or pursue fruitless projects), post-docs to be smart and hard-working (to effectively collaborate with professors), and PIs to be smart and lazy (to delegate tasks properly and be able to respond to post-docs and students). He encouraged students to seek truth, work within a community, and to be a good and generous friend by keeping friends out of the wrong quadrants (smart/inexperienced, hardworking/lazy), where inexperienced and hardworking was the most dangerous for everyone.

Holloway has worked on finding cures for AIDS and hepatitis C at Merck. She described her career trajectory, having decided on computational chemistry because she was good at it and didn’t like lab work. She became a computational chemist at Merck straight from her Ph.D. She emphasized the cost (>$2.5 billion) and waiting time (>10 years) for new drugs to get on the market. However, potentially saving people’s lives motivated her and her fellow researchers.

Presenting abstracts seemed like it would be easy enough, but I got some of the ordering of my spiel wrong and flubbed some of the wording. This experience has taught me that memorizing and practicing a short speech might be a good idea in the future.

Poster presentations went much better for me. Because we had rehearsed, I was comfortable with the flow of my presentation, and I adjusted the length and detail of it based on the knowledge of the person listening (anyone from Steven Wheeler to students with no knowledge of Gaussian). One student named Clorice was particularly interested in my poster, because she was also starting some excited state optimizations with TD-DFT. However, she did not experience the same difficulty optimizing first and second excited energy states, because her project involved pi-pi* stacking and enzyme activity, not a molecule with fluorescent properties similar to indole.

A highlight of the stay was hearing Chris Cramer expertly sing some folk songs at karaoke night (not to mention the our own group’s hoe-down of “Sweet Home Alabama”). That night, along with conversations with students and faculty in chemistry who are also involved in music, gave me hope that music and art will stay relevant to me if I pursue a career in chemistry.

1 Comment

Filed under conference report

MERCURY Conference

I really enjoyed going to this year’s MERCURY conference. In fact, a few of the speakers were very inspirational, as were some posters. The speakers that I most enjoyed were Kate Holloway and Chris Wilmer. Kate Holloway talked about drug design to combat HIV and HCV and I was really intrigued by the details and difficulties in this process. Chris Wilmer discussed MOF’s and how they can be used to model methane consumption. I think I liked these presentations the most, both because the speakers were very good and also because I could clearly see the applications of their research in our lives. I also really enjoyed the poster session. My presentation went well, except I was asked a few questions I was uncertain about and Chris Cramer didn’t come to talk to me 🙁 . However, I enjoyed looking at other posters more than presenting my own. There were three posters that I enjoyed the most. First, a student researched nano-aggregates and measured their speed of aggregation to be able to find a solution to clogging when these nano-aggregates interact with waxes in pipes. Her research seemed very simple on the poster, but the work was quite fascinating. Second, a student experimented with different halogens and leaving groups to see if he could make a bi-molecular reaction become a concerted reaction. I don’t think I realized how many different directions molecular dynamics could go; including organic chemistry. Lastly, two girls presented their poster about enzyme inhibitors. I can’t quite remember the entire purpose but I know that they were working with inhibitors, derivatives of some molecule (which included tryptophan), and dopamine. They measured interaction energies between the derivatives and dopamine to find the best one, with two different conformations.

I was disappointed that I wasn’t able to get any suggestions about Gaussian’s confusion problem from any of the quantum professors or speakers there who are familiar with the program. I would have liked to been able to discuss more with them the problems that we have been  having. Overall, I enjoyed seeing the subjects that other students were researching with computational chemistry. Also, the conference made me sad that I am not taking a science class next semester. . .

1 Comment

Filed under conference report

Errors Galore

I have been having to process and handle a wide variety of errors in the past few weeks, using Gaussian as well as Comet. Here are the conclusions we found if you ever run into similar errors.

  1. FileIO error:   This error was quite odd and took a lot of work to surpass. It was seen in at least three excited state optimization calculation output files but there was no other trend visible based on functional, basis set, or root number. This image is difficult to read but it essentially is a very long list of groups of numbers (like at the top), followed by “dumping /fiocom/ . . .  . . Ntrerr called from FileIO”. After communicating with Gaussian Tech Support, they offered the advice of updating our Gaussian to the E version. To do this, go to the gaussian.sh file and where it says “module load gaussian,” correct it to “module load gaussian.09.E.01” and save. This error was due to a bug in the older version that we had been using, although it could have been circumvented using “opt=nolinear” or “guess=always” as a keyword. For this reason, we have been using version E of Gaussian for all further calculations. FileIO
  2. Python plots: In the energy_step_oscillator_plot.py script, I had difficult writing to the results file. Whenever I would run the module, the correct plot would pop up but then the results file would be empty (blank and 0KB). To see the proper results, make sure to refresh the folder in which the results.txt file is and also close the plot before opening the results. This is necessary if and only if your script reads to “results.close()” AFTER the plot order. If you simply move “results.close()” to before the plot instructions, this will not be an issue. Now, the results.txt file will close before the plots are displayed.
  3. There have been multiple different cases where jobs on comet required more memory. There would be a message towards the end of the prematurely terminated output file asking for more MW, which is a different unit of memory storage. I have been adding “%Mem=8GB” to almost all of my input .gjf files (as the first line) to ensure that the job doesn’t end early.
  4. If a job terminates otherwise but there is no clear explanation, this probably indicates that there was not enough wall time to perform the calculation entirely, so you should add more wall time or increase the number of processors you use for the calculation. To do so, you must change “nprocshared=1” to “nprocshared=8” and “ntasks-per-node=1” to “ntasks-per-node=8” or any number between 1 and 8, and also include “%nprocshared=#” at the top of your input file. This can show a dramatic decrease in the time necessary to perform certain calculations.

1 Comment

Filed under computing

Charles presents at Skidmore Student/Faculty summer research meeting; thanks Charles!

Today a bunch of summer research students presented the results of their 5 weeks of work (for the first-half of the summer term).  Charles (math major, class of 2018) started off this summer working in our group; he has never taken a chemistry class and in just five weeks managed to learn how to run molecular dynamics simulations with AMBER (and updated and corrected some of our tutorials) and wrote some python and cpptraj scripts to aid in our analysis of melittin trajectories.  Charles has now had enough of pretending to be a chemist and is going back home for the summer.   The group will be building on your work next semester.  Thanks Charles!

 

charles1

Charles Clayton and his poster — “Structural Changes of Tryptophan in a Melittin Molecular Dynamics Simulation”

 

Not many people have the guts to present a science poster wearing a baseball cap. At least it wasn't the Red Sox.

Action shot — Charles handles the crowd around his poster!

Leave a Comment

Filed under Group News

Decoding: AMBER code

In a previous blog post, I presented a code I used in AMBER to run a trajectory of a protein. The code is below:

nohup mpirun -np 4 sander.MPI -O -i mdin -p 3beq.top -c 3beq.crd -r 3beq_md.crd -x md.trj -o md.out &

Before you can run code successfully, you have to know what the code means and how to set up the code. The code above contains flags and the files you need to run the simulation. The flags precede the files that contribute to the flags. The flags are the individual letters with the dash in front of them.

Flags:

  • -O: overwrites any output file with the same name
  • -i: input file
  • -p: parmtop file
  • -c: coordinate file
  • -r: restart file (output of new coordinates)
  • -x: trajectory file
  • -o: output file
  • -np: specifies the number of processors
  • -ref: reference file [Not in code above, but used in other codes]

Other than the flags and files, there are commands that also contribute to the running of the code. The command nohup and & are used. As previously discussed, nohup allows codes to run without hanging up and & allows codes to run in the background. So together, nohup and &, allows you to logout of comet or continue doing other things on comet while your job runs. A new command that is present in the code is mpirun, which cause the code to be executed in parallel.

1 Comment

Filed under computing