Category Archives: computing

Different Basis Sets for Gaussian Calculations

There are many functionals and basis sets that can be used for different calculations in Gaussian, such as optimizations, scans, and excited-state energy calculations. A basis set is a set of basis functions. Basis sets come in different sizes, and generally, the bigger the basis set, the more accurate the results. The basis sets accessible through Gaussian include 6-31G (which can be modified with +, ++, and different orbitals), STO-3G, 3-21G, 6-311G, cc-pVDZ, cc-pVTZ, cc-pVQZ, LanL2DZ, LanL2MB, SDD, DGDZVP, DGDZVP2, DGTZVP, GEN, and GENECP. There are many more options available, which are discussed more thoroughly on the following website (http://www.gaussian.com/g_tech/g_ur/m_basis_sets.htm). It is also possible to create your own basis set in Gaussian, but this can be time-consuming and complicated. For 6-31G, the basis set grows as you add diffuse functions (+, ++, or the aug- prefix for augmented basis sets) and polarization functions (p, d, and f orbitals, written as * or **); the more that are included, the more accurate the results should be. Each basis set contains a different number of Cartesian basis functions, which can be found in the output file (ctrl-f "basis function"). A larger number of basis functions corresponds to a longer calculation time.
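If you want to pull that number out of an output file programmatically instead of searching by hand, a few lines of Python will do it. This is just a sketch: the sample line below is modeled on typical Gaussian 09 output, and in a real script you would read the line from your own output file.

```python
import re

# Sample line modeled on what Gaussian prints near the top of an output file.
# In practice you would loop over the lines of the .out/.log file instead.
line = "   159 basis functions,   300 primitive gaussians,   165 cartesian basis functions"

match = re.search(r"(\d+)\s+basis functions", line)
if match:
    print("Basis functions:", match.group(1))
```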

I performed an optimization calculation on a new conformation of tryptophan and then ran excited-state calculations using 16 combinations of functionals (b3lyp, cam-b3lyp, pbepbe, and wb97xd) and basis sets (6-31G, 6-31+G, 6-31+G(d,p), and cc-pVDZ). Since 6-31G is the smallest basis set here, it took the shortest time to complete the calculations with every functional. Also, within each functional, cc-pVDZ is similar in time to 6-31+G. Below are tables showing the times and number of basis functions for each basis set used in calculating excited-state energies of an optimized configuration of tryptophan.

A) Basis set: 6-31G

Cartesian Basis functions: 159

Functional                b3lyp    cam-b3lyp   PBEPBE   wB97XD
Job CPU Time / Minutes    7.733    9.717       7.45     10.1


B) Basis set: 6-31+G

Cartesian Basis functions: 219

Functional                b3lyp    cam-b3lyp   PBEPBE   wB97XD
Job CPU Time / Minutes    25.88    34.63       20.5     34.65


C) Basis set: 6-31+G(d,p)

Cartesian Basis functions: 345

Functional                b3lyp    cam-b3lyp   PBEPBE   wB97XD
Job CPU Time / Minutes    59.53    76.0167     48.05    81.783


D) Basis set: cc-pVDZ

Cartesian Basis functions: 285

Functional                b3lyp     cam-b3lyp   PBEPBE   wB97XD
Job CPU Time / Minutes    29.0167   39.267      24.6     39.03


2 Comments

Filed under computing

Writing scripts to submit jobs on comet using the slurm queue manager

Comet is a huge cluster of thousands of computing nodes, and the queue manager software called "slurm" is what handles all the requests, directs each job to a specific node (or nodes), and then lets you know when it's done. In a prior post I showed the basic slurm commands to submit a job and check the queue.

You also need to write a special Linux bash script that contains a bunch of slurm configurations, as well as the Linux commands to actually run your calculation. This is easiest to show by example, and I'll show two: one for a Gaussian job, and another for an AMBER job.


#!/bin/bash
#SBATCH -t 10:00:00
#SBATCH --job-name="gaussian"
#SBATCH --output="gaussian.%j.%N.out"
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --export=ALL


nprocshared=1
jobfile=YOURINPUTFILE  #assumes you use a gjf extension (which is added below)
jobfile=$jobfile.gjf
outputfile=$jobfile.out

export GAUSS_SCRDIR=/scratch/$USER/$SLURM_JOBID
. /etc/profile.d/modules.sh
module load gaussian
exe=`which g09`
export OMP_NUM_THREADS=$nprocshared
/usr/bin/time $exe < $jobfile > $outputfile

If you copy this file exactly and then modify just the important parts, you can submit your own Gaussian jobs quickly. But it's worth knowing what these commands do. All the SBATCH commands are slurm settings. Most important is the line with the "-t" flag, which sets the wall time you think the job will require. ("Wall time" means the same thing as "real time," as if you were watching the clock on your wall.) If your job winds up going longer than your set wall time, slurm will automatically cancel it even if it didn't finish, so don't underestimate. The other important slurm flags are "nodes" and "ntasks-per-node," which set how many nodes you want the job to use and how many processors (cores) per node you want it to use, respectively. For Gaussian jobs, you always want to use nodes=1. You can start with ntasks-per-node=1, and then try increasing it up to 12 (the max for comet) to see if your calculation can take advantage of any of Gaussian's parallel-processing algorithms. (You would also need to add a matching %nprocshared line, e.g. %nprocshared=8, to the .gjf file. It doesn't always work that well, meaning you're simply wasting processing time that we have to pay for.)
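For reference, here is roughly what the top of a .gjf file looks like with a %nprocshared line added. The route line, title, and charge/multiplicity shown are placeholders, not from a real job:

```
%nprocshared=8
%mem=4GB
# opt b3lyp/6-31g

title line for your job

0 1
(atom coordinates go here)
```

If you also set --ntasks-per-node=8 in your slurm script, the two values should match so Gaussian actually gets the cores it asks for.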

The commands at the bottom are Linux bash commands that set up some important variables, including the stem of the filename for your Gaussian job file. The script then loads the Gaussian module and, with the last line, runs your job.

If you’re using AMBER for molecular dynamics simulations, here’s a simple slurm script you can copy:

#!/bin/bash -l
#SBATCH -t 01:10:00
#SBATCH --job-name="amber"
#SBATCH --output="oamber.%j"
#SBATCH --error="eamber.%j"
#SBATCH --partition="compute"
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --no-requeue
#SBATCH --mail-type=END
#SBATCH --mail-user=username@skidmore.edu

module load amber

ibrun -np 1 sander.MPI -O -i equil4.in -p 2mlt.top -c 2mlt_equil3.crd -r 2mlt_equil4.crd -ref 2mlt_equil3.crd -x 2mlt_equil4.trj -o mdout

The SBATCH commands do the same job of configuring slurm, though the “mail-type” and “mail-user” options show how you can have slurm email you when a job finishes. In this example, the last ibrun command is the one that actually runs AMBER (a sander job) and sets up all of its settings, input files, output files, and other necessities. Those must be changed for each new job.

Here is some specific information and examples for comet at the San Diego Supercomputer Center, as well as a complete list of all the slurm options you can use.

Leave a Comment

Filed under computing

Using Putty and WinSCP to run molecular dynamics simulations on Influenza Neuraminidase 3BEQ

I first had to run a molecular dynamics simulation (via workshop 1) to visualize the protein influenza neuraminidase 3BEQ. But before you can run a molecular dynamics simulation, you must know how to set it up. Setting up and running the simulation wasn't too bad, though there is a learning curve. All the errors I made when first trying to set up and run the simulation only helped me get better at the process.


I then had to run a molecular dynamics simulation (via workshop 2) to perform the energy minimization and equilibration for the protein influenza neuraminidase 3BEQ. The first time I tried to run this simulation I encountered a roadblock: my own typos. Apart from the typos and making sure the input was perfect, running these simulations was also not too bad, and the errors again made me better at the process.


Running these simulations through Putty introduced me to vim, a text editor that lets you edit files from the command line. Learning vim was not too bad either, thanks to the vimtutor tutorial that comes with it. It teaches the basics and everything I needed to know to successfully run the simulations.


The biggest thing I had to overcome was learning all these new tools and remembering how to use them. But once I got past the learning curve, it became progressively easier to get through the simulations.

1 Comment

Filed under computing

Python Script for Excited State Energy Calculations at Single-Point Geometries

I am in the process of writing a script in python to read Gaussian output files for single-point energy calculations and produce a text file of the energy values (ground state and six excited states) that can be opened in Excel. This will save me some time, since I am currently testing different combinations of functionals and basis sets with TDDFT in Gaussian, and I need to compare the results from many calculations to determine which combination is best. I am referencing a script written by Kristine Vorwerk. However, while Kristine’s script parses files with cclib and extracts the descriptions for energy values in addition to lambda max values, mine uses regular expressions and extracts only the energy values by themselves in an order I have predetermined. Like Kristine’s script, my script also extracts the method and basis set names and the job CPU time.

Another element of Kristine’s script that I have incorporated into my own is a brief interface on the command line that asks for the file path of the folder that contains the .out or .log files I want to read. The script will read every file of that type in that folder and create a text file in that same folder with all the results of interest. Unfortunately, testing the script on the command line can be a time-consuming and confusing process, since the command line itself does not show any error messages. To debug as I write, I am running my script through PyCharm, so I can see exactly where my code fails.
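The folder-reading step can be sketched in a few lines; the function name below is a placeholder of my own, not from Kristine's script or my finished code:

```python
import os

def find_output_files(folder):
    """Return the .out and .log files in a folder, sorted by name."""
    return sorted(f for f in os.listdir(folder)
                  if f.endswith(".out") or f.endswith(".log"))

# The script would prompt for the folder and then loop over the files:
# folder = input("Enter the path to the folder with your output files: ")
# for name in find_output_files(folder):
#     ...parse os.path.join(folder, name) with regular expressions...
```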

One of the biggest challenges of adapting this script from Kristine's is that, as a beginner programmer, I am unsure which functions require cclib and which do not. As I move toward a finished script, I will rewrite many of her definitions and functions in a syntax that I am sure will work without cclib. Most of the adaptations rely on my knowledge of regular expressions. In particular, I am interested in ways that I could simplify parts of my code using loops. Since cclib uses simple functions to parse files, regular expressions take more code to do the same job, and I am finding that parts of my code look redundant and could probably be shortened using additional loops. For example, since I am looking for the same types of values for six excited states, my code has blocks such as:

mo1 = energyState1Regex.search(line)
if mo1 is not None:
    splitted = line.split()
    EEs1.extend(["  ", splitted[4]])
    absEEs1.extend(["   ", float(splitted[4]) + groundStateEV])

mo2 = energyState2Regex.search(line)
if mo2 is not None:
    splitted = line.split()
    EEs2.extend(["  ", splitted[4]])
    absEEs2.extend(["   ", float(splitted[4]) + groundStateEV])

mo3 = energyState3Regex.search(line)
if mo3 is not None:
    splitted = line.split()
    EEs3.extend(["  ", splitted[4]])
    absEEs3.extend(["   ", float(splitted[4]) + groundStateEV])

….

etc., within a loop that scans each line for the regular expressions that indicate the different excited states. I could probably shorten this code with another loop, but I am still thinking about how to do that. I cannot use a loop to change one character in a variable name (e.g. mo1, mo2, etc.), so I may need to change the way my regular expressions search for the data I want. Kristine found a simple solution to a similar problem a while ago, but I believe that was for a list that could be indexed. Nonetheless, I may look at her latest script that uses regular expressions to see if she has found any ways to simplify the code I am writing, and whether that script may be a better reference for me to use.
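One possible way to collapse those blocks, sketched here rather than taken from my working script, is to keep the compiled regexes and the result lists in indexed lists, so a single loop replaces the numbered mo1/mo2/mo3 variables. The sample line and the groundStateEV value below are placeholders modeled on Gaussian TDDFT output:

```python
import re

groundStateEV = -6500.0  # placeholder; the real value comes from the output file

# One regex and one pair of result lists per excited state, indexed 0-5,
# instead of six numbered variables (EEs1, EEs2, ...).
stateRegexes = [re.compile(r"Excited State\s+%d:" % (i + 1)) for i in range(6)]
EEs = [[] for _ in range(6)]
absEEs = [[] for _ in range(6)]

# Inside the file-reading loop, each line is checked against every state's regex:
line = " Excited State   2:      Singlet-A      4.7521 eV  260.90 nm  f=0.0321"
for i, regex in enumerate(stateRegexes):
    if regex.search(line) is not None:
        splitted = line.split()
        EEs[i].extend(["  ", splitted[4]])
        absEEs[i].extend(["   ", float(splitted[4]) + groundStateEV])
```

With this layout, EEs[1] ends up holding the same values the original EEs2 list would, and the six near-identical blocks become one loop.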


cclib webpage

PyCharm webpage

2 Comments

Filed under chemistry, computing

Running a job on Comet – using the queue manager called SLURM

You tell comet to run your calculation by submitting it to a queue. Your calculation waits in line with all the other jobs scientists want to run on comet. The software that controls this queue is called slurm (that’s a really dumb name, but so it is).

The basics are that first you make (or edit) a bash script containing all the slurm settings and Linux commands you need. Let's say you call that file calculation.sh. Then you can submit your job to the queue by typing:

sbatch calculation.sh

If it works, you should see a short message with the job's ID number. Your job might not start right away if there are a lot of other jobs ahead of yours in line.

To view all the jobs in the queue, you can just type

squeue

To view only the jobs you’ve submitted, you can just type

squeue -u yourusername

If you realize — whoops! — you made a mistake and want to cancel the job you’ve just submitted, you can type

scancel JobIDNumber

When your job is done, it will silently disappear from the queue and your output files should be in your directory.  You can put a setting in your calculation.sh script to email you when your job finishes.  If something went wrong with the calculation, your output files should contain error messages to help you figure out what went wrong so you can fix it and resubmit the job.

That’s it for the basics.  There are some more useful slurm commands you can read about on the slurm official documentation page.  In another blog post, I’ll show you a simple slurm script you can copy, paste, and edit to get your own jobs running smoothly.

1 Comment

Filed under computing

Accessing Comet at SDSC

Comet is the supercomputing cluster at the San Diego Supercomputer Center that we have been using to help us do calculations. We have access to comet with support from XSEDE (thank you!)

Comet is accessed over the internet using a command-line interface on the server comet.sdsc.edu.  The basic program we use to access comet is called “ssh” (which is an acronym for “secure shell”).

On Mac OSX, you can go to Applications -> Utilities -> Terminal to open the command-line interface on your Mac. To log in to comet from your Mac, type in the Terminal program: ssh wkennerl@comet.sdsc.edu. Of course, use your own comet username in place of mine!

On Windows, there is no built-in command-line ssh client, so you need to use a separate program to connect to comet. The classic program is called putty, and it is pre-installed on all Skidmore-owned Windows computers. You may prefer a program with more features, for example the Bitvise ssh client. In either case, download and install an ssh program and tell it to connect to comet.sdsc.edu.

On either type of computer, after a few seconds, you will connect to comet and be prompted for your password.  After you type in your password, you will be looking at a Linux command line (specifically, it is a bash prompt) on a computer 3000 miles away from Saratoga Springs.  Cool, eh?

Now you can use comet for whatever calculation you need, using Linux commands and your own input and output files (for Gaussian or AMBER).

Leave a Comment

Filed under computing