Saumyak Mukherjee

Tricks of the trade

[FORTRAN-90] PDB file format for FORTRAN coding

[FORTRAN-90] A generic model FORTRAN-90 program to read and analyze gro

[FORTRAN-90, GROMACS] Process/Analyze GROMACS xtc files in FORTRAN 90

[GROMACS] Calculating properties with dynamic indexing

[GROMACS] gmx select syntax

[GROMACS] Merge/Stitch two boxes in GROMACS

[GROMACS] Parameterize a small molecule with GAFF and run on gromacs

[LINUX] "convert" command in Linux

[LINUX] Create a video from multiple images in Linux

[LINUX] Compress pdf files from Linux terminal

[LINUX] Find files with a given "pattern" inside in a hierarchical manner including all sub-directories

[LINUX] parallel: Using all cores of a node in parallel

[LINUX] Run jupyter notebook on remote server using ssh tunnelling

[LINUX] Shorten remote login details

[MARTINI] Backmapping from MARTINI CG to AA

[PYTHON] Convert float variable to scientific notation

[PYTHON] Convert Python-2 scripts to Python-3 syntax

[PYTHON] Parsing command line arguments with flags

[PYTHON] Plotting graphs with matplotlib

[PYTHON, LATEX] Generate latex compatible tables from python

[SLURM] Building automated slurm submission pipeline with dependency

[VMD] Alignment of molecules

[VMD] Change default options in vmd

[VMD] Render frames as images after loading molecule in VMD

[VMD] Select and display whole water molecules within cutoff in VMD

[XMGRACE] Change default options in xmgrace

[FORTRAN-90] PDB file format for FORTRAN coding

---------------------------------------------------------------------------

Field | Column | FORTRAN |

No. | range | format | Description

---------------------------------------------------------------------------

1. | 1 - 6 | A6 | Record ID (eg ATOM, HETATM)

2. | 7 - 11 | I5 | Atom serial number

- | 12 - 12 | 1X | Blank

3. | 13 - 16 | A4 | Atom name (eg " CA " , " ND1")

4. | 17 - 17 | A1 | Alternative location code (if any)

5. | 18 - 20 | A3 | Standard 3-letter amino acid code for residue

- | 21 - 21 | 1X | Blank

6. | 22 - 22 | A1 | Chain identifier code

7. | 23 - 26 | I4 | Residue sequence number

8. | 27 - 27 | A1 | Insertion code (if any)

- | 28 - 30 | 3X | Blank

9. | 31 - 38 | F8.3 | Atom's x-coordinate

10. | 39 - 46 | F8.3 | Atom's y-coordinate

11. | 47 - 54 | F8.3 | Atom's z-coordinate

12. | 55 - 60 | F6.2 | Occupancy value for atom

13. | 61 - 66 | F6.2 | B-value (thermal factor)

- | 67 - 67 | 1X | Blank

14. | 68 - 70 | I3 | Footnote number

---------------------------------------------------------------------------

[FORTRAN-90] A generic model FORTRAN-90 program to read and analyze gro

!-------------------------------------

! Describe the function of the program

! Language: FORTRAN 90

!-------------------------------------

! Author: Dr. Saumyak Mukherjee

! Date :

!-------------------------------------

module error

implicit none

save

character(1024), parameter :: err1 = 'ERROR! Incorrect input format!'

character(1024), parameter :: err2 = 'executable <in-file.gro> <out-file> <btime(ps)> <etime(ps)> <dt(ps)> <npoint>'

end module error

program main

implicit none

integer , parameter :: ncode =

character(*) , parameter :: gro = '(i5, 2a5, i5, 3f8.3, 3f8.4)'

double precision, parameter ::

real :: start, finish, toe, t1, t2, time

integer :: i, j, k

integer :: n2, natom, nsol, npoint, nion

integer :: bstep, estep, nstep, nskip

integer :: resid, atomid

logical :: ex

character (5) :: res, atom

character (1024) :: fgro, fout

double precision :: btime, etime, dt

double precision :: lx, ly, lz

double precision :: rx, ry, rz, r

integer, dimension (ncode) :: code

double precision, allocatable, dimension (:,:) :: x, y, z

call system ("rm -f *.mod")

call cpu_time (start)

call timeDate

print*, '----------------------------'

print*, ' program function '

print*, '----------------------------'

print*, ''

call argChar (1, fgro )

call argChar (2, fout )

call argReal (3, btime )

call argReal (4, etime )

call argReal (5, dt )

call argInt (6, npoint)

inquire (file = trim (fgro), exist = ex)

if (ex .eqv. .false.) then

print*, ''

print*, 'ERROR! '//trim (fgro)//' does not exist in this directory!'

print*, ''

stop

endif

inquire (file = trim (fout), exist = ex)

if (ex .eqv. .true.) call system ("mv "//trim(fout)//" copy_"//trim(fout))

open (unit = 1 , file = trim (fgro), action = 'read' , status = 'old' )

open (unit = 11, file = trim (fout), action = 'write', status = 'unknown')

bstep = anint (btime / dt) + 1

estep = anint (etime / dt) + 1

nstep = estep - bstep + 1

nskip = bstep - 1

read (1,*) !

read (1,*) n2 ! READING THE NUMBER OF ATOMS/WATER MOLECULES

nsol = n2 / npoint ! FOR ONLY WATER IN THE SYSTEM

rewind (1) !

call groParams (1, natom, nsol, nion, npoint) ! READING SYSTEM PARAMETERS FOR PROTEINS SYSTEMS

print*, '--------------------------------------------'

print'(x,a33,x,i10)' , "Total number of atoms =", n2

print'(x,a33,x,i10)' , "Total number of protein atoms =", natom

print'(x,a33,x,i10)' , "Total number of water molecules =", nsol

print'(x,a33,x,i10)' , "Total number of ions =", nion

print'(x,a33,x,f10.3)', "Start time (ps) =", btime

print'(x,a33,x,f10.3)', "End time (ps) =", etime

print'(x,a33,x,f10.3)', "Time step (ps) =", dt

print'(x,a33,x,f10.3)', "Total calulation time (ps) =", (nstep-1)*dt

print'(x,a33,x,i10)' , "Start time step =", bstep

print'(x,a33,x,i10)' , "End time step =", estep

print'(x,a33,x,i10)' , "Number of steps to calculate =", nstep

print'(x,a33,x,i10)' , "Number of steps to skip =", nskip

print*, '--------------------------------------------'

allocate (x(), y(), z(), stat = code(1))

do i = 1,ncode

if (code(i) /= 0) then

print*, "ERROR IN MEMORY ALLOCATION!", i

print*, ""

stop

endif

enddo

if (bstep /= 1) then

print*, 'Skipping initial steps ...'

do k = 1, nskip

if (k == 1) call cpu_time (t1)

read (1,*)

read (1,*) n2

do i = 1, n2+1

read (1,*)

enddo

if (k == 1) then

call cpu_time (t2)

time = (t2 - t1) * float (nskip)

if (time >= 60.0 .and. time < 3600.0) then

time = time / 60.0

print"(x,a16,x,f7.2,x,a3)", "Estimated time =", time, "min"

elseif (time >= 3600.0) then

time = time / 3600.0

print"(x,a16,x,f7.2,x,a3)", "Estimated time =", time, "hrs"

else

print"(x,a16,x,f7.2,x,a3)", "Estimated time =", time, "sec"

endif

call progress (k,nskip)

enddo

endif

do k = 1, nstep

if (k == 1) call cpu_time (t1)

read (1,*)

do i = 1, natom

read (1,gro) resid, res, atom, atomid, x, y, z

enddo

do i = 1, nsol

do j = 1, npoint

read (1,gro) resid, res, atom, atomid, x, y, z

enddo

do i = 1, nion

read (1,gro) resid, res, atom, atomid, x, y, z

enddo

read (1,*) lx, ly, lz

rx = rx - lx * anint (rx / lx)

ry = ry - ly * anint (ry / ly)

rz = rz - lz * anint (rz / lz)

r = dsqrt (rx*rx + ry*ry + rz*rz)

if (k == 1) then

call cpu_time (t2)

time = (t2 - t1) * float (nstep)

if (time >= 60.0 .and. time < 3600.0) then

time = time / 60.0

print"(x,a16,x,f7.2,x,a3)", "Estimated time =", time, "min"

elseif (time >= 3600.0) then

time = time / 3600.0

print"(x,a16,x,f7.2,x,a3)", "Estimated time =", time, "hrs"

else

print"(x,a16,x,f7.2,x,a3)", "Estimated time =", time, "sec"

endif

call progress (k, nstep)

enddo

call cpu_time (finish)

toe = finish - start

print*, ''

if (toe >= 60.0 .and. toe < 3600.0) then

toe = toe / 60.0

print"(x,a28,x,f7.2,x,a3)","Time taken for computation =", toe, "min"

elseif (toe >= 3600.0) then

toe = toe / 3600.0

print"(x,a28,x,f7.2,x,a3)","Time taken for computation =", toe, "hrs"

else

print"(x,a28,x,f7.2,x,a3)","Time taken for computation =", toe, "sec"

endif

print*, ""

print*, "File ", trim (fout), " generated!"

end program main

subroutine progress(i, imax)

implicit none

integer :: i, imax

integer :: j, k

double precision :: prog

character(len=117) ::bar

bar="???% | |"

prog = dble (i) / dble(imax)

j = INT(prog * 100)

write(unit=bar(1:3),fmt="(i3)") j

do k=1, j

bar(6+k:6+k)=">"

end do

write(unit=6,fmt="(a1,a117)",advance="no") char(13), bar

if (i .lt. imax) then

flush(unit=6)

else

write(unit=6,fmt=*)

end if

return

end subroutine progress

subroutine argChar (n,file_name)

use error

implicit none

integer , intent(in) :: n

character(1024), intent(out) :: file_name

character(1024) :: arg

call getarg(n,arg)

if (len_trim(arg) == 0) then

print*, trim(err1)

print*, trim(err2)

stop

endif

read(arg,*) file_name

end subroutine argChar

subroutine argInt(n,val)

use error

implicit none

integer , intent(in) :: n

integer , intent(out) :: val

character(1024) :: arg

call getarg(n,arg)

if (len_trim(arg) == 0) then

print*, trim(err1)

print*, trim(err2)

stop

endif

read(arg,*) val

end subroutine argInt

subroutine argReal (n,val)

use error

implicit none

integer , intent(in) :: n

double precision, intent(out) :: val

character(1024) :: arg

call getarg(n,arg)

if (len_trim(arg) == 0) then

print*, trim(err1)

print*, trim(err2)

stop

endif

read(arg,*) val

end subroutine argReal

subroutine groParams (file_unit, atomnum, solnum, ionnum, npoint)

implicit none

character (*), parameter :: gro = "(i5, 2a5, i5, 3f8.3, 3f8.4)"

integer :: n2, i, resid

character (5) :: res

integer, intent (in) :: file_unit, npoint

integer, intent (out) :: atomnum, solnum, ionnum

read (file_unit,*)

read (file_unit,*) n2

atomnum = 0

solnum = 0

ionnum = 0

do i = 1, n2

read (file_unit,gro) resid, res

if (trim(res) == "SOL") then

solnum = solnum + 1

elseif (trim(res) == "CL" .or. trim(res) == "NA") then

ionnum = ionnum + 1

else

atomnum = atomnum + 1

endif

enddo

read (file_unit,*)

rewind (file_unit)

if (n2 /= atomnum + solnum + ionnum) then

print*, "ERROR! INCORRECT PARAMETER COUNT IN GRO FILE!"

print*, ""

stop

endif

solnum = solnum / npoint

end subroutine groParams

subroutine timeDate

implicit none

character(8) :: date

character(10) :: time

character(5) :: zone

integer :: hour, minute, second, day, month, year

integer, dimension(8) :: val

call date_and_time(date, time, zone, val)

hour = val(5)

minute = val(6)

second = val(7)

day = val(3)

month = val(2)

year = val(1)

print*,""

print"(x,a23,x,i2,a,i2,a,i2,a,x,i2,a,i2,a,i4)", &

"Calculation started at:",hour,":",minute,":",second,";", &

day,"/",month,"/",year

print*,""

end subroutine timeDate

[FORTRAN-90, GROMACS] Process/Analyze GROMACS xtc files in FORTRAN 90

(1) Install libxdrfile

(1.1) Download

wget https://github.com/wesbarnett/libxdrfile/archive/refs/heads/master.zip

(1.2) Install

unzip master.zip

cd libxdrfile-master

mkdir build

cd build

cmake .. -DCMAKE_INSTALL_PREFIX=installation_directory_path (by default /usr/local/)

make

make install

make test

(2) Install libgmxfort

(2.1) Download

wget https://github.com/wesbarnett/libgmxfort/archive/refs/heads/master.zip

(2.2) Install

unzip master.zip

cd libgmxfort-master

mkdir build

cd build

cmake .. -DCMAKE_INSTALL_PREFIX=installation_directory_path # (by default /usr/local/)

make

make test

make install

(3) Add the following line to ~/.basrc and compile.

LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path-to-LIBGMXFORT/LIBGMXFORT/libgmxfort-master/build/lib

(4) Sample code

Task: Read 1 frame from the file traj_PBC.xtc and print the coordinates of atom 1.

program main

use gmxfort_trajectory

implicit none

type (Trajectory) :: trj

real :: myatom(3)

integer :: n

call trj%open("traj_PBC.xtc", "index.ndx")

n = trj%read_next(1)

call trj%close()

myatom = trj%x(1, 1)

print*, myatom(1), myatom(2), myatom(3)

end program main

(5) Compilation

gfortran read_xtc.f90 -I /path-to-LIBGMXFORT/LIBGMXFORT/libgmxfort-master/build/include -L /path-to-LIBGMXFORT/LIBGMXFORT/libgmxfort-master/build/lib -lgmxfort -o readXTC

(6) Execution

./readXTC

(7) Output

Opened traj_PBC.xtc for reading.

48057 atoms present in system.

20001 frames present in trajectory file.

Frames saved: 1

Closed xtc file.

4.56200027 2.66300011 1.66200006

(8) Further information: https://github.com/wesbarnett/libgmxfort

N.B. Change "path-to-LIBGMXFORT" according to your destination directory!

[GROMACS] Calculating properties with dynamic indexing

There is no straight-forward way in GROMACS to calculate properties based on dynamic indexing, yet. To explain the concept, let us consider the total dipole moment of protein hydration layers.

The first step would be to get the index file for hydration layer water atoms

gmx select -f traj.xtc -s topol.tpr -on phl_0.3nm.ndx -select "same residue as name OW and within 0.3 of group protein"

This phl_0.3nm.ndx file has N number of groups where N is the number of time-frames in the trajectory file (traj.xtc). Each group contains the atom indices for the water molecules within 0.3 nm of the protein (aka, the hydration layer) at the respective time frame. Now, if you want to use this for calculating dipole moment using gmx dipoles, it will not work because you will be prompted to select one of the N groups, whose dipole moment trajectory will be calculated.

There are two ways to deal with the problem.

[GROMACS] gmx select syntax

To select water molecules within 0.3 nm from the surface of protein

gmx select -f traj.xtc -s topol.tpr -oi phl_0.3nm.dat -on phl_0.3nm.ndx -select "same residue as resname SOL and within 0.3 of group protein"

For detailed discussion, click on the following link: selections in GROMACS

[GROMACS] Merge/Stitch two boxes in GROMACS

Suppose you have two simulation boxes that you need to merge to create a larger box with an interface. You need to follow the following steps.

gmx editconf -f box-1.gro -o box-1.pdb

Let's say the boxes are to be stitched along the z-axis. Then you need to shift the coordinates of the atoms in box-2 by the z-length of box-1. x and y coordinates should not be changed.

gmx editconf -f box-2.gro -o box-2.pdb -translate 0 0 box-1_z

cat box-1.pdb box-2.pdb > final_box.pdb

Delete the few lines (4-6 lines) that mark the beginning of the second box in final_box.pdb. Then continue with the following.

gmx editconf -f final_box.pdb -o final_box.gro

gmx genconf -f final_box.gro -o final_box.gro -renumber

[GROMACS] Parameterize a small molecule with GAFF and run on gromacs

Suppose you want to simulate a small molecule in gromacs using GAFF. Youo have the pdb file molecule.pdb. Here are the steps for the process.

#1. If your pdb file does not have hydrogens, then first add H atoms using open babel:

obabel molecule.pdb -O molecule-H.pdb -h

Of course you may use other tools like VMD, PyMol, etc.

#2. Generate AMBER Parameters with Antechamber: Generate mol2 file.

antechamber -i molecule-H.pdb -fi pdb -o molecule.mol2 -fo mol2 -c bcc -s 2

Here charges are added using the AM1-BCC method, which is generally suitable for small organic molecules. For more details click here.

#3. Generate GAFF Force Field Parameter File using Parmchk2

parmchk2 -i molecule.mol2 -f mol2 -o molecule.frcmod

#4. Convert AMBER Parameters to GROMACS Format Using ACPYPE

acpype -i molecule.mol2 -b molecule

-b molecule is the base name for the output files. Click here for details.

Now you are ready to simulate the molecule in GROMACS.

[LINUX] "convert" command in Linux

Conversion between two image formats

convert input.jpg output.png

Making figures backgroundless

convert input.jpg -transparent white output.png

This example considers that the background color is white.

Trimming out white regions in a figure

convert -trim input.jpg output.jpg

For other useful options, type convert -help

[LINUX] Create a video from multiple images in Linux

Make a video by stitching image files using ffmpeg.

a) When the image files are not named sequentially.

ffmpeg -f image2 -framerate 10 -pattern_type glob -i "file_*.jpg" -s 1920x1080 -vcodec libx264 -crf 20 -pix_fmt yuv420p video.mp4

b) When the image files are named sequentially (file_001.jpg, file_002.jpg, ...).

ffmpeg -f image2 -framerate 10 -start_number 0 -i "file_%03d.jpg" -s 1920x1080 -vcodec libx264 -crf 20 -pix_fmt yuv420p video.mp4

[LINUX] Compress pdf files from Linux terminal

You can use ghostscript to do this. Install it by using the following command:

sudo apt install ghostscript

Now run the following command:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf

-dPDFSETTINGS determines the output file size. Following are the options:

-dPDFSETTINGS=/screen — Low quality (72 dpi)
-dPDFSETTINGS=/ebook — Slightly better quality (150 dpi)
-dPDFSETTINGS=/prepress — High quality (300 dpi)
-dPDFSETTINGS=/default — The system chooses the best output

[LINUX] Find files with a given "pattern" inside in a hierarchical manner including all sub-directories

find . -type f -exec grep -iF "pattern" /dev/null {} +

[LINUX] parallel: Using all cores of a node in parallel

Suppose you need to run several jobs in a loop. For example, let's say you want to use gmx msd for 1000 groups. You can use a bash for loop to do that as follows.

for i in {1..1000}

do

echo ${i} | gmx msd -f traj.xtc -s topol.tpr -n index.ndx -o msd-${i}.xvg

done

In this case, each of the 1000 commands are executed one after the other. These are serial jobs. So, if your node has 16 cores, 1 core is used in one iteration. The rest 15 cores are idle. You can accelerate the process by using parallel as follows.

parallel 'echo {} | gmx msd -f traj.xtc -s topol.tpr -n index.ndx -o msd-{}.xvg' ::: $(seq 1 1000)

In this case, all the 16 cores will be used at a time. Which means 16 iterations will run simultaneously.

[LINUX] Run jupyter notebook on remote server using ssh tunnelling

Jupyter Lab runs in your browser, so you just need to forward its HTTP port (default 8888) from the server to your laptop.

SSH command

ssh -L 8888:localhost:8888 username@remote.server

-L local_port:host:host_port
Here: forward your local port 8888 --> remote’s localhost:8888.

On the remote server, start Jupyter Lab without opening a browser:

jupyter lab --no-browser --port=8888

You’ll see a URL like http://localhost:8888/?token=….

On your local machine, open a browser and go to

http://localhost:8888/

Paste the token if prompted. Voilà—your remote Jupyter lab is right in your browser.

Tip: if port 8888 is already in use locally, pick another free port, e.g.

ssh -L 9999:localhost:8888 username@remote.server.edu

and then browse to http://localhost:9999/.

[LINUX] Shorten remote login details

Suppose you need to login to a remote host named hostname.xxx.yyy.zzz for the user username. You can use ssh in the following way:

ssh -XC username@hostname.xxx.yyy.zzz

Or, if you need to send a file to this remote host, you may use scp in the following way:

scp -r filename username@hostname.xxx.yyy.zzz:/path/to/desired/directory/

However, it is annoying to write the full login details each time. To shorten it, add a file called config to the .ssh directory in home

vi ~/.ssh/config

and add the following lines:

Host hostname

HostName hostname.xxx.yyy.zzz

User username

Save this and next time use the following commands:

ssh -XC hostname

scp -r filename hostname:/path/to/desired/directory/

[MARTINI] Backmapping from MARTINI CG to AA

All related files/scripts/folders can be downloaded from here: link. Details about the usage are also given here.

Backmapping Martini coarse-grained structure to all-atom description

./initram-v5.sh -f initial.gro -o aa_backmapped.gro -to amber99sb -p topol_initial.top

Here, initial.gro is the CG input structure. aa_backmapped.gro is the backmapped output structure. topol_initial.top is the all atom topology file containing only protein atoms. Water and ions should not be included.

[PYTHON] Convert float variable to scientific notation

This function can be used to convert any floating number to *10^x notation in python codes.

(Thanks to Mr. Tobias Prass for collaboration)

def sci_not (inp):

superscript = {

"0": "\u2070",

"1": "\u00B9",

"2": "\u00B2",

"3": "\u00B3",

"4": "\u2074",

"5": "\u2075",

"6": "\u2076",

"7": "\u2077",

"8": "\u2078",

"9": "\u2079",

"-": "\u207B",

}

neg = False

if inp < 0:

neg = True

inp = inp * -1

num = inp < 1

if num:

sig = inp

exp = 0

while sig < 1:

sig = sig * 10

exp += 1

out = f"{sig:.2f} x 10"+superscript["-"]+superscript[str(exp)]

elif not num:

sig = inp

exp = 0

while sig > 10:

sig = sig / 10

exp += 1

out = f"{sig:.2f} x 10"+superscript[str(exp)]

if neg:

sign = "-"

elif not neg:

sign = ""

return str(sign)+out

[PYTHON] Convert Python-2 scripts to Python-3 syntax

Details in the following link: link

[PYTHON] Parsing command line arguments with flags

This is a python code to generate the input files necessary for running the DoSPT program that calculates entropy using the 2PT method. It uses the python module argparse to read in command line arguments using flags.

#!/usr/bin/env python

import argparse

import sys

import os

# Command line arguments parsed with flags

parser = argparse.ArgumentParser(description="This python scripts generates all the files necessary for running the DoSPT program", usage="%(prog)s -w <water-model> -ns <num. of steps> -t <traj. length> -T <temperature>", epilog="Copyright reserved by Saumyak Mukherjee")

parser.add_argument('-w' , '--water', type=str, help='Water model (spc/spce/tip3p/tip4p/tip4p-2005/tip4p-ice/disp) [default: %(default)s]', default='spce', metavar='')

parser.add_argument('-ns', '--nstep', type=int, help='Number of steps in the trajectory [default: %(default)s]', default=5001, metavar='')

parser.add_argument('-t' , '--trun', type=float, help='Total length of the trajectory in ps [default: %(default)s]', default=20.0, metavar='')

parser.add_argument('-T' , '--temp', type=float, help='Temperature of the system in K [default: %(default)s]', default=300.0, metavar='')

args = parser.parse_args()

wModel = args.water

nStep = args.nstep

tRun = args.trun

temp = args.temp

s = 2 # symmetry number

# Input and output files

fTraj = 'traj.gro'

fInput = 'input'

fMass = 'masses'

fGrp = 'groups'

fSgrp = 'supergroups'

# Check for the existence of the traj.gro file

if not os.path.exists(fTraj):

print(f"Error: The {fTraj} file does not exist in the current directory!")

sys.exit(1)

# Number of water molecules and box dimensions

nAtoms = {'spc':3, 'spce':3, 'tip3p':3, 'tip4p':4, 'tip4p-2005':4, 'tip4p-ice':4, 'disp':4}

nPoint = nAtoms[wModel]

boxLine = None

with open(fTraj, 'r') as inp:

for i, line in enumerate(inp):

if i==1:

nWat = int(line)

boxLine = nWat + 2

nWat = int(nWat / nPoint)

elif i==boxLine:

box = line.split()

lx, ly, lz = box[0], box[1], box[2]

break

fmt = fTraj.split('.')[-1]

# Generate the "masses" file

def write_mass(model, filename):

masses = {

'spc' : [('OW', 15.9994), ('HW1', 1.0080), ('HW2', 1.0080)],

'spce' : [('OW', 15.9994), ('HW1', 1.0080), ('HW2', 1.0080)],

'tip3p' : [('OW', 15.9994), ('HW1', 1.0080), ('HW2', 1.0080)],

'tip4p' : [('OW', 15.9994), ('HW1', 1.0080), ('HW2', 1.0080), ('MW', 0.0000)],

'tip4p-2005': [('OW', 15.9994), ('HW1', 1.0080), ('HW2', 1.0080), ('MW', 0.0000)],

'tip4p-ice' : [('OW', 15.9994), ('HW1', 1.0080), ('HW2', 1.0080), ('MW', 0.0000)],

'disp' : [('OW', 15.9994), ('HW2', 1.0080), ('HW3', 1.0080), ('MW4', 0.0000)]

}

if model in masses:

with open(filename, 'w') as f:

for atom, mass in masses[model]:

f.write(f'{atom} {mass}\n')

else:

print(f"Water model '{model}' not supported!")

write_mass(wModel, fMass)

print (f'file "{fMass}" generated"!')

# Generate "input" file

with open(fInput, 'w') as file:

file.write(f'points = {nStep}\n')

file.write(f'tau = {tRun:.3f}\n')

file.write(f'cell = {lx} {ly} {lz}\n')

file.write(f'temperature = {temp:.2f}\n')

file.write(f'format = {fmt}\n')

file.write('estimate_velocities = .true.\n')

print (f'file "{fInput}" generated"!')

# Generate "groups" file

with open(fGrp, 'w') as file:

file.write(f"{nWat*nPoint} {nWat}\n")

for i in range(1,nWat+1):

file.write(f"{nPoint} {s}\n")

atomnums = [str(nPoint*(i-1)+j) for j in range(1, nPoint+1)]

file.write(" ".join(atomnums)+"\n")

print (f'file "{fGrp}" generated"!')

# Generate "supergroups" file

with open(fSgrp, 'w') as file:

file.write(f'1-{nWat}')

print (f'file "{fSgrp}" generated"!')

To know about the usage of the code, type in

setupDoSPT.py -h

The output should look like the following:

usage: setupDoSPT.py -w <water-model> -ns <num. of steps> -t <traj. length> -T <temperature>

This python scripts generates all the files necessary for running the DoSPT program

optional arguments:

-h, --help show this help message and exit

-w , --water Water model (spc/spce/tip3p/tip4p/tip4p-2005/tip4p-ice/disp) [default: spce]

-ns , --nstep Number of steps in the trajectory [default: 5001]

-t , --trun Total length of the trajectory in ps [default: 20.0]

-T , --temp Temperature of the system in K [default: 300.0]

Copyright reserved by Saumyak Mukherjee

For details on the DoSPT program follow this link: DoSPT

[PYTHON] Plotting graphs with matplotlib

These are some generic templates that I often have to use. This of course is not exhaustive and needs to be adjusted or modified according to requirement. One may think of this as a basic template.

1. Useful packages/modules to be imported:

%matplotlib inline

import numpy as np # array manipulations

import matplotlib.pyplot as plt # Data plotting

import matplotlib as mpl # Matplotlib

import pandas as pd # I/O handling and much more

from scipy import interpolate # Interpolation or spline fitting

from scipy import optimize as op # Fitting data with a function

from string import ascii_lowercase as alc # Alphabets in alphabetical order in a loop

import matplotlib.patches as patches # Introduce geometric shapes in graphs

from mpl_toolkits.axes_grid1.inset_locator import (inset_axes, InsetPosition, mark_inset) # Inset

import os # For using bash commands in python

import glob # for file handling

import seaborn as sns # Plotting continuous distribution

Some initial lines:

warnings.filterwarnings('ignore') # To not show warnings

mpl.rcParams['axes.linewidth'] = 0.5 # Width of the graph frame

2. Reading in data from a file:

2.1. Using numpy for a single input file

file = "input-file.dat" # Let's say the file has two columns separated by space

data = np.loadtxt(file, unpack=True, skiprows=17)

x = data[0] # as numpy array

y = data[1] # as numpy array

2.2. Using numpy for multiple input files with sequential numbering (let's say 10, 15, 20, ..., 95, 100)

for i in range(10, 100, 5):

file = f"input-file_{i}.dat"

# or, an alternative way

file = "input-file_"+str(i)+".dat"

data = np.loadtxt(file, unpack=True, skiprows=17)

x = data[0] # as numpy array

y = data[1] # as numpy array

2.3. Using numpy for multiple input files with non-sequential numbering

files = glob.glob("input-file_*.xvg")

for file in files:

data = np.loadtxt(file, unpack=True, skiprows=17)

x = data[0] # as numpy array

y = data[1] # as numpy array

2.4. Using python (same method for multiple files as before)

file = "input-file.dat"

data = []

with open(file, "r") as file:

for line in file:

if line.startswith("#") or line.startswith("@"): # Skipping comments

continue

line = line.split()

x, y = float(line[0]), float(line[1])

data.append((x, y))

3. Plotting the data:

3.1. Using matplotlib.pyplot.subplots for a single graph from a single input file.

fig, ax = plt.subplots(1, 1, dpi=300, figsize=(2.5,2), constrained_layout=True)

# plot data

ax.plot(x, y, "--o", lw=0.8, ms=1, color='k', label="xxx")

# plot data along with errorbar

ax.errorbar(x, y, yerr=dy, #dy to be read from input file or calculated

color = 'k',

ecolor='k',

elinewidth=0.5,

lw = 0.8,

label='Mean',

marker = '.')

# plot histogram

ax.hist(y, bins=100,

density=True,

color='dodgerblue',

edgecolor='dodgerblue',

lw=0.3,

alpha=0.4)

# plot continuous distribution

sns.kdeplot(y, linewidth=1, fill=True)

# plot bar-plot

ax.bar(x, y, label="bar label", color='b', edgecolor='k', lw = 0.5, width = 0.5)

ax.set_xlim(0, 100)

ax.set_ylim(0, 1)

ax.set_xlabel('x-axis label', fontsize=10)

ax.set_ylabel('y-axis label', fontsize=10)

ax.tick_params(axis='both', which='major', labelsize=6, length=1.5, width=0.5)

ax.ticklabel_format(style='sci', axis='y', scilimits=(4,4), useMathText=True)

ax.yaxis.offsetText.set_fontsize(6)

ax.yaxis.set_label_coords(-0.2, 1.2) # adjust y-axis label position

ax.grid(visible=True, which='major', axis='both', ls='--', lw=0.2)

ax.set_title("Graph title", fontsize=8, pad=5, y=1.0)

ax.legend(frameon=False, loc=1, fontsize=10, shadow = False, edgecolor='k')

ax.axvline(x=50, color='r', ls="-", lw=0.8)

ax.axhline(y=0, color='k', ls='--', lw=0.8)

ax.text(50,0.5, "Text to be added", fontsize=5, color='b', rotation='vertical')

ax.spines['top'].set_visible(False) # do not show the top line in graph frame

ax.spines['right'].set_visible(False) # do not show the right line in graph frame

plt.setp(ax.get_xticklabels(), visible=False) # do not show the x-ticks

ax.arrow(25,790,0,10, head_width=15, head_length=5, fc='r', ec='r') # Draw downward arrow

ax.arrow(420,780,0,-10, head_width=15, head_length=5, fc='b', ec='b') # Draw upward arrow

ax.fill_between(x, y, 0, color='r', alpha=0.6) # Fill area under a curve with color

# add inset

left, bottom, width, height = [0.6, 0.65, 0.4, 0.35]

ax1 = ax.inset_axes([left, bottom, width, height])

ax1.plot(x, y, lw=0.6, ms=1, "--s", color='r')

loc = [0, 0.2, 0.5, 0.8, 1.0] # x-ticks at given positions

ax1.set_xticks(loc)

ax1.set_ylim([0.2, 0.4])

ax1.set_xlim([0, 1])

ax1.set_facecolor("lavender")

rect = patches.Rectangle((0, 0), 50, 0.8, lw=0, edgecolor='w', facecolor='w') # add a rectangle

ax1.add_patch(rect)

figFile = 'output.jpg'

fig.savefig(figFile,format="jpg", dpi=300, bbox_inches='tight')

os.popen(f'convert -trim {figFile} {figFile}')

3.2. Using matplotlib.pyplot.subplots for a multi-panel graph from multiple input files.

fig, axs = plt.subplots(3,2,dpi=300,figsize=(10,12),constrained_layout=True,sharex=True,sharey=True)

for ax, x, y, ser in zip(axs.flatten(), xVals, yVals, alc): # xVals and yVals are arrays

ax.plot(x, y, lw=0.6, ms=1, "--s", color='r', label=ser) # ser gives the panel numbering from alc

add other options as shown before

4. Fitting the data using scipy.optimize

# triexponential fitting

def func (x, a1, t1, a2, t2, a3, t3):

return a1*np.exp(-x/t1) + a2*np.exp(-x/t2) + a3*np.exp(-x/t3) + (1-a1-a2-a3)

guess = [0.5,1,0.5,10,0.5,100]

lo = (0,0,0,0,0,0)

hi = (1,10,1,100,1,1000)

maxiter = 1000001

popt, pcov = op.curve_fit(func, x, y, check_finite=True, p0=guess, bounds=(lo,hi), maxfev = maxiter)

ax.plot(x, y, 'ro', lw=1, label='raw data')

ax.plot(x, func(x, *popt), 'r-', lw = 0.8, , label='fitted data')

[PYTHON, LATEX] Generate latex compatible tables from python

You can generate Latex formatted tables from python using pandas. Suppose you have a file input.txt with three columns which you want to put into a latex document. This is a sample piece of code to do that. As a bonus this code also shows how to put multiple index names in a pandas DataFrame.

import numpy as np

import pandas as pd

vals = np.loadtxt (input.txt)

head = [('Index-1a','Index-1b'), ('Index-2a','Index-2b'), ('Index-3a','Index-3b')]

form = ["{:.0f}".format, "{:.0f}".format, "{:.0f}".format] # all integers

cols = pd.MultiIndex.from_tuples(head)

df = pd.DataFrame(vals, columns=cols)

out = df.to_latex(

index=False,

caption='Table caption.',

escape=False,

formatters=form,

multirow=True,

column_format = 'lrr',

label='latex_label')

print(out)

For details click on the following link.

[SLURM] Building automated slurm submission pipeline with dependency

When you have a defined wall-time on the cluster but your job requires more than that time to finish, it is often wise to break you job into chunks and automate their consecutive submissions on slurm. For example, let's say the wall-time limit is 24 hrs and your simulation job (say 1 microsecond) requires more than 24 hrs to complete. In that case, you may divide your job into ten 100 ns simulations, each of which require less than 24 hrs to finish. This means that after the 1st job finishes, the 2nd job will pick up from the structure (GRO) and checkpoint (CPT) (as in GROMACS) and so on for the next jobs. This process can be automated using slurm dependency in you submission script. The following example script creates directories SIM-1…SIM-N and a submit.sh in each that:

- Runs grompp using the previous segment’s prod.gro/.cpt

- Submits the next job as a dependency (afterok)

#!/usr/bin/env bash

set -euo pipefail # For robust error handling

# ──────────────────────────────────────────────────────────────────────────────

# User‐configurable parameters

# ──────────────────────────────────────────────────────────────────────────────

NUM_SIM=5 # total number of segments you want (SIM-1…SIM-$NUM_SIM)

INITIAL_GRO="start.gro" # for SIM-1: initial coordinate file in top‐level dir

INITIAL_CPT="start.cpt" # for SIM-1: initial checkpoint file in top‐level dir

MDP_FILE="../prod.mdp"

TOPOL_FILE="../topol.top"

# ──────────────────────────────────────────────────────────────────────────────

# Loop over each segment index

# ──────────────────────────────────────────────────────────────────────────────

for (( i=1; i<=NUM_SIM; i++ )); do

sim_dir="SIM-${i}"

mkdir -p "${sim_dir}"

# compute indices for chaining

prev=$(( i - 1 ))

next=$(( i + 1 ))

# path to the “previous” gro/cpt:

if (( i == 1 )); then

# first segment uses your initial files

gro_input="${INITIAL_GRO}"

cpt_input="${INITIAL_CPT}"

else

gro_input="../SIM-${prev}/prod.gro"

cpt_input="../SIM-${prev}/prod.cpt"

fi

# build the submit.sh in SIM-i

cat > "${sim_dir}/submit.sh" << EOF

#!/bin/bash -l

#

# SBATCH directives

#SBATCH -J JOBNAME # job name

#SBATCH -D ./ # working directory = this SIM-i folder

#SBATCH -o out.%j # STDOUT → out.<jobid>

#SBATCH -e err.%j # STDERR → err.<jobid>

#SBATCH --gres=gpu:rtx6000:3 # reserve 3 RTX6000 GPUs

#SBATCH --mem=186000 # 186 GB RAM

#SBATCH --nodes=6 # number of nodes

#SBATCH --ntasks-per-node=6 # MPI tasks per node

#SBATCH --cpus-per-task=9 # OpenMP threads per MPI rank

#SBATCH --time=24:00:00 # walltime

# ────────────────────────────────────────────────────────────────────────────

# 1) load modules

# ────────────────────────────────────────────────────────────────────────────

module purge

module load intel

module load cuda

module load gromacs/2022.4

# ────────────────────────────────────────────────────────────────────────────

# 2) set MPI/OpenMP env vars

# ────────────────────────────────────────────────────────────────────────────

export OMP_NUM_THREADS=\$SLURM_CPUS_PER_TASK

export MPI_NUM_RANKS=\$SLURM_NTASKS_PER_NODE

# export OMP_PLACES=cores # uncomment if you disable hyperthreading

# ────────────────────────────────────────────────────────────────────────────

# 3) pre‐process & run

# ────────────────────────────────────────────────────────────────────────────

gmx grompp \

-f ${MDP_FILE} \

-p ${TOPOL_FILE} \

-c ${gro_input} \

-t ${cpt_input} \

-o prod

srun gmx_mpi mdrun \

-v -deffnm prod \

-bonded gpu -nb gpu \

-ntomp \$OMP_NUM_THREADS \

-maxh 24

EOF

# make submit.sh executable

chmod +x "${sim_dir}/submit.sh"

# if not the last segment, append the dependency‐submit line

if (( i < NUM_SIM )); then

cat >> "${sim_dir}/submit.sh" << EOF

# ────────────────────────────────────────────────────────────────────────────

# 4) submit next segment when this one finishes OK

# ────────────────────────────────────────────────────────────────────────────

cd ../SIM-${next}/

sbatch --dependency=afterok:\$SLURM_JOB_ID submit.sh

EOF

fi

done

echo "Created SIM-1…SIM-${NUM_SIM} directories with submit.sh scripts."

echo "To kick off the chain, run:"

echo "sbatch SIM-1/submit.sh"

[VMD] Alignment of molecules

The following script aligns two protein molecules in VMD, based on CA atoms. Change according to your system. After opening vmd, source the following script on the TK console.

mol new mol1.pdb type pdb waitfor all

mol new mol2.pdb type pdb waitfor all

set s0 [atomselect 0 "name CA"] # Adapt the selection according to requirement

set s1 [atomselect 1 "name CA"] # Adapt the selection according to requirement

set M [measure fit $s0 $s1] # This sequence is important

set all0 [atomselect 0 all]

$all0 move $M

[VMD] Change default options in vmd

Download the .vmdrc file in your user home: download

Change the various parameters in this file as you wish. Be careful to maintain the format. Enjoy!

N.B. The name of the file should be .vmdrc (mind the '.')!

[VMD] Render frames as images after loading molecule in VMD

The following TCL script does the job. Save the code as render.tcl and run the following command on the TK Console:

source render.tcl

Code:

set output_dir "frames"

file mkdir $output_dir

render TachyonInternal on

display depthcue off

display antialias on

axes location off

set render_width 1920

set render_height 1080

set frame_interval 10

set molID [molinfo top]

set num_frames [molinfo $molID get numframes]

for {set i 0} {$i < $num_frames} {incr i $frame_interval} {

animate goto $i

set img_file [format "%s/frame_%04d.tga" $output_dir $i]

render TachyonInternal $img_file

puts "Rendered frame $i to $img_file"

}

Use ffmpeg to make a video from the generate images. For details see Create a video from multiple images in Linux above.

[VMD] Select and display whole water molecules within cutoff in VMD

In VMD, for displaying water molecules within a given distance (let's say 5 Angstrom) from protein, we use:

water within 5 of protein

However, this selects atoms of water, whereby you might see only H or O atoms of some waters. To display the whole molecules, use:

same fragment as water within 5 of protein

[XMGRACE] Change default options in xmgrace

(1) Create the following directory in user home.

mkdir .grace/

(2) Inside .grace/ create a file gracerc.user with the following contents.

HARDCOPY DEVICE "JPEG"

DEVICE "JPEG" DPI 300

Here you can change the options for the default output settings for figures.

(3) Here, also create folder templates. Inside templates download the Default.agr file: download

You may change the various parameters in the Default.agr file that you downloaded as per your choices, and you are ready.

Enjoy plotting!