NMR to Modelfree Relaxation Scripts

This document describes the use of a set of scripts to streamline the analysis of spin relaxation data using the programs CurveFit and ModelFree. These two programs are in-house software programs written by Arthur G. Palmer and available from the Palmer laboratory WWW site.

Scripts similar to these were originally created and compiled over several years, starting in 1991 at The Scripps Research Institute. The principal authors are Drs. Martin J. Stone, Johan Kordel, Mikael Akke, and Arthur G. Palmer. The scripts have been re-written to use the new program CurveFit and version 4 of the ModelFree program.

These scripts are free software; you can redistribute them and/or modify them under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or any later version.

These scripts are distributed in the hope that they will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

The package should contain the following files:

Documentation:

Felix macros and schema:

Unix shell scripts:

Sample input files used by nmr2curvefit:

INPUT FILE FORMAT

The Unix shell scripts whose names end in 'nmr', such as 'snratio_nmr', assume that the crosspeak assignments, crosspeak positions, peak heights, and peak volumes have been output to a text file. One output file should exist for each spectrum. The output files should be named according to the convention 'rootnameN', in which rootname is the same for all files and N is a unique integer (normally the values of N form a consecutive series). For example,

Filename   Rootname
relax1     relax
relax.1    relax.
relax_1    relax_

The text files should contain one line per residue with the necessary information. Other information can be present on the line beyond that needed by these scripts. The numbers of the fields containing the needed information are set within the scripts. The fields are set for output files generated by the provided FELIX multiquant.mac macro. If you use another processing program, you will have to change the field numbers in the scripts accordingly. For example, each line in the output files produced by multiquant.mac has the format

item cen1 wid1 ptr1 asg1 cen2 wid2 ptr2 asg2 cc int vol
in which

cen1 is the F2 coordinate of the crosspeak
cen2 is the F1 coordinate of the crosspeak
asg1 is the F2 assignment of the crosspeak
asg2 is the F1 assignment of the crosspeak
int is the intensity of the crosspeak
vol is the volume of the crosspeak

Thus, the scripts contain the settings

F1POS_FIELD=6
F2POS_FIELD=2
F1NAME_FIELD=9
F2NAME_FIELD=5
HEIGHT_FIELD=11
VOLUME_FIELD=12
NULL_NAME="null"

NULL_NAME is the string that appears in an assignment field when no assignment is available for a residue in that dimension. For example, in FELIX, the F1 and F2 assignments are identical for 1H-15N correlation spectra, so setting both assignments is not necessary.
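As an illustration of how these settings are used, the following is a minimal sketch, not taken from the distributed scripts, that pulls the crosspeak name, height, and volume out of lines in the multiquant.mac layout. The function name peak_table and the two sample lines are made up for this example. The name is taken from the F2 assignment unless it equals NULL_NAME, in which case the F1 assignment is used, as described for the individual scripts.

```sh
#!/bin/sh
# Field numbers as set in the scripts for multiquant.mac output.
F1NAME_FIELD=9
F2NAME_FIELD=5
HEIGHT_FIELD=11
VOLUME_FIELD=12
NULL_NAME="null"

# peak_table (hypothetical helper): read intensity-file lines on standard
# input and print "name height volume" for each line that has enough fields.
peak_table() {
    awk -v f1n="$F1NAME_FIELD" -v f2n="$F2NAME_FIELD" \
        -v hf="$HEIGHT_FIELD" -v vf="$VOLUME_FIELD" \
        -v nullname="$NULL_NAME" '
        NF >= vf + 0 {
            # Prefer the F2 assignment; fall back to F1 if F2 is unassigned.
            name = ($f2n == nullname) ? $f1n : $f2n
            print name, $hf, $vf
        }'
}

# Two made-up lines in the layout
# item cen1 wid1 ptr1 asg1 cen2 wid2 ptr2 asg2 cc int vol
peak_table <<'EOF'
1 120.5 3.0 0 s74 45.2 2.0 0 null 0 15230 98110
2 98.0 3.0 0 null 60.7 2.0 0 t24 0 8120 51200
EOF
```

If the data were produced by another processing program, only the field-number variables at the top would need to change in a sketch of this kind.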

The F1POS and F2POS fields are only needed for the checkpeaks_nmr script. This script is not essential for analyzing the relaxation data (see below).

QUANTIFYING PEAK INTENSITIES AND VOLUMES IN FELIX

Peak intensities are evaluated within FELIX using the macro multiquant.mac and the schema xpkhgtv.sch.

Installation

  1. Copy the schema xpkhgtv.sch to the user schema directory or to the main schema directory for the FELIX distribution.
  2. Copy the macro multiquant.mac to the user macro directory or to the main macro directory for the FELIX distribution.

Usage

Create an xpk:peaks entity in your database if it does not already exist and assign the resonances in the spectrum as usual. Back up this database to be on the safe side.

Run the macro multiquant.mac. This macro will prompt you for a number of inputs:

The macro will write an output file in the user's FELIX directory. The output files will be named rootnameN. One line will be created for each entry in the xpk:peaks entity and will have the format described above. In general, this file looks much like xpk:peaks, but contains the following additional items: (i) the intensity maximum found within the crosspeak boundaries and (ii) the volume of the crosspeak. In addition, the coordinates of the assigned crosspeak center will be replaced by the coordinates of the intensity maximum. Volumes are the sum of the intensities in the 3 x 3 grid centered on the peak maximum.

The macro creates a temporary entity named 'peakmax'. If the macro crashes for some reason, you probably will have to delete this entity using the FELIX command

dba entity delete peakmax

before reexecuting the macro.

The macro multiquant.mac has two modes set by the answer to the question 'use fixed peak locations (y/n)?'. In the first ('y'), the macro assumes that the assigned coordinates in the xpk:peaks entity represent appropriate peak locations. In the second ('n'), the macro searches for the maximum intensity within the region defined by the crosspeak widths in each dimension (scaled by the parameter 'fudge' defined in the macro). Normally, the second mode is preferable; however, for very weak peaks, multiquant.mac may find the peak maximum at a location that does not correspond to the location found for the stronger peaks in a relaxation series. The script checkpeaks_nmr can be used to check for this problem (more below). A possible solution is to use the second mode for the spectra with strong intensities, and then the first mode for the spectra with weak intensities.

ANALYZING PEAK INTENSITY/VOLUME DATA

After quantifying the spectra using either the multiquant.mac macro in FELIX or your favorite processing program, you will have one intensity file for each spectrum (e.g. rootname1, rootname2, etc.), containing the peak intensity information. The following is a description of the procedure for analyzing these output files using the provided UNIX shell scripts. The shell scripts should be installed in a directory on your PATH environment variable (see your .cshrc file to change this).

Several of the scripts get their input by doing an `ls' and looking for all files present that conform (in some way) to the file name(s) provided on the command line. The drawback is that you may need to think twice about how you name your files (and make sure that you keep rather clean directories), but the great benefit is that you never need to give the number of files, or a list of all of the files, as input.
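The following minimal sketch shows one way such a file search could work. It is an illustration only and is not copied from the distributed scripts; the function name list_series and the sample file names are made up. It uses a shell pattern rather than `ls' and keeps only files whose names are the rootname followed by digits, which shows why stray files with similar names (e.g. backup copies) are best kept out of the directory.

```sh
#!/bin/sh
# list_series (hypothetical helper): print, one per line, every regular file
# in the current directory whose name is ROOTNAME followed only by digits.
list_series() {
    rootname=$1
    for f in "$rootname"[0-9]*; do
        [ -f "$f" ] || continue        # skips the unexpanded pattern too
        suffix=${f#"$rootname"}        # part of the name after the rootname
        case $suffix in
            *[!0-9]*) continue ;;      # reject names such as relax1.bak
        esac
        printf '%s\n' "$f"
    done
}

# Demonstration in a scratch directory with made-up file names.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch relax1 relax2 relax10 relax1.bak notes.txt
list_series relax
```

Note that the files are listed in the shell's collation order, not in numerical order of N.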

Additional information on particular scripts is generally available in the header of that script. The appropriate command line syntax for the unix scripts will be printed if the script is executed without command line arguments. Remember that the input file field numbers will have to be changed in any script ending in 'nmr' if you processed the data using a program other than FELIX (see above).

Checking peak positions

Run the checkpeaks_nmr script:

checkpeaks_nmr rootname cut_off_w2 cut_off_w1 output_extension [>output_file]

in which rootname is the core filename; cut_off_w2 is the maximum number of points by which the location of the peak maximum may differ in w2 between spectra in the series; cut_off_w1 is the corresponding maximum in w1; and output_extension is the extension for the output files, one of which is created for each record in the intensity files. The output filename is given by the crosspeak name and output_extension (e.g. intensity). The crosspeak name is given by F2NAME_FIELD unless F2NAME_FIELD is equal to NULL_NAME, in which case the crosspeak name is given by F1NAME_FIELD. The script writes to standard output those instances where the peak maximum location for a crosspeak differs by more than cut_off points from the average or median coordinates.
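The kind of test applied to each crosspeak can be sketched as follows. This is a simplified illustration under stated assumptions, not the code of checkpeaks_nmr: the function name flag_outliers and the three-column input ("spectrum identifier, w2 position, w1 position" for a single crosspeak) are made up, and the comparison is against the mean position only. A median would be less sensitive to a single badly placed maximum.

```sh
#!/bin/sh
# flag_outliers (hypothetical helper): read "id w2 w1" lines for one
# crosspeak and print those whose w2 or w1 position differs from the mean
# position by more than CUT_OFF_W2 or CUT_OFF_W1 points, respectively.
# usage: flag_outliers cut_off_w2 cut_off_w1 < positions
flag_outliers() {
    awk -v c2="$1" -v c1="$2" '
        { id[NR] = $1; w2[NR] = $2; w1[NR] = $3; s2 += $2; s1 += $3 }
        END {
            if (NR == 0) exit
            m2 = s2 / NR; m1 = s1 / NR          # mean position in w2 and w1
            for (i = 1; i <= NR; i++) {
                d2 = w2[i] - m2; if (d2 < 0) d2 = -d2
                d1 = w1[i] - m1; if (d1 < 0) d1 = -d1
                if (d2 > c2 + 0 || d1 > c1 + 0) print id[i], w2[i], w1[i]
            }
        }'
}

# Made-up positions: the maximum in spectrum 4 has wandered in w2.
flag_outliers 3 3 <<'EOF'
1 100 50
2 101 50
3 100 51
4 108 50
EOF
```

With these made-up values only the line for spectrum 4 is reported.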

Determining signal-to-noise figures

Assuming that you have recorded duplicate spectra for some of the relaxation delays or for some of the NOE pairs, these duplicates will be used for the evaluation of uncertainty in peak intensities and volumes in the spectra. Run the snratio_nmr script for each pair of duplicates:

snratio_nmr name1 name2 > output

in which name1 and name2 are duplicates. If you do not have duplicates, you will have to estimate uncertainties from baseplane noise levels (applicable for R1, R2, or NOE measurements) or by doing jackknife simulations (applicable for R1 and R2 measurements).
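One common way to turn a pair of duplicate spectra into an uncertainty estimate is sketched below. It is an assumption for illustration and may differ from what snratio_nmr actually computes: if both spectra carry the same random noise, the difference between duplicate intensities has twice the variance of a single measurement, so the single-measurement uncertainty is taken as sqrt(sum(d^2) / (2N)) over the N peaks. The function name duplicate_sigma and the two-column sample files are made up.

```sh
#!/bin/sh
# duplicate_sigma (hypothetical helper): estimate the uncertainty of a single
# intensity measurement from two duplicate files, comparing field FIELD.
# Assumes both files list the same peaks in the same order.
# usage: duplicate_sigma file1 file2 field
duplicate_sigma() {
    file1=$1
    file2=$2
    field=$3
    awk -v f="$field" '
        NR == FNR { first[FNR] = $f; next }     # remember values from file1
        { d = first[FNR] - $f; ss += d * d; n++ }
        END { if (n > 0) printf "%.2f\n", sqrt(ss / (2 * n)) }
    ' "$file1" "$file2"
}

# Made-up duplicate measurements, "name height" per line.
demo_dir=$(mktemp -d)
printf 's74 100\nt24 200\nv34 300\n' > "$demo_dir/dup_a"
printf 's74 104\nt24 197\nv34 300\n' > "$demo_dir/dup_b"
duplicate_sigma "$demo_dir/dup_a" "$demo_dir/dup_b" 2
```

For files in the multiquant.mac layout, the field argument would be the HEIGHT_FIELD or VOLUME_FIELD setting.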

Analyzing R1 and R2 data

Create a master file in the directory containing the intensity files (use your favorite text editor, e.g. `vi'). The master file contains a list of the identifiers for the intensity files (e.g. for file rootname1, the identifier is 1), the corresponding relaxation delays in seconds, and the uncertainties in peak height or volume as obtained from snratio_nmr. Interpolate to obtain estimates of the uncertainties for the relaxation time points at which no duplicates were recorded. A sample master file is provided with this distribution.
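The interpolation step can be done by hand, but a minimal sketch of a linear interpolation is given here for illustration. Everything about the input is an assumption made for this example: a three-column layout "identifier delay uncertainty", lines sorted by delay, a "-" placeholder marking a missing uncertainty, and known values on the first and last lines. Consult the sample master file for the layout that nmr2curvefit actually expects. The function name interpolate_sigma is made up.

```sh
#!/bin/sh
# interpolate_sigma (hypothetical helper): read "id delay sigma" lines sorted
# by delay, where sigma is "-" if no duplicate was recorded, and fill each
# missing sigma by linear interpolation in the delay between known neighbors.
interpolate_sigma() {
    awk '
        { id[NR] = $1; t[NR] = $2; s[NR] = $3 }
        END {
            for (i = 1; i <= NR; i++) {
                if (s[i] == "-") {
                    lo = i - 1                  # already known or filled in
                    hi = i + 1
                    while (hi <= NR && s[hi] == "-") hi++
                    if (lo < 1 || hi > NR) {    # cannot extrapolate: leave gap
                        print id[i], t[i], "-"
                        continue
                    }
                    s[i] = s[lo] + (s[hi] - s[lo]) * (t[i] - t[lo]) / (t[hi] - t[lo])
                }
                printf "%s %s %.1f\n", id[i], t[i], s[i]
            }
        }'
}

# Made-up values: duplicates were recorded only for the first and last delays.
interpolate_sigma <<'EOF'
1 0.02 600
2 0.10 -
3 0.30 -
4 0.50 840
EOF
```

With these made-up numbers the filled-in uncertainties are 640.0 and 740.0.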

Copy the provided sample header file into the directory containing the intensity files and edit it as necessary. This file contains the commands to drive the CurveFit and xmgr programs. See the manual pages for these two programs for assistance in modifying the header file.

Run the nmr2curvefit script:

nmr2curvefit rootname masterfile headerfile output_extension height/volume

in which rootname is the core filename, masterfile and headerfile are the names of the master and header files created in the previous steps, and output_extension is the extension of the output file names. One output file will be created for each record in the intensity files. The output filename will be given by the crosspeak name (e.g. s74) and output_extension (e.g. t1). The crosspeak name is given by F2NAME_FIELD unless F2NAME_FIELD is equal to NULL_NAME, in which case the crosspeak name is given by F1NAME_FIELD. nmr2curvefit will take care of ambiguous crosspeak names; currently we indicate ambiguity by incorporating a "/" or "?" in the crosspeak name (e.g. a12/g54, t24?, s45/, v34/k73?, e33/y99/w90, or any such combination). You can easily modify this to fit your own taste.

Run the curvefit_all script:

curvefit_all [-grid -jack -xmgr -noerror -print -display] input_extension output_extension

in which input_extension is the same as output_extension from the previous step and output_extension is the extension of the output file names for the CurveFit results. The optional parameters -grid, -jack, -xmgr, and -noerror are commands passed through to CurveFit. -print determines whether the xmgr plot should be printed to an output device (you should set the PRINTSTRING variable in the script appropriately). -display determines whether the xmgr plot should be displayed on the terminal.

Run the curvefit2table script:

curvefit2table extension [X2_cutoff] [> output_file]

in which extension is the output_extension from the previous step and X2_cutoff is the 1-alpha level for testing the quality of the fit (e.g. 0.95 for a 95% confidence level). The output contains the fitted rates for each crosspeak. A series of '*' characters is appended to each entry for which the measured chi-square variable exceeds the critical value.

Analyzing NOE data

Run the noecalc_nmr script:

noecalc_nmr noe_spectrum [height_error volume_error] no_noe_spectrum [height_error volume_error] > output_file

in which noe_spectrum and no_noe_spectrum are intensity files produced using FELIX or another processing program as described above. output_file contains the crosspeak name and the NOE, i.e. the ratio of the peak intensities in noe_spectrum and no_noe_spectrum [intensity(noe)/intensity(no_noe)], as calculated from both the peak heights and the peak volumes. If uncertainty estimates are provided for the input spectra, then uncertainties will be calculated for the NOE as well.
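For a single crosspeak, the calculation can be sketched as follows. The ratio is as defined above; the uncertainty uses standard first-order error propagation for a ratio of two independent measurements, which is an assumption here and has not been checked against the noecalc_nmr source. The function name noe_ratio and the sample numbers are made up.

```sh
#!/bin/sh
# noe_ratio (hypothetical helper): print the NOE and its propagated
# uncertainty for one crosspeak.
# usage: noe_ratio I_noe sigma_noe I_no_noe sigma_no_noe
noe_ratio() {
    awk -v a="$1" -v sa="$2" -v b="$3" -v sb="$4" 'BEGIN {
        noe = a / b                              # intensity(noe)/intensity(no_noe)
        ra = sa / a                              # relative uncertainties
        rb = sb / b
        err = noe * sqrt(ra * ra + rb * rb)
        if (err < 0) err = -err                  # the NOE itself may be negative
        printf "%.3f %.3f\n", noe, err
    }'
}

# Made-up peak heights with a common uncertainty of 100 units.
noe_ratio 8000 100 10000 100
```

The same function applies to heights or volumes, provided the matching uncertainty is supplied.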

If multiple NOE data sets have been acquired, then average values can be calculated as either conventional mean values or as weighted mean values using the scripts:

noe_average noe_file_1 noe_file_2 ... etc [> output_file]

noe_weighted_average noe_file_1 noe_file_2 ... etc [> output_file]

in which each noe_file_N is an output file from the noecalc_nmr script. Weighted averages can be calculated only if uncertainties are present in the output from the noecalc_nmr script.
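The reason uncertainties are required is that each value must be given a weight. A minimal sketch of the conventional inverse-variance weighted mean is shown here. Whether noe_weighted_average uses exactly this weighting is an assumption, and the function name weighted_mean and the "value uncertainty" input layout are made up for the example.

```sh
#!/bin/sh
# weighted_mean (hypothetical helper): read "value sigma" lines and print the
# inverse-variance weighted mean and its uncertainty:
#   mean = sum(x/sigma^2) / sum(1/sigma^2),  sigma_mean = sqrt(1 / sum(1/sigma^2))
weighted_mean() {
    awk '
        { w = 1 / ($2 * $2); sw += w; swx += w * $1 }
        END { if (sw > 0) printf "%.3f %.3f\n", swx / sw, sqrt(1 / sw) }'
}

# Two made-up NOE values for the same crosspeak from separate data sets.
printf '0.80 0.02\n0.84 0.04\n' | weighted_mean
```

The more precise measurement dominates: with these made-up numbers the weighted mean is 0.808, whereas the conventional mean would be 0.820.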

Producing a DATA input file for ModelFree

Now you have all the relaxation rate constants in separate files for R1, R2, and NOE measurements. To produce a table containing these three parameters for each crosspeak, run the script:

make_ratestable r1table r2table noetable [height/volume] [>output_file]

in which r1table and r2table were produced using the curvefit2table script and noetable was produced by the noecalc_nmr, noe_average, or noe_weighted_average script. The height/volume parameter determines whether NOE values based on peak heights or peak volumes will be used.

Do the above for each static magnetic field for which you have data. You will then have one table of R1, R2, NOE data for each static field. To create the ModelFree data file, run the script:

make_mfdata field_1 rates_table_1 ... field_n rates_table_n

in which field_N is the 1H Larmor frequency in MHz for the Nth field and rates_table_N is the output from the make_ratestable script for the Nth field.

For further notes and help, please first take a look in the header of the script in question, or look at the man pages of the ModelFree and CurveFit programs.