CURVEFIT

The program curvefit is a general purpose Levenberg-Marquardt non-linear least squares fitting program. It can be used in an interactive or batch mode, although in most cases, a PC or Mac graphing program will be more convenient for interactive use. The list of functions that can be fit is easily expandable, provided that you have access to a FORTRAN compiler. Output from the program is intended to be graphed using the XMGR program. Output is directed to standard output and can be redirected to a file or piped directly into XMGR.

Usage

The program is executed with the following command:

curvefit [-help -grid -jack -noerror -debug -nointercept -xmgr -r seed -x value -m filename -f filename]

in which the bracketed entries are options described below. The output can be piped to XMGR using the command string:

curvefit [options] -xmgr -f filename | xmgr -type xydy -source stdin

The output can be simultaneously directed to a file and piped to XMGR using the command string:

curvefit [options] -xmgr -f filename | tee file.output | xmgr -type xydy -source stdin

-help
Using this option alone will provide a brief synopsis of the program.
-grid
By default, curvefit assumes that the user will provide initial guesses for the parameter values to be optimized for all functions except 'linear'. If the -grid option is specified, then the the initial values will be chosen by a grid search. The user must provide lower bound, upper bound and number of grid steps to use for each parameter.
-jack
By default, curvefit provides estimates of the uncertainties in the fitted parameters both from the covariance matrix and from a Monte Carlo simulation. If -jack is specified, then a jackknife simulation is performed instead of the Monte Carlo simulation.
-debug
Echo the input back to the terminal for debugging purposes.
-nointercept
Used only with the function type 'linear', this option will force the intercept to be fixed at zero, only the slope will be fit to the data.
-noerror
By default, curvefit assumes that data will be input as (x, y, dy) triples (where dy is the uncertainty in y). If -noerror is specified, then only (x,y) pairs will be read. -noerror implies -jack because the Monte Carlo simulations cannot be done without uncertainties being provided.
-xmgr
Produce output for plotting using the XMGR software package.
-raw
The data will be read from a plain file without any header information. The user will be prompted for other information.
-r seed
Specifies an initial seed for the random number generator. If USE_GETSEED is defined in the Makefile for the program, then entering a seed is not necessary because the program will use the system clock to generate the seed. The routine getseed.f may need to be re-written for use on systems other than Silicon Graphics in order to use the USE_GETSEED compiler option.
-x value
Specifies a value of the independent variable that will be used to obtain a predicted value of the dependent variable with an uncertainty estimated by Monte Carlo simulations
-m filename
Specifies a filename to be used for writing the values of the fitted paramters determined for each Monte Carlo step in the Monte Carlo simulation of uncertainties. This file is useful for performing error analysis during subsequent calculations using fitted parameters.
-f filename
Specifies a filename to be read for input, rather than interactive data entry from the terminal. The format of the input file is given below.

Input File Format

The input file format used with the -f flag is given below. In general, the input consists of a keyword (title, function, xmgr, or data) followed by input values (that may extend over more than one line as described below). Strings that contain spaces must be enclosed in double quotes. The program is case sensitive. Lines starting with "#" are treated as comments. Input is free format and blank lines, tabs, and blanks are ignored.
title        string
function     string
   string   real     real     integer
                     .
                     .
                     .
   string   real     real     integer
xmgr
   string
      .
      .
      .
   string
data
   real     real     real
              .
              .
              .
   real     real     real
The input associated with each keyword is described below:
title string
A title for the data of up to 80 characters
function string
String is one of the defined function names. The functions are listed using the command curvefit -help. The function keyword is followd by M input lines, one for each parameter in the function. In the default mode, each line consists of a string defining the name of the parameter and a real number defining the initial guess for the parameter. If -grid is specified, the parameter string is followed by three entries: a real number defining the lower bound for the parameter, a real number defining the upper bound for the parameter and an integer defining the number of search steps between the lower and upper bounds.
xmgr
This keyword is followed by options used for XMGR. Each of these inputs consists of the keyword @ followed by a string setting some XMGR option. See the sample input files provided or the XMGR help pages for more information. If the -xmgr option is not specified, these entries are ignored.
data
This keyword is followed by N lines of data, one line for each data point. The data is input as (x, y, dy) triples, unless the -noerror option is specified. In this case, the data is input as (x,y) pairs. If no uncertainties are specified for the y values, then the data may need to be scaled to obtain good convergence of the program.

A sample input file is given below:

title "sample tanh fit"

function hyperbolic_tangent
   Rate  0.0 20.0 100

xmgr
   @ XAXIS LABEL "time (sec)"
   @ YAXIS LABEL "intensity (arb. units)"
   @ XAXIS TICKLABEL FORMAT DECIMAL
   @ YAXIS TICKLABEL FORMAT DECIMAL
   @ XAXIS TICKLABEL CHAR SIZE 0.8
   @ YAXIS TICKLABEL CHAR SIZE 0.8

data
   0.0053  0.024 0.01
   0.15    0.22  0.02
   0.30    0.36  0.02
   0.45    0.47  0.03
If the -raw flag is used, the input file format consists only of data in (x,y,dy) or (x,y) pairs:
   real     real     real
              .
              .
              .
   real     real     real
A sample input file is given below:
   0.0053  0.024 0.01
   0.15    0.22  0.02
   0.30    0.36  0.02
   0.45    0.47  0.03

The output from the program is given below:

# Least-Squares Curve Fitting Results
#
# title    sample tanh fit                                                                 
# function hyperbolic_tangent  
# equation y=tanh(-Rate*x)                         
# points          4
# X2                7.7927
# X2(red)           2.5976
#
# Parameter  Fitted_Value  Fitted_Error  Sim_value  Sim_error
# Rate         1.2494       0.0537        1.2496     0.0532
#
# %tile     X2
#          0.0500         0.2773
#          0.1000         0.4592
#          0.1500         0.6899
#          0.2000         0.9465
#          0.2500         1.0985
#          0.3000         1.2824
#          0.3500         1.4478
#          0.4000         1.7044
#          0.4500         1.8563
#          0.5000         2.0298
#          0.5500         2.2738
#          0.6000         2.6877
#          0.6500         3.1017
#          0.7000         3.5734
#          0.7500         3.8593
#          0.8000         4.6502
#          0.8500         5.1785
#          0.9000         6.1981
#          0.9500         7.1826
#

Additional output is appended to the above if the -xmgr option is specified. The output is mostly self-explanatory. The reduced chi-square variable [X2(red)] is given by X2/(N-M), in which X2 is the chi-square value for the best-fit model, N is the number of data points, and M is the number of parameters in the function. If Monte Carlo simulations are performed, the distribution of X2 is estimated and the percentiles of the distribution (from 5% to 95%) are reported. The XMGR output produced using the above input file appears below.

Batch processing

A simple script 'batch_curve' is provided with the distribution that will process all the data files in the current working directory with a designated extension as part of their file names. Output files will be written to disk and the fitted data will be displayed using XMGR. After viewing a given data set, exit XMGR to enable the script to proceed to the next data set. For example, the command
batch_curve DATA

will fit all files with names of the form string.DATA and produce output files string.DATA.out.

Reference

The linear least squares algorithm, non-linear Levenberg-Marquardt fitting algorithm and Monte Carlo simulation procedure are adapted from the text Numerical Recipes (Press, et al, 1992). This source should be consulted for additional information concerning the algorithms. The jackknife simulation procedure is described in Data Analysis and Regression. A Second Course in Statistics (Mosteller and Tukey, 1977).

The Function Library

The library of defined functions is maintained in the file funclib.f. This file contains three FORTRAN subprograms that must be modified to add additional functions the the library. The proper locations for modifying the funclib.f file are indicated in the file itself.

initf(nfuncs,fnames,feqs,nparms,parnam)

This subroutine provides basic information concerning the defined functions. To add a function, perform the following steps:

func(fname,x,a,M)

This subroutine returns the value of the function fname at a point x. New functions are added by adding an 'elseif' block in the subroutine:
      elseif (fname.eq.'string') then
          func='expression'
in which string is the function name and expression is the FORTRAN representation of the function. The parameters of the function are a(1) to a(M).

fgrad(fname,x,a,dyda,M)

This routine returns an array dyda containing the derivatives of the function fname with respect to the parameters a(1) to a(M) at the point x. New functions are added by adding an 'elseif' block in the subroutine:
      elseif (fname.eq.'string') then
          dyda(1)='expression'
                .
                .
                .
          dyda(M)='expression'
in which string is the function name and expression is the FORTRAN representation of the derivatives of the function with respect to a(1) to a(M).

Compilation

The program is written in FORTRAN 77 and is compiled simply by typing make on the command line. If the software is being compiled for a workstation other than an SGI, then the routine getseed.f must be modified to read the system clock (to obtain a random number seed). As discussed below, the BLAS library must be installed on the workstation.

License

Copyright (C) 1998 Arthur G. Palmer

Arthur G. Palmer
Department of Biochemistry and Molecular Biophysics
Columbia University
630 West 168th Street
New York, NY 10032

email: agp6@columbia.edu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the

Free Software Foundation, Inc.
59 Temple Place - Suite 330
Boston, MA 02111-1307, USA.

 

This program uses some software routines copyrighted by Numerical Recipes Software. You should obtain a license from Numerical Recipes if you do not have one already. You can obtain an academic workstation license by sending your name, address, email address, workstation hostname, workstation internet address, workstation brand and model number, and a check for $50.00 to

Numerical Recipes Software
P.O. Box 243
Cambridge, MA 02238

 

Be certain to state that you want the FORTRAN version of the software. You will also need the BLAS library installed on your workstation. This library is normally supplied by the workstation vendor, or can be obtained from www.netlib.org.

History

1.0 Initial version (not public)
1.1 Initial public release (12/10/98)
1.2 New command line features incorporated (12/28/98)
1.21 Gradient for inversion recovery and exponential+offset functions corrected (9/20/99)
1.22 Changed convergence criteria to avoid getting trapped in a local minimum at the first iteration (12/19/99)
1.23 Added functions for fitting model-free formalism to reduced spectral density data (8/3/01)