CURVEFIT
The program curvefit is a general purpose Levenberg-Marquardt
non-linear least squares fitting program. It can be used in an
interactive or batch mode, although in most cases, a PC or Mac graphing program will be more convenient for interactive use. The list of functions that can
be fit is easily expandable, provided that you have access to a FORTRAN
compiler. Output from the program is intended to be
graphed using the
XMGR
program. Output is directed to standard output and
can be redirected to a file or piped directly into XMGR.
Usage
The program is executed with the following command:
curvefit [-help -grid -jack -noerror -debug -nointercept -xmgr -r seed -x value -m filename -f filename]
in which the bracketed entries are options described below.
The output can be piped to XMGR using
the command string:
curvefit [options] -xmgr -f filename | xmgr -type xydy -source stdin
The output can be simultaneously directed to a file and piped to XMGR using
the command string:
curvefit [options] -xmgr -f filename | tee file.output | xmgr -type xydy -source stdin
- -help
- Using this option alone will provide a brief synopsis of the program.
- -grid
- By default, curvefit assumes that the user will provide initial guesses
for the parameter values to be optimized for all functions except 'linear'.
If the -grid option is specified,
then the the initial values will be chosen by a grid search. The user must
provide lower bound, upper bound and number of grid steps to use for each
parameter.
- -jack
- By default, curvefit provides estimates of the uncertainties in the
fitted parameters both from the covariance matrix and from a Monte Carlo
simulation. If -jack is specified, then a jackknife simulation is performed
instead of the Monte Carlo simulation.
- -debug
- Echo the input back to the terminal for debugging purposes.
- -nointercept
- Used only with the function type 'linear', this option will force the intercept to be
fixed at zero, only the slope will be fit to the data.
- -noerror
- By default, curvefit assumes that data will be input as (x, y, dy) triples
(where dy is the uncertainty in y). If -noerror is specified, then only (x,y)
pairs will be read. -noerror implies -jack because the Monte Carlo simulations
cannot be done without uncertainties being provided.
- -xmgr
- Produce output for plotting using the
XMGR
software package.
- -raw
- The data will be read from a plain file without any header information. The user will be prompted for other information.
- -r seed
- Specifies an initial seed for the random number generator. If USE_GETSEED
is defined in the Makefile for the program, then entering a seed is not necessary because the program will use the system clock to generate the seed. The routine getseed.f may need to be re-written for use on systems other than Silicon Graphics in order to use the USE_GETSEED compiler option.
- -x value
- Specifies a value of the independent variable that will be used to obtain a predicted value of the dependent variable with an uncertainty estimated by Monte Carlo simulations
- -m filename
- Specifies a filename to be used for writing the values of the fitted paramters determined for each Monte Carlo step in the Monte Carlo simulation of uncertainties. This file is useful for performing error analysis during subsequent calculations using fitted parameters.
- -f filename
- Specifies a filename to be read for input, rather than interactive data
entry from the terminal. The format of the input file is given below.
Input File Format
The input file format used with the -f flag is given below. In general, the
input consists of a keyword (title, function, xmgr, or data) followed by
input values (that may extend over more than one line as described below).
Strings that contain spaces must be enclosed in double quotes. The program is case sensitive.
Lines starting with "#" are treated as comments.
Input is free format and blank lines, tabs, and blanks are ignored.
title string
function string
string real real integer
.
.
.
string real real integer
xmgr
string
.
.
.
string
data
real real real
.
.
.
real real real
The input associated with each keyword is described below:
- title string
- A title for the data of up to 80 characters
- function string
- String is one of the defined function names. The functions are listed
using the command curvefit -help. The function keyword is followd by M
input lines, one for each parameter in the function. In the default mode, each
line consists of a string defining the name of the parameter and a real
number defining the initial guess for the parameter. If -grid
is specified, the
parameter string is followed by three entries: a real number defining the lower
bound for the parameter, a real number defining the upper bound for the
parameter and an integer defining the number of search steps between the lower
and upper bounds.
- xmgr
- This keyword is followed by options used for XMGR.
Each of these
inputs consists of the keyword @ followed by a string setting some XMGR option. See the sample input files provided or the XMGR help pages for more information. If
the -xmgr option is not specified, these entries are ignored.
- data
- This keyword is followed by N lines of data, one line for each data point. The
data is input as (x, y, dy) triples, unless the -noerror option
is specified.
In this case, the data is input as (x,y) pairs. If no uncertainties are
specified for the y values, then the data may need to be scaled to obtain
good convergence of the program.
A sample input file is given below:
title "sample tanh fit"
function hyperbolic_tangent
Rate 0.0 20.0 100
xmgr
@ XAXIS LABEL "time (sec)"
@ YAXIS LABEL "intensity (arb. units)"
@ XAXIS TICKLABEL FORMAT DECIMAL
@ YAXIS TICKLABEL FORMAT DECIMAL
@ XAXIS TICKLABEL CHAR SIZE 0.8
@ YAXIS TICKLABEL CHAR SIZE 0.8
data
0.0053 0.024 0.01
0.15 0.22 0.02
0.30 0.36 0.02
0.45 0.47 0.03
If the -raw flag is used, the input file format consists only of data in
(x,y,dy) or (x,y) pairs:
real real real
.
.
.
real real real
A sample input file is given below:
0.0053 0.024 0.01
0.15 0.22 0.02
0.30 0.36 0.02
0.45 0.47 0.03
The output from the program is given below:
# Least-Squares Curve Fitting Results
#
# title sample tanh fit
# function hyperbolic_tangent
# equation y=tanh(-Rate*x)
# points 4
# X2 7.7927
# X2(red) 2.5976
#
# Parameter Fitted_Value Fitted_Error Sim_value Sim_error
# Rate 1.2494 0.0537 1.2496 0.0532
#
# %tile X2
# 0.0500 0.2773
# 0.1000 0.4592
# 0.1500 0.6899
# 0.2000 0.9465
# 0.2500 1.0985
# 0.3000 1.2824
# 0.3500 1.4478
# 0.4000 1.7044
# 0.4500 1.8563
# 0.5000 2.0298
# 0.5500 2.2738
# 0.6000 2.6877
# 0.6500 3.1017
# 0.7000 3.5734
# 0.7500 3.8593
# 0.8000 4.6502
# 0.8500 5.1785
# 0.9000 6.1981
# 0.9500 7.1826
#
Additional output is appended to the above if the -xmgr option
is specified. The output is mostly self-explanatory. The reduced chi-square
variable [X2(red)] is given by X2/(N-M), in which X2 is the chi-square value
for the best-fit model, N is the number of data points, and M is the number of
parameters in the function. If Monte Carlo simulations are performed, the
distribution of X2 is estimated and the percentiles of the distribution (from
5% to 95%) are reported.
The XMGR output produced using the above input file appears below.
Batch processing
A simple script 'batch_curve' is provided with the distribution that will
process all the data files in the current working directory
with a designated extension as part of their
file names. Output files will be written to disk and the fitted data will be displayed using XMGR. After viewing a given data set, exit XMGR to enable the script to proceed to the next data set.
For example, the command
batch_curve DATA
will fit all files with names of the form string.DATA and produce output files string.DATA.out.
Reference
The linear least squares algorithm, non-linear Levenberg-Marquardt fitting algorithm and Monte Carlo simulation procedure
are adapted from the text Numerical Recipes (Press, et al, 1992). This
source should be consulted for additional information concerning the algorithms.
The jackknife simulation procedure is described in Data Analysis and
Regression. A Second Course in Statistics (Mosteller and Tukey, 1977).
The Function Library
The library of defined functions is maintained in the file funclib.f.
This file contains three FORTRAN subprograms that must be modified to add
additional functions the the library.
The proper locations for modifying the funclib.f file are
indicated in the file itself.
initf(nfuncs,fnames,feqs,nparms,parnam)
This subroutine provides basic information concerning the defined functions.
To add a function, perform the following steps:
- Increase the variable nfuncs by one. This is the total number of
defined functions.
- Set fnames(i) = 'string', where string is the function name
without embedded spaces or tabs and i is the index of the function in the
library.
- Set feqs(i) = 'string', where string is the functional form y=f(x) without
embedded spaces and i is the index of the function in the library.
- Set nparms(i) = M, in which M is equal to the number of parameters in the function.
- For k=1 to M, set parnam(k,i) = 'string' in which string is the name of the parameter, without embedded spaces, and i is the index of the function in the
library.
func(fname,x,a,M)
This subroutine returns the value of the function fname
at a point x. New functions
are added by adding an 'elseif' block in the subroutine:
elseif (fname.eq.'string') then
func='expression'
in which string is the function name and expression
is the FORTRAN representation
of the function. The parameters of the function are a(1) to a(M).
fgrad(fname,x,a,dyda,M)
This routine returns an array dyda containing the derivatives of the
function fname
with respect to the parameters a(1) to a(M) at the point x. New
functions are added by adding an 'elseif' block in the subroutine:
elseif (fname.eq.'string') then
dyda(1)='expression'
.
.
.
dyda(M)='expression'
in which string is the function name and expression
is the FORTRAN representation
of the derivatives of the function with respect to a(1) to a(M).
Compilation
The program is written in FORTRAN 77 and is
compiled simply by typing make on the command line. If
the software is being compiled for a workstation other than an SGI, then the
routine getseed.f must be modified to read the system clock (to obtain
a random number seed). As discussed below, the BLAS library must be installed
on the workstation.
License
Copyright (C) 1998 Arthur G. Palmer
Arthur G. Palmer
Department of Biochemistry and Molecular Biophysics
Columbia University
630 West 168th Street
New York, NY 10032
email:
agp6@columbia.edu
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the
Free Software Foundation, Inc.
59 Temple Place - Suite 330
Boston, MA 02111-1307, USA.
This program uses some software routines copyrighted by Numerical Recipes Software.
You should obtain a license from Numerical Recipes if you do not have one already.
You can obtain an academic workstation license by sending your name, address,
email address, workstation hostname, workstation internet address, workstation
brand and model number, and a check for $50.00 to
Numerical Recipes Software
P.O. Box 243
Cambridge, MA 02238
Be certain to state that you want the FORTRAN version of the
software. You will also need the BLAS library installed on your workstation.
This library is normally supplied by the workstation vendor, or can be obtained
from www.netlib.org.
History
- 1.0 Initial version (not public)
- 1.1 Initial public release (12/10/98)
- 1.2 New command line features incorporated (12/28/98)
- 1.21 Gradient for inversion recovery and exponential+offset functions corrected
(9/20/99)
- 1.22 Changed convergence criteria to avoid getting trapped in a local minimum
at the first iteration (12/19/99)
- 1.23 Added functions for fitting model-free formalism to reduced spectral
density data (8/3/01)