Learning IPython for Interactive Computing and Data Visualization

by Cyrille Rossant

Python 2.x and 3.x

The 3.x branch of Python is not backward compatible with the 2.x branch, which explains why the 2.7 version is still maintained. Even if most external Python packages used in this book are compatible with Python 3.x, some packages are still not compatible with this branch. At this time, the choice between Python 2.x and Python 3.x for a new project is typically dictated by the Python 3 support of the required external Python packages. The setup of the targeted users is also an important point to consider. In this book, we will use Python 2.7 and try to minimize the incompatibilities with Python 3.x. This issue is beyond the scope of this book, and we encourage you to search for information about how to write code for Python 2.x that is as compatible with Python 3.x as possible. This official web page is a good starting point: http://wiki.python.org/moin/Python2orPython3

Useful commands

The standard Unix commands pwd, ls, cd, cp, ps, ... are available in IPython.

Magic commands actually come with a % prefix, but the automagic system, enabled by default, allows you to conveniently omit this prefix. Using the prefix is always possible, particularly when the unprefixed command is shadowed by a Python variable with the same name. The %automagic command toggles the automagic system.

Appending ? or ?? to any command or variable gives you all the information you need about it: var? is analogous to %pinfo var.

In any interactive session, your input and output history is kept in the In and Out variables and is indexed by a prompt number. The _, __, ___ and _i, _ii, _iii variables contain the last three output and input objects, respectively. The _n and _in variables return the nth entries of the output and input history.

Whenever you start typing any command, variable name, or function, press the Tab key to let IPython either automatically complete what you are typing if there is no ambiguity, or show you the list of possible commands or names that match what you have typed so far. It also works for directories and file paths, just like in the system shell.

To force IPython to show all private attributes and methods, type myobject._ before pressing the Tab key.

A Python script can be executed from within the IPython console with the %run magic command followed by the script filename. The script is executed in a fresh, new Python namespace unless the -i option has been used, in which case the current interactive Python namespace is used for the execution: %run script.py.

You can do quick benchmarks in an interactive session with the %timeit magic command. It lets you estimate how much time the execution of a single command takes.

In [1]:
%timeit [x*x for x in range(100000)]
100 loops, best of 3: 8.97 ms per loop

Whenever an exception is raised in the console, use the %debug magic command to launch the debugger at the exception point. You can use the %pdb magic command to activate the automatic execution of the IPython debugger as soon as an exception is raised.

The %pylab magic command enables the scientific computing capabilities of the NumPy and matplotlib packages, namely efficient operations on vectors and matrices and plotting and interactive visualization features.

Importing methods

Python provides two main ways to import modules:

  • import library -- If you import a library, you get a module object named library. But if you try to access methods directly, you get an error. You need to specify the library name. Example:
In [2]:
import math

As an alternative, you can import an object from a module directly, avoiding the need to specify the library name. Example:

In [3]:
from math import pi

Or you can use the star operator to import everything from the module. The advantage of importing everything from the math module is that your code can be more concise. The disadvantage is that there might be conflicts between names defined in different modules, or between a name from a module and one of your variables. Example:

In [4]:
from math import *

Interactive Work with IPython

IPython is not only an extended Python console, but it also provides several ways to interact with the operating system during a Python interactive session without quitting the console. The list of aliases of UNIX commands can be obtained with the %alias magic command.

The native Python module urllib2 can be used to download a file, and the zipfile module to extract it.

In [5]:
import urllib2, zipfile
url = 'http://camcat.df.unicam.it/camcat_files/'
filename = 'TemplateLaTex.zip'
downloaded = urllib2.urlopen(url + filename)

A new folder in the current directory can be made with mkdir, an IPython alias that redirects a magic command to a shell command; the dollar ($) sign lets us use a Python variable within a system or magic command.

In [6]:
folder = 'data'
In [7]:
mkdir $folder
mkdir: data: File exists
In [8]:
%cd $folder
In [9]:
with open(filename, 'wb') as f:
    f.write(downloaded.read())

and extract it in the current folder with the extractall method of a ZipFile object:

In [10]:
with zipfile.ZipFile(filename) as zip:
    zip.extractall('.')

The current directory can be saved as a bookmark using the command %bookmark my_data. Then, in any future session with the same IPython profile, we can type cd my_data to enter this directory, whichever directory we call this command from. The -l and -d options allow us to list all defined bookmarks and to delete a specified bookmark, respectively. Typing %bookmark? displays the list of all options.

Accessing the system shell from IPython

We can also launch commands using the system shell directly from IPython, and retrieve the result as a list of strings in a Python variable. To do this, we need to prefix shell commands with !. For example:

In [11]:
files = !ls -1 -S | grep Info

The Unix command ls -1 -S lists all files in the current directory, sorted by decreasing size, with one file per line. The pipe | grep Info keeps only those filenames that contain Info. The Python variable files then contains the list of matching filenames.

We can also use Python variables in the system command, using either the $ syntax for single variables, or {} for any Python expression, as follows:

In [12]:
!head -n5 {files[0]}

The head -n5 {files[0]} command displays the first five lines of the first file in the files list.

Exploring the history

The %history magic command (and %hist, which is an alias) accepts multiple convenient options to display the parts of the input history we are interested in. By default, %history displays all our input history in the current session. We can specify a line range with a simple syntax, for example, %hist 4-6 8 for lines four to six and line eight. We can also display history from previous sessions with the syntax %hist 243/4-8 for lines four to eight of session 243. Finally, we can number sessions relative to the current one using the syntax %hist ~1/7, which shows line seven of the previous session. Other useful options for %history include -o, which displays the output in addition to the input; -n, which displays the line numbers; -f, which saves the history to a file; and -p, which displays the classic >>> prompt. For example:

In [13]:
 %hist -nop 1-2
   1: >>> %timeit [x*x for x in range(100000)]
>>> import math
... print(math.pi)

displays the history of the first two lines with the line number, the output, and the default Python prompt.

Finally, a related command is %store, which is used to save the content of any Python variable for later use in any future interactive session. The %store name command saves the variable name, and %store -d name deletes it. To recover the stored variables, we need to use %store -r.

Importing code in IPython

When using the IPython console, the %paste magic command can be used to import and execute the code contained in the clipboard. IPython automatically dedents the code and removes the > and + characters at the beginning of lines, allowing us to paste diff and doctest content directly from e-mails. In addition, the %run magic command executes a Python script in the console, by default, in an empty namespace. This means that any variable defined in the interactive namespace is not available within the executed script. However, at the end of the execution, control returns to IPython's prompt, and the variables defined in the script are imported into the interactive namespace. If we want the script to use a variable defined in the interactive namespace, we need to use the -i option.

Exporting code to a file

By default, %edit opens the system's text editor and executes the code when we close the editor. If we supply an argument to %edit, this command will try to open the text editor with the code we supplied.

Source code introspection

IPython can also display information about the internals of a variable, in particular about the source code when it is defined in a file. First, typing ? before or after a variable name prints useful information about it. Typing ?? gives more detailed information, in particular, the source code of the object, if it is a function defined in a file. In addition, several magic commands display specific information about a variable, such as the source code of the function (%psource) or of the file (%pfile) where it is defined, the docstring (%pdoc), or the definition header (%pdef). The %pfile magic command also accepts a Python filename, in which case, it prints the file's contents with syntax highlighting. With this function, IPython can then act as a code viewer with syntax highlighting.

Using the interactive debugger

First, when we encounter an exception, we can use the %debug magic command to launch the IPython debugger at the exact point where the exception was raised. If we activate the %pdb magic command, the debugger will be automatically launched upon the very next exception. We can also start IPython with ipython --pdb for the same behavior. Finally, we can run a whole script under the control of the debugger with the %run -d command. We can also specify explicitly where to put the first breakpoint; typing %run -d -b29 script.py pauses the program execution on line 29 of script.py. We first need to type c to start the script execution.

When the debugger launches, the prompt becomes ipdb>. The program execution is then paused at a given point in the code. We can use the w command to display the line and the location in the stack traceback where the debugger has paused. At this point, we have access to all local variables and we can control precisely how we want to resume the execution. Within the debugger, several commands are available to navigate into the traceback:

  • u/d for going up/down into the call stack
  • s to step into the next statement
  • n to continue execution until the next line in the current function
  • r to continue execution until the current function returns
  • c to continue execution until the next breakpoint or exception

Other useful commands include:
  • p to evaluate and print any expression
  • a to obtain the arguments of the current function
  • The ! prefix to execute any Python command within the debugger

The entire list of commands can be found in the documentation of the pdb module in Python.

Controlling the execution time of a command

First, the %timeit magic function uses Python's timeit module to estimate the execution time of any Python statement. If we have defined a function fun(x), %timeit fun(x) executes this command multiple times and returns an average of the execution time. The number of calls is determined automatically; there are r loops of n executions each. These numbers can be specified with the -r and -n options to %timeit. Also, we can easily estimate the execution time of a script with the %run -t command.
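Since %timeit is a thin wrapper around the standard timeit module, the same measurement can be scripted outside IPython; a sketch mirroring the -n and -r options:

```python
import timeit

# r repetitions of n executions each, as with %timeit -n10 -r3
n, r = 10, 3
times = timeit.repeat('[x*x for x in range(100000)]', number=n, repeat=r)
best = min(times) / n   # best-of-r time per loop, which is what %timeit reports
print('%d loops, best of %d: %.2f ms per loop' % (n, r, best * 1000))
```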

To obtain much more detailed information about the execution time of a program, we can execute it under the control of a profiler, like the one provided natively by the profile Python module. Profiling is a complex topic, and more details about the profile module can be found in the official Python documentation. To run a script under the control of the profiler, we can execute it from IPython with %run -p or with the equivalent %prun magic command.

Using the IPython notebook

The IPython notebook is increasingly used in the Python community, in particular for scientific research and education. It brings both a powerful HTML user interface to IPython and a way of saving a whole interactive session in a notebook file in a JSON format. The latter functionality brings reproducibility to interactive computing, a crucial feature notably in scientific research. Notebooks run in the browser and can contain, not only Python code, but also text in a markup language, such as Markdown, as well as images, videos, or rich content media. Notebooks can be converted into other formats, such as Python scripts, HTML, or PDF.

The notebook dashboard

Typing ipython notebook in a shell launches a local web server on port 8888 (by default) and opens the dashboard in a browser. This page lists all notebooks in the directory from which we launched ipython notebook. An IPython notebook file has a .ipynb extension; it is a text file containing structured data in JSON.

At the top of the user interface of the notebook, the menu and the toolbar offer access to all commands. The main area below them shows, by default, an empty input cell. Python code can be typed into this input cell. An important feature of an input cell is that pressing the Enter key does not execute the cell, but rather inserts a new line. Writing code into a cell is then closer to what a standard text editor offers, compared to the classic IPython console.

An input cell can be executed in two ways. By pressing Shift + Enter, all the code within the cell is executed in the current IPython interactive namespace. The output then appears in an output area right below the input cell, and a new input cell is created below. By pressing Ctrl + Enter, no new input cell is created and only the output is shown.

A very useful feature of the notebook is the possibility to insert rich text in cells using a popular markup format called Markdown (described at http://daringfireball.net/projects/markdown/syntax). Editing features such as bold, italic, headers, and bullet points can be inserted with a simple syntax. To do this, we need to convert a cell into a Markdown cell with the Cell > Markdown command.

The plotting capabilities of the notebook are initialized by launching the notebook with the command ipython notebook --pylab inline. This allows us to insert figures within the notebook, thanks to the matplotlib library. NetworkX offers several matplotlib-based commands to plot graphs.

In [14]:
%pylab
Using matplotlib backend: MacOSX
Populating the interactive namespace from numpy and matplotlib
WARNING: pylab import has clobbered these variables: ['cosh', 'ldexp', 'hypot', 'tan', 'isnan', 'log', 'fabs', 'floor', 'modf', 'sqrt', 'frexp', 'degrees', 'pi', 'log10', 'sin', 'exp', 'copysign', 'cos', 'ceil', 'isinf', 'sinh', 'trunc', 'expm1', 'e', 'tanh', 'f', 'radians', 'fmod', 'log1p', 'gamma']
`%matplotlib` prevents importing * from pylab and numpy
In [15]:
x = linspace(-10., 10., 1000)
In [16]:
%pylab inline
Populating the interactive namespace from numpy and matplotlib
In [17]:
sg = plot(x, sin(x))

Numerical Computing with IPython

Although IPython's powerful shell and extended console can be advantageously used by any Python programmer, this package was originally a tool designed for scientists by scientists. It was indeed meant to provide a convenient way of doing interactive scientific computing with Python.

IPython does not really provide scientific computing features per se, but rather offers an interactive interface to powerful external libraries, such as NumPy, SciPy, Pandas, Matplotlib, and the like. Together, these tools provide a framework for scientific computing, which can compete with widespread commercial tools in the scientific community such as Matlab or Mathematica.

  • NumPy provides a multidimensional array object that supports optimized vector operations.
  • SciPy offers a wide variety of scientific algorithms (signal processing, optimization, and so on) that are based on NumPy.
  • Pandas provides convenient data structures for tabular data coming from real-world data sets.
  • Matplotlib allows us to plot graphical figures easily, so as to visualize any form of data interactively and to generate publication-quality figures.

IPython provides the adequate interactive framework for using all these tools in a streamlined way.


NumPy provides multidimensional arrays. Loops should be avoided every time it is possible to use a NumPy operation instead. An array is a block of data organized into several dimensions. A one-dimensional array is a vector, which is an ordered sequence of elements (typically numbers) that are indexed with a single integer. A two-dimensional array is a matrix, containing elements indexed by a pair of integers, that is, the row index and the column index. More generally, an n-dimensional array is a set of elements with the same data type that are indexed by a tuple of n integers.
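A minimal illustration of these definitions (using the explicit import numpy as np style):

```python
import numpy as np

v = np.array([1, 2, 3])        # one-dimensional: a vector
M = np.array([[1, 2, 3],
              [4, 5, 6]])      # two-dimensional: a matrix

print(v.ndim, v.shape)         # 1 (3,)
print(M.ndim, M.shape)         # 2 (2, 3)
print(M[1, 2])                 # indexed by (row, column): 6
```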

All elements in an array must have the same type: it is called the data type (dtype). There are multiple possible types in NumPy: Booleans, signed/unsigned integers, single-precision/double-precision floating point numbers, complex numbers, strings, and so on. Custom data types can also be defined.

The advantage of arrays compared with native Python types is that it is possible to perform very efficient computations on arrays instead of relying on Python loops. The difference is that the loop is implemented internally in C by NumPy instead of Python, so that there is no longer the cost of interpretation within the loop. Indeed, Python being an interpreted, dynamically-typed language, each iteration involves various low-level operations performed by Python (type checking and so on). If NumPy is compiled with the adequate options, array computations can benefit from vectorized CPU instructions [SSE, AVX, XOP, and so on, that use large registers (128 bits or 256 bits)] and can be more than two or four times faster.
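The difference can be made concrete with a small sketch comparing a pure-Python loop with the equivalent vectorized NumPy expression:

```python
import numpy as np

x = np.arange(100000, dtype=np.float64)

# pure-Python loop: every iteration pays interpretation and type-checking costs
total_loop = 0.0
for value in x:
    total_loop += value * value

# vectorized version: the loop runs in compiled C inside NumPy
total_vec = np.sum(x * x)

print(np.allclose(total_loop, total_vec))  # True, but the second is far faster
```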

To import NumPy in IPython, we can use the %pylab magic command (or start IPython with ipython --pylab), which loads NumPy and Matplotlib within the interactive namespace (available as np for numpy and plt for matplotlib.pyplot). The other possibility is to import NumPy with import numpy (or import numpy as np for the lazy ones) or from numpy import *. The former syntax is to be preferred in a script, while the latter can be used in an interactive session.

Important attributes of an array (e.g., array_name) include:

  • shape: the array shape as a tuple of integers;
  • ndim: the number of dimensions, which is also len(array_name.shape);
  • size: the total number of elements (the product of all values in array_name.shape);
  • itemsize: the size in bytes of each element (four for an int32 data type, eight for float64, and so on).

In Python/NumPy, brackets [] allow us to access elements of a container object. Inside the brackets, the notation :,0 refers to all pairs of indices with any first element (the colon :) and a second element equal to zero. Since, in NumPy, the first dimension always refers to the row and the second dimension to the column, we are referring precisely to the first column here.

Example. First, we need to load the libraries with %pylab.

We generate some random data:

In [18]:
positions = rand(10000000,2)

We now look at some properties of the array:

In [20]:
positions.ndim, positions.shape
(2, (10000000, 2))

We assign to x and y the first and second column, respectively:

In [21]:
x, y = positions[:,0], positions[:,1]

We compute the vector of distances from the position of interest (0.5, 0.5) to all positions:

In [22]:
distances = (x - .5) ** 2 + (y - .5) ** 2

and determine the closest element with argmin, benchmarking both operations with %timeit:

In [23]:
%timeit -n10 -r3 exec(In[19])
10 loops, best of 3: 7.41 µs per loop
In [24]:
%timeit -n10 -r3 ibest = distances.argmin()
10 loops, best of 3: 11.9 ms per loop

Creating arrays

There are several ways of creating an array.

  • From scratch, element by element -- First, we can create an array by manually specifying its coefficients. This is the most direct way of creating an array, but it is not used very often in practice. The NumPy function array takes a list of elements and returns a corresponding NumPy array. For example, we can generate a two-dimensional array and force it to be 64-bit float:
In [25]:
x = array([[1, 2, 3],[4, 5, 6]], dtype=float64)
In [26]:
x.shape
(2, 3)
  • From scratch, using predefined templates -- Creating arrays by specifying the individual coefficients manually is rarely practical. One can use any of the several convenient functions defined in NumPy to create typical arrays with the desired shape. For example we can generate:
In [28]:
zeros((2, 4))
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])
In [29]:
arange(2, 10, 2)
array([2, 4, 6, 8])
In [30]:
%precision 2
linspace(-1., 1., 10)
array([-1.  , -0.78, -0.56, -0.33, -0.11,  0.11,  0.33,  0.56,  0.78,  1.  ])
  • From random values -- NumPy provides various random sampling routines for generating arrays with independent random values following different probability distributions.
    • rand() floating-point numbers uniformly sampled between 0 and 1,
    • randn() random values sampled from a Gaussian distribution,
    • randint() random integers,
    • exponential() exponential distribution.

Related functions include shuffle and permutation, which randomly permute existing arrays.
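The routines listed above live in numpy.random; a quick sketch of their use:

```python
import numpy as np

a = np.random.rand(2, 5)             # uniform samples on [0, 1)
b = np.random.randn(1000)            # standard Gaussian samples
c = np.random.randint(0, 10, 6)      # integers in [0, 10)
d = np.random.exponential(2.0, 5)    # exponential distribution, scale 2

p = np.random.permutation(5)         # a random ordering of 0..4

print(a.shape)                           # (2, 5)
print((a >= 0).all() and (a < 1).all())  # True
```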

In [31]:
%precision 4
rand(2, 5)
array([[ 0.8746,  0.8872,  0.3742,  0.3923,  0.8589],
       [ 0.3818,  0.5875,  0.5919,  0.8038,  0.7573]])

Loading arrays

NumPy provides efficient and convenient ways of loading multidimensional arrays from text (Python strings or text/CSV files) or from binary buffers or files. In addition, the Pandas package is particularly useful when loading tabular data, that is, tables that contain heterogeneous data types instead of just numbers.

  • From a native Python object -- It is quite common to have data in some native Python object and to want to convert it into a NumPy array. The standard method is to use the array function. When we created arrays by directly specifying their values, we actually converted Python lists of numbers into arrays.
  • From a buffer or an external file -- Another common way of creating an array is to load data from a memory buffer or from a file, with either binary or string elements. If we know the exact data type of the elements, we can obtain a NumPy array with:
    • frombuffer() from a Python buffer object,
    • fromstring() accepts either ASCII text with values separated by any delimiter or binary data in any data type,
In [32]:
np.fromstring('1 2 5 10', dtype=int, sep=' ')
array([ 1,  2,  5, 10])

Other functions allow us to load data from text files or binary files and convert them into NumPy arrays:

  • fromfile()
  • loadtxt()
  • genfromtxt()

The function fromfile() is highly efficient with binary data. loadtxt() is a simplified version of genfromtxt(), useful when the file format is straightforward.
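A self-contained round trip with loadtxt() can be sketched as follows (writing a small text file into a temporary directory first):

```python
import numpy as np
import os, tempfile

# write a small whitespace-separated text file, then load it back
path = os.path.join(tempfile.mkdtemp(), 'data.txt')
with open(path, 'w') as f:
    f.write('1 2\n1 3\n1 4\n')

data = np.loadtxt(path)
print(data.shape)    # (3, 2)
print(data[:, 1])    # [2. 3. 4.]
```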

In [33]:
cd /Users/gianx
In [34]:
array([[ 1.,  2.],
       [ 1.,  3.],
       [ 1.,  4.],
       [ 1.,  5.],
       [ 1.,  6.]])

Saving arrays

Saving arrays in files is as easy as loading NumPy arrays. The main functions are:

  • save() saves an array into a binary file,
  • savetxt() saves an array into a text file,
  • savez() saves dictionaries of variables of any type (including NumPy arrays) into a single binary file, which load() reads back.
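A round trip through these functions can be sketched as follows (writing into a temporary directory):

```python
import numpy as np
import os, tempfile

x = np.arange(6).reshape((2, 3))
d = tempfile.mkdtemp()

np.save(os.path.join(d, 'x.npy'), x)         # binary .npy file
np.savetxt(os.path.join(d, 'x.txt'), x)      # plain-text file
np.savez(os.path.join(d, 'vars.npz'), x=x)   # several variables in one file

y = np.load(os.path.join(d, 'x.npy'))
z = np.load(os.path.join(d, 'vars.npz'))['x']
print(np.array_equal(x, y), np.array_equal(x, z))  # True True
```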

Working with arrays

Once NumPy arrays are created or loaded, there are basically three things that we can do with them.

  • Selection -- consists of accessing one or several elements within an array. It can be done with NumPy or Pandas.

    Example in NumPy. We will select all elements in an array that have a value different from NaN, that is, filter an array with a Boolean condition. We can use the NumPy function isnan() and negate its result with the ~ operator (here applied to the last output, _), as follows:

In [35]:
x = array([ 1,  NaN,  5, 10])
In [36]:
isnan(x)
array([False,  True, False, False], dtype=bool)
In [37]:
y = x[~_]
In [38]:
len(y), len(y) / float(len(x))
(3, 0.7500)

We can also specify directly the list of indices we want to keep. For instance, if x is a one-dimensional NumPy
array, x[i:j:k] represents a view on x with only those elements having indices between i (included) and j (excluded) with a step of k. These conventions, with i included and j excluded, are convenient when working with consecutive portions of an array. For example, the first and second halves of x, assuming a size 2n, are simply x[:n] and x[n:]. In addition, the length of x[i:j] is simply j - i. In the end, there should not be +1 or -1 values hanging around in indices in general. If i is omitted, it is assumed to be zero. If j is omitted, it is assumed to be the length of the array in that dimension. Negative values mean we count from the end. Finally, the default value for k is one. This notation is also valid in multiple dimensions; for example, M[i:j,k:l] creates a submatrix view on a 2D array M. Also, we can use x[::-1] to get x in the reverse order.
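These slicing conventions can be checked on a small vector:

```python
import numpy as np

x = np.arange(10)     # [0 1 2 3 4 5 6 7 8 9]
n = len(x) // 2

print(x[:n])          # first half: [0 1 2 3 4]
print(x[n:])          # second half: [5 6 7 8 9]
print(x[2:8:2])       # i=2 included, j=8 excluded, step k=2: [2 4 6]
print(len(x[2:8]))    # j - i = 6
print(x[::-1])        # reversed: [9 8 7 6 5 4 3 2 1 0]
```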

IMPORTANT - An important point to consider with array views is that they point to the same location in memory. So a view on a large array does not imply memory allocation, and changing the values of elements in the view also changes the corresponding values in the original array. If this behavior is unwanted, it is possible to force the creation of a new array with y = x.copy() or y = array(x). In the latter case, it is also possible to change the data type of x, with the dtype keyword argument.
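The aliasing behavior of views, and the independence of a copy, can be demonstrated directly:

```python
import numpy as np

x = np.zeros(5)
view = x[1:4]         # a view: no new memory is allocated
view[:] = 7.0
print(x)              # [0. 7. 7. 7. 0.] -- x was changed through the view

y = x.copy()          # an independent copy
y[:] = -1.0
print(x[0])           # 0.0 -- x is unaffected by changes to the copy
```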

Another way of selecting a portion of an array consists in passing an array with explicit integer values for indices. This is called fancy indexing. If x is a one-dimensional vector, and indices is another one-dimensional vector (or a list) with positive integers, then x[indices] returns a vector containing x[indices[0]], x[indices[1]], and so on. Therefore, the length of x[indices] is equal to the length of indices and not the length of x.
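A short fancy-indexing example, showing that indices may repeat and that the result follows the index array's length:

```python
import numpy as np

x = np.array([10, 20, 30, 40, 50])
indices = [0, 0, 3]       # indices may repeat

print(x[indices])         # [10 10 40]
print(len(x[indices]))    # 3: the length of indices, not of x
```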

  • Manipulation -- Arrays can be manipulated and reshaped, which can sometimes be useful when performing vectorized computations. It is also possible to construct a new array from identical copies of an original array.
  • Reshaping
    • reshape() changes the shape of an array if the total number of elements is kept constant. It is possible to use -1 in at most one dimension in the argument of reshape to specify that its value must be automatically inferred.
    • ravel() removes all multidimensional structure in an array and returns a flattened vector;
    • squeeze() removes all single-dimensional entries from the shape of an array;
    • expand_dims() inserts a new axis in an array.
In [39]:
x = rand(6)
In [40]:
x.reshape((2, 3))
array([[ 0.1497,  0.3954,  0.1432],
       [ 0.1131,  0.6682,  0.5531]])
In [41]:
x.reshape((2, -1))
array([[ 0.1497,  0.3954,  0.1432],
       [ 0.1131,  0.6682,  0.5531]])
  • Repeating and concatenating
    • tile() and repeat() create copies of an array, either by concatenating identical copies of it along a specified axis, or by copying every coefficient any number of times. The second argument of repeat can also be a list reps, in which case the coefficient x[i] is repeated reps[i] times.
    • vstack(), hstack(), dstack(), or concatenate() join several arrays into a single array along the first, second, third, or any dimension, respectively.
    • vsplit(), hsplit(), dsplit(), or split() allow us to split an array into several consecutive subarrays along any dimension. The second argument of split is either an integer, n, in which case the array is split into n equal arrays, or a list with the indices where the array should be split (that is, the indices of the first element in each subarray except the first).
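A quick sketch of joining and splitting along the first two dimensions:

```python
import numpy as np

a = np.arange(6).reshape((2, 3))

print(np.vstack((a, a)).shape)   # (4, 3): joined along the first dimension
print(np.hstack((a, a)).shape)   # (2, 6): joined along the second dimension

# split the horizontal concatenation back into two equal halves
left, right = np.hsplit(np.hstack((a, a)), 2)
print(np.array_equal(left, a))   # True
```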
In [42]:
n = 10
M = arange(1, n + 1).reshape((-1, 1))
array([[ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10]])
In [43]:
M = tile(M, (1, n))
array([[ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1],
       [ 2,  2,  2,  2,  2,  2,  2,  2,  2,  2],
       [ 3,  3,  3,  3,  3,  3,  3,  3,  3,  3],
       [ 4,  4,  4,  4,  4,  4,  4,  4,  4,  4],
       [ 5,  5,  5,  5,  5,  5,  5,  5,  5,  5],
       [ 6,  6,  6,  6,  6,  6,  6,  6,  6,  6],
       [ 7,  7,  7,  7,  7,  7,  7,  7,  7,  7],
       [ 8,  8,  8,  8,  8,  8,  8,  8,  8,  8],
       [ 9,  9,  9,  9,  9,  9,  9,  9,  9,  9],
       [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]])
In [44]:
N = arange(1, n + 1).reshape((1, -1))
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10]])
In [45]:
N = tile(N, (n, 1))
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10]])
In [46]:
M * N
array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10],
       [  2,   4,   6,   8,  10,  12,  14,  16,  18,  20],
       [  3,   6,   9,  12,  15,  18,  21,  24,  27,  30],
       [  4,   8,  12,  16,  20,  24,  28,  32,  36,  40],
       [  5,  10,  15,  20,  25,  30,  35,  40,  45,  50],
       [  6,  12,  18,  24,  30,  36,  42,  48,  54,  60],
       [  7,  14,  21,  28,  35,  42,  49,  56,  63,  70],
       [  8,  16,  24,  32,  40,  48,  56,  64,  72,  80],
       [  9,  18,  27,  36,  45,  54,  63,  72,  81,  90],
       [ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100]])
  • Broadcasting

In the previous multiplication table example, we had to repeat identical copies of a row and a column so that we could multiply the two arrays with identical shapes (n, n). Actually, the repeat step is unnecessary, as arrays with different shapes can still be compatible under specific conditions; this is called broadcasting. The general rule is that two dimensions are compatible when they are equal, or when one of them is 1.

In [47]:
M = arange(1, n + 1).reshape((-1, 1))
array([[ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10]])
In [48]:
N = arange(1, n + 1).reshape((1, -1))
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10]])
In [49]:
M * N
array([[  1,   2,   3,   4,   5,   6,   7,   8,   9,  10],
       [  2,   4,   6,   8,  10,  12,  14,  16,  18,  20],
       [  3,   6,   9,  12,  15,  18,  21,  24,  27,  30],
       [  4,   8,  12,  16,  20,  24,  28,  32,  36,  40],
       [  5,  10,  15,  20,  25,  30,  35,  40,  45,  50],
       [  6,  12,  18,  24,  30,  36,  42,  48,  54,  60],
       [  7,  14,  21,  28,  35,  42,  49,  56,  63,  70],
       [  8,  16,  24,  32,  40,  48,  56,  64,  72,  80],
       [  9,  18,  27,  36,  45,  54,  63,  72,  81,  90],
       [ 10,  20,  30,  40,  50,  60,  70,  80,  90, 100]])
  • Permuting
    • transpose() permutes the dimensions of an array. The indices describing the permutation can be provided in the axes keyword argument.
    • fliplr() and flipud() flip an array in the left/right or up/down direction,
    • roll() performs a circular permutation of the elements along a given axis,
    • rot90() rotates an array by 90 degrees in the counter-clockwise direction.


If A and B are two NumPy arrays with compatible shapes, A + B, A - B, A * B, and A / B are element-wise operations. In particular, when A and B are two-dimensional matrices, A * B is not the matrix product. The matrix product is instead provided by the dot function, which more generally computes the dot product of two arrays.
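The distinction is easy to check on a pair of 2 x 2 matrices:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print(A * B)         # element-wise product: [[ 5 12] [21 32]]
print(np.dot(A, B))  # matrix product:       [[19 22] [43 50]]
```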

Common unary operations include

  • -A negation of each element,
  • A**x each element raised to the power x,
  • abs(A) absolute value,
  • sign(A) an array with -1, 0, or 1 depending on the sign of each element,
  • floor(A) floor of each element,
  • sqrt(A) square root,
  • log(A) natural logarithm,
  • exp(A) exponential,
  • many other mathematical functions (trigonometric, hyperbolic, arithmetic functions, and so on).
  • sum computes the sum of all elements in an array or in a given dimension. The axis keyword argument specifies the dimension along which the sum is performed. This function returns an array with one dimension less than the original array.
  • prod computes the product of all elements in an array or in a given dimension; the axis keyword argument works as for sum.
  • The max and min functions return the largest and smallest values in an array or in a given dimension.
  • The argmax and argmin functions return the index of the largest or smallest element of the array.
  • Statistical functions such as mean, median, std, and var compute the mean, median, standard deviation, and variance of the elements along a given dimension or across the whole array.
  • Related functions that can be useful when simulating mathematical models include diff (discrete difference), cumsum (cumulative sum), and cumprod (cumulative product). The diff function computes a discrete derivative of a signal (up to a scalar coefficient), whereas cumsum computes a discrete indefinite integral of a signal.
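A quick sketch of these aggregation functions on a small array:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

total = A.sum()              # 21: sum over all elements
col_sums = A.sum(axis=0)     # [5, 7, 9]: one dimension less than A
row_max = A.max(axis=1)      # [3, 6]: maximum of each row
imax = A.argmax()            # 5: index of the largest element (flattened)

d = np.diff([1, 4, 9, 16])   # [3, 5, 7]: discrete derivative
c = np.cumsum([1, 2, 3])     # [1, 3, 6]: discrete indefinite integral
```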

Advanced computing

NumPy provides all necessary types and routines for doing efficient numerical computations with Python. SciPy is built on top of NumPy and implements a large variety of higher-level mathematical processing algorithms. Linear algebra routines are provided by the scipy.linalg subpackage: solvers of linear equations, matrix routines, eigenvalue problems, matrix decomposition, and so on.

  • Optimization routines are provided by the scipy.optimize subpackage: unconstrained or constrained minimization of real-valued functions, global optimization, curve fitting, and so on.
  • Numerical integrators are provided by the scipy.integrate subpackage. They can be used to solve differential equations, for example, in physics simulation engines.
  • Signal processing algorithms are implemented in the scipy.signal subpackage: convolutions, linear filters, wavelets, and so on. The scipy.fftpack subpackage implements Fourier transform routines, and the scipy.ndimage subpackage implements several image processing algorithms. Finally, other image processing packages of interest include scikit-image, PIL, and OpenCV (computer vision).
  • Statistical routines are provided by the scipy.stats subpackage: probability distributions, descriptive statistics, statistical tests, and so on. The scipy.cluster subpackage implements clustering algorithms that can be useful to find categories in unstructured data. Other statistical packages of interest include pandas and scikit-learn (machine learning).
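As an illustrative sketch of the linear algebra routines (assuming SciPy is installed), here is how a small linear system A x = b can be solved with scipy.linalg:

```python
import numpy as np
from scipy.linalg import solve

# solve the linear system A x = b
A = np.array([[3., 1.],
              [1., 2.]])
b = np.array([9., 8.])

x = solve(A, b)  # here the solution is [2., 3.]
```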

Interactive Plotting and Graphical Interfaces

There are a lot of Python packages for curve plotting, but the most widely used one, by far, is Matplotlib, whose interface is much like MATLAB's. It is one of the most complete and powerful graphical libraries. It can be used both for interactive visualization and for generating high-quality figures from NumPy data that can be readily used in scientific publications. In addition, its high-level interface makes it particularly easy to use.

Figures can be displayed interactively in IPython using event loop integration. Then, they can be updated dynamically from the command-line interface. The %pylab magic command (or the --pylab option to the ipython shell command) activates this integration automatically.

When using Matplotlib from a script instead of from IPython, we can put the from pylab import * command at the top of the script. In a Python module, it might be a better idea to use import matplotlib.pyplot as plt so that the Matplotlib objects stay within their namespace.

Also, the way of generating plots is slightly different in a script compared to IPython. In a script, the figure is displayed only when the function show() is called, typically, at the very end of the script, whereas, in the IPython command-line interface, the figure is shown and updated at each plot function.
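A minimal standalone script illustrating this (a sketch; the non-interactive Agg backend is selected here so the example also runs without a display, and savefig stands in for the show() call you would use interactively):

```python
# minimal script sketch: with an interactive backend, nothing is
# displayed until show() is called at the end of the script
import matplotlib
matplotlib.use("Agg")        # non-interactive backend, so this runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
plt.plot(x, np.sin(x))
plt.savefig("sine.png")      # in a normal script you would call plt.show() here
```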

Matplotlib can also be used in the notebook. When launching the notebook with ipython notebook --pylab inline, the plots appear in the output cells as images and are saved as base64 strings within the IPYNB files. Without this inline option, figures are displayed in separate windows as usual. It is also possible to activate this option within the notebook by using the command %pylab inline.

A wide variety of figure examples can be found in the Matplotlib Gallery on the official website (http://matplotlib.org/gallery.html) and in Nicolas Rougier's tutorial (http://www.loria.fr/~rougier/teaching/matplotlib/).

  • Plot -- When the plot function receives a single vector as an argument, it assumes that this vector contains values on the y axis, whereas values on the x axis are automatically generated as integers from 0 to len(y) - 1. To explicitly specify the values on the x axis, we can use the following command: plot(x,y).
In [50]:
y = rand(1000)
In [51]:
plot(y)
[<matplotlib.lines.Line2D at 0x12634c710>]
  • Scatter plot -- Scatter plots represent sets of points in two dimensions, using pixels or any other marker.
In [52]:
x = randn(10000)
y = randn(10000)
In [53]:
plot(x, y, ',')
[<matplotlib.lines.Line2D at 0x12667cb90>]
  • Bar graphs -- A bar graph is typically used for histograms, representing the distribution of values at different intervals. The hist function in Matplotlib accepts a vector of values and plots a histogram. The bins keyword allows us to specify either the number of bins or the list of bins.
In [54]:
hist(randn(1000), bins=20)
  • Plot customization -- Matplotlib offers a lot of customization options. Here, we will see how to change styles and colors in figures.
    • xticks and yticks set the exact positions of the ticks,
    • grid adds a grid,
    • xlim and ylim specify the extent of the x and y coordinates,
    • xlabel and ylabel set the axes labels,
    • legend displays the legend; the label of each line corresponds to the label keyword argument of the plot function,
    • title displays the name of the figure.
In [55]:
x = linspace(-10.,10.,501)
In [56]:
plot(x, sin(x), '-r', label='sine')
plot(x, cos(x), '--g', label='cosine')
xlabel('x axis')
ylabel('y axis')
legend()

Multiple independent plots can be displayed on the same figure. We can define a grid with an arbitrary number of rows and columns and plot figures inside each box. Boxes can even span several rows or columns (using subplot2grid).

In [57]:
x = linspace(0, 2 * pi, 1000)
y = 1 + 2 * cos(5 * x)
subplot(1, 2, 1)
plot(x, y)
subplot(1, 2, 2, polar=True)
polar(x, y)
[<matplotlib.lines.Line2D at 0x1267b4b90>]
  • 3D plot -- Matplotlib includes a 3D toolkit called mplot3d that can be used for basic 3D plots, such as 3D curves, surface plots, and so on.
In [58]:
from mpl_toolkits.mplot3d import Axes3D
In [59]:
# we create a (X, Y) grid
X = linspace(-5, 5, 50)
Y = X
X, Y = meshgrid(X, Y)
# we compute the Z values 
R = sqrt(X**2 + Y**2)
Z = sin(R)
In [60]:
import matplotlib as mpl
ax = gca(projection='3d')
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=mpl.cm.winter, linewidth=0)

Matplotlib has the option to use LaTeX to manage all text layout. The LaTeX option is activated by setting text.usetex : True in your rc settings. Text handling with matplotlib’s LaTeX support is slower than matplotlib’s very capable mathtext, but is more flexible, since different LaTeX packages (font packages, math packages, etc.) can be used. The results can be striking, especially when you take care to use the same fonts in your figures as in the main document.

There are a couple of options to mention, which can be changed using rc settings. Here is an example matplotlibrc file:

* font.family        : serif
* font.serif         : Times, Palatino, New Century Schoolbook, Bookman, Computer Modern Roman
* font.sans-serif    : Helvetica, Avant Garde, Computer Modern Sans serif
* font.cursive       : Zapf Chancery
* font.monospace     : Courier, Computer Modern Typewriter
* text.usetex        : true

The first valid font in each family is the one that will be loaded. If the fonts are not specified, the Computer Modern fonts are used by default. All of the other fonts are Adobe fonts. Times and Palatino each have their own accompanying math fonts, while the other Adobe serif fonts make use of the Computer Modern math fonts.

To use LaTeX and select Helvetica as the default font, without editing matplotlibrc use:

In [61]:
from matplotlib import rc
rc('font', **{'family': 'sans-serif', 'sans-serif': ['Helvetica']})
rc('text', usetex=True)

Here is the standard example, tex_demo.py:

In [62]:
%pylab inline
Populating the interactive namespace from numpy and matplotlib
WARNING: pylab import has clobbered these variables: ['rc']
`%matplotlib` prevents importing * from pylab and numpy
In [63]:
"""
Demo of TeX rendering.

You can use TeX to render all of your matplotlib text if the rc
parameter text.usetex is set.  This works currently on the agg and ps
backends, and requires that you have tex and the other dependencies
described at http://matplotlib.sf.net/matplotlib.texmanager.html
properly installed on your system.  The first time you run a script
you will see a lot of output from tex and associated tools.  The next
time, the run may be silent, as a lot of the information is cached in
a cache directory.
"""

import numpy as np
import matplotlib.pyplot as plt

# Example data
t = np.arange(0.0, 1.0 + 0.01, 0.01)
s = np.cos(4 * np.pi * t) + 2

plt.rc('text', usetex=True)
plt.rc('font', family='serif')
plt.plot(t, s)

plt.xlabel(r'\textbf{time} (s)')
plt.ylabel(r'\textit{voltage} (mV)', fontsize=16)
plt.title(r"\TeX\ is Number "
          r"$\displaystyle\sum_{n=1}^\infty\frac{-e^{i\pi}}{2^n}$!",
          fontsize=16, color='gray')
# Make room for the ridiculously large title.
plt.subplots_adjust(top=0.8)

Data fitting

A common use of least-squares minimization is curve fitting, where one has a parametrized model function meant to explain some phenomenon and wants to adjust the numerical values for the model so that it most closely matches some data. With SciPy, such problems are commonly solved with scipy.optimize.curve_fit(), which is a wrapper around scipy.optimize.leastsq(). Since lmfit's minimize() is also a high-level wrapper around scipy.optimize.leastsq(), it can be used for curve-fitting problems, but requires more effort than using scipy.optimize.curve_fit().

Here we discuss lmfit’s Model class. This takes a model function – a function that calculates a model for some data – and provides methods to create parameters for that model and to fit data using that model function. This is closer in spirit to scipy.optimize.curve_fit(), but with the advantages of using Parameters and lmfit.

In addition to allowing you to turn any model function into a curve-fitting method, lmfit also provides canonical definitions for many known line shapes, such as Gaussian or Lorentzian peaks and exponential decays, that are widely used in many scientific domains.


The Model class allows us to easily wrap a model function. It automatically generates the appropriate residual function and determines the corresponding parameter names from the function signature itself.

The Model knows the names of the parameters and the independent variables. By default, the first argument of the function is taken as the independent variable, held in independent_vars, and the rest of the function's positional arguments (and, in certain cases, keyword arguments – see below) are used for Parameter names.

On creation of the model, parameters are not created. The model knows what the parameters should be named, but nothing about the scale and range of your data. You will normally have to make these parameters and assign initial values and other attributes. To help you do this, each model has a make_params() method that will generate parameters with the expected names. This creates the Parameters but doesn't necessarily give them initial values – again, the model has no idea what the scale should be. You can set initial values for parameters with keyword arguments to make_params(), or assign them (and other parameter properties) after the Parameters object has been created.

A Model has several methods associated with it. For example, one can use the eval() method to evaluate the model or the fit() method to fit data to this model with a Parameters object. Both of these methods can take explicit keyword arguments for the parameter values. The fit() method returns a ModelFit object of results, which has many components, including a fit_report() method, an init_fit attribute for the fit with the initial parameter values, and a best_fit attribute for the fit with the best-fit parameter values.

One of the more interesting features of the Model class is that Models can be added together or combined with basic algebraic operations (add, subtract, multiply, and divide) to give a composite model. The composite model will have parameters from each of the component models, with all parameters being available to influence the whole model. Each component can be evaluated after the fit using the ModelFit.eval_components() method of the result (for example, comps = result.eval_components()), which returns a dictionary of the components, using keys of the model name (or prefix if that is set). This will use the parameter values in result.params and the independent variables (x) used during the fit. Note that while the ModelFit held in result does store the best parameters and the best estimate of the model in result.best_fit, the original model and parameters in pars are left unaltered.

The combination works if the argument names for the model functions do not overlap. If they did, the prefix argument to Model would have allowed us to identify which parameter goes with which component model. Using composite models with the built-in models provides a simple way to build up complex models. If you want to use a binary operator other than add, subtract, multiply, or divide, which are supported through normal Python syntax, you'll need to explicitly create a CompositeModel with the appropriate binary operator.


In [64]:
import numpy as np
from lmfit import Model, CompositeModel
from lmfit.lineshapes import step, gaussian

import matplotlib.pyplot as plt

# create data from broadened step
npts = 201
x = np.linspace(0, 10, npts)
y = step(x, amplitude=12.5, center=4.5, sigma=0.88, form='erf')
y = y + np.random.normal(size=npts, scale=0.35)

def jump(x, mid):
    "heaviside step function"
    o = np.zeros(len(x))
    imid = max(np.where(x<=mid)[0])
    o[imid:] = 1.0
    return o

def convolve(arr, kernel):
    # simple convolution of two arrays
    npts = min(len(arr), len(kernel))
    pad  = np.ones(npts)
    tmp  = np.concatenate((pad*arr[0], arr, pad*arr[-1]))
    out  = np.convolve(tmp, kernel, mode='valid')
    noff = int((len(out) - npts)/2)
    return out[noff:noff+npts]

# create Composite Model using the custom convolution operator
mod  = CompositeModel(Model(jump), Model(gaussian), convolve)

pars = mod.make_params(amplitude=1, center=3.5, sigma=1.5, mid=5.0)

# 'mid' and 'center' should be completely correlated, and 'mid' is
# used as an integer index, so a very poor fit variable:
pars['mid'].vary = False

# fit this model to data array y
result =  mod.fit(y, params=pars, x=x)


# Use 'False' to plot as usual, while 'True' for plotting each components
plot_components = False 

# plot results
plt.plot(x, y, 'bo')
if plot_components:
    # generate and plot the components of the fit
    comps = result.eval_components(x=x)
    plt.plot(x, 10*comps['jump'], 'k--')
    plt.plot(x, 10*comps['gaussian'], 'r-')
else:
    plt.plot(x, result.init_fit, 'k--')
    plt.plot(x, result.best_fit, 'r-')

print(result.fit_report())
[[Model]]
    (Model(jump) <function convolve at 0x127a46b90> Model(gaussian))
[[Fit Statistics]]
    # function evals   = 21
    # data points      = 201
    # variables        = 3
    chi-square         = 21.282
    reduced chi-square = 0.107
[[Variables]]
    sigma:       0.61213790 +/- 0.012764 (2.09%) (init= 1.5)
    mid:         5 (fixed)
    amplitude:   0.62152923 +/- 0.001768 (0.28%) (init= 1)
    center:      4.52045222 +/- 0.009217 (0.20%) (init= 3.5)
[[Correlations]] (unreported correlations are <  0.100)
    C(amplitude, center)         =  0.335
    C(sigma, amplitude)          =  0.273

High-Performance and Parallel Computing

A recurring argument against using Python for high-performance numerical computing is that this language is slow, because it is dynamic and interpreted. A compiled lower-level language such as C can often be orders of magnitude faster.

Operations on NumPy arrays can be almost as fast as C because slow Python loops are transparently replaced with fast C loops. Sometimes though, it may happen that vectorization is impossible or difficult to implement on some complex algorithms. In these cases, there are fortunately solutions other than throwing away all Python code and coding everything again in C.
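The difference can be sketched with a plain timing comparison (illustrative only; the exact speedup depends on the machine):

```python
import time
import numpy as np

x = np.random.rand(200000)

t0 = time.perf_counter()
s_loop = 0.0
for v in x:                    # slow: each iteration is interpreted Python
    s_loop += v * v
t_loop = time.perf_counter() - t0

t0 = time.perf_counter()
s_vec = float(np.sum(x ** 2))  # fast: the loop runs in compiled C inside NumPy
t_vec = time.perf_counter() - t0

print(t_loop / t_vec)          # typically a large speedup factor
```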

First, one can take advantage of the multiple cores that are now present in any computer. A standard Python process normally runs on a single core, but it is possible to distribute tasks across multiple cores and even multiple computers in parallel. This is particularly easy to do with IPython. MPI can also be easily used with a few lines of code.

Another popular solution is to first detect the time-critical section of a Python algorithm and then replace it with C code. Typically, only a very small section of the Python code is responsible for most of the algorithm's duration, so that it is possible to keep the rest of the code in Python. Cython is an external package which makes this task easier than it sounds: it offers a superset of Python that is compiled and that can be seamlessly integrated within Python code. It is particularly convenient to use it with IPython.