Extraction of signals with ROIs

Typically, when calcium imaging, we take the data from the camera and save it as a stack of images, where each image is an $n \times m$ array of values. Note that though we may be used to thinking of an image as a matrix, it is also reasonable to consider each image as the $n \cdot m$-dimensional vector created by unraveling it. The most natural representation will depend on the type of processing we want to perform.

When applying imaging techniques to neurophysiology, we will probably want to somehow reduce the dimensionality of each image in order to make sense of the data. Perhaps this is just to arrive at a manageable dataset, but often the goal is to project the data into a more biologically relevant frame: we are not really interested in how the intensities of individual pixels fluctuate over time, but rather in changes in the fluorescence of cells.

The simplest method for performing this reduction is to use regions of interest (ROIs): we identify a collection of pixels that fall within a region of the image that we think corresponds to a cell we are interested in, and we then summarize the intensity of those pixels with a simple statistic (mean, median, etc.), as in the sketch below.
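To make this concrete, here is a minimal sketch of ROI extraction. The stack, mask shape, and ROI location are illustrative stand-ins, not from a real recording:

import numpy as np

# Illustrative stand-ins: a (frames, n, m) image stack and a boolean
# (n, m) mask marking the pixels we believe belong to one cell.
stack = np.random.rand(100, 16, 16)
roi = np.zeros((16, 16), dtype=bool)
roi[4:8, 4:8] = True

# Summarize the ROI with the mean intensity of its pixels in each frame.
trace = stack[:, roi].mean(axis=1)    # one value per frame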

Projections into components

This ROI-based approach works quite well when the cells we wish to record from are well separated from one another in the image, but sometimes things are not so convenient, and the cells overlap. This is often the case when collecting images using wide-field microscopy, but it may even be a problem when using a three-dimensional imaging technique such as two-photon microscopy.

There are a number of approaches for disentangling the signals from overlapping biological entities, but they all more or less start with the assumption that the image is some linear combination of overlapping parts known as components. What differs between these methods is the way in which the components are defined. Techniques such as Principal Components Analysis (PCA) and Independent Components Analysis (ICA) are considered blind techniques, since the components are determined from the statistics of how pixel intensities change over time. For instance, ICA searches for components such that the extracted signals are maximally independent over time.
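As a rough illustration of the blind approach, here is how one might run ICA on an unraveled movie using scikit-learn's FastICA; the movie matrix and component count are placeholders, and this is a sketch of the idea rather than a prescribed pipeline:

import numpy as np
from sklearn.decomposition import FastICA

# Illustrative stand-in: each row is one frame unraveled into a pixel vector.
movie = np.random.rand(500, 256)      # (frames, pixels)

# ICA looks for time courses that are maximally independent; the mixing
# matrix then plays the role of the spatial components.
ica = FastICA(n_components=5, random_state=0)
signals = ica.fit_transform(movie)    # (frames, n_components) time courses
components = ica.mixing_              # (pixels, n_components) spatial maps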

In some cases -- for example the steering muscles of the thorax in Drosophila -- we are in the fortunate situation of having a well-defined understanding of the underlying anatomy. For this reason, we have the opportunity to define the components more explicitly from prior knowledge... we are not blind.

To provide a simple example of how we can take advantage of prior knowledge of anatomy, consider a very low-resolution image, $\mathbf{y}$, that is composed of only two pixels. Pretend that we know that within the field of view of this image there is a large cell that fills both pixels $y_{1}$ and $y_{2}$. We also know from prior experience that there is a small, 1-pixel-wide cell imaged only by pixel $y_{2}$. Now, if we knew the fluorescence intensity of each of these cells, we could predict the intensity of the final image as the sum of these two distinct components. Say the large cell is active at an intensity $\beta_{c}$ and the small cell at an intensity $\beta_{d}$; then the final distribution of pixel intensities would be: $\mathbf{y} = \begin{bmatrix}\beta_{c}&\beta_{d}+\beta_{c}\end{bmatrix}$

Figure 1.1: Summation of fluorescence from two cells on a small sensor

We are really interested in the inverse problem: our goal is to determine $\beta_{c}$ and $\beta_{d}$ from the image $\mathbf{y}$, a fairly easy problem in this case. We get $\beta_{c}$ from the value of $y_{1}$, and we can get $\beta_{d}$ by subtracting $\beta_{c}$ from $y_{2}$.

A more general way of solving for $\beta_{c}$ and $\beta_{d}$ however is by writing the matrix equation:

$\begin{bmatrix}\mathbf{x_{1}} & \mathbf{x_{2}}\end{bmatrix}\boldsymbol{\beta} = \mathbf{y}$, where $\mathbf{x_{1}}$ and $\mathbf{x_{2}}$ are, respectively, the column vectors $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 0 \\ 1\end{bmatrix}$ that tell us the spatial distribution of cell c and cell d, and $\boldsymbol{\beta}$ is a column vector of unknown coefficients $\begin{bmatrix}\beta_{c} \\ \beta_{d}\end{bmatrix}$. Using this relation we can then solve for the betas with: $\boldsymbol{\beta} = \begin{bmatrix}\mathbf{x_{1}} & \mathbf{x_{2}}\end{bmatrix}^{-1}\mathbf{y}$

In [1]:
import numpy as np

b1, b2 = (3, 8)                       # true activations (beta_c, beta_d)
x1 = np.array([1, 1])                 # spatial footprint of the large cell
x2 = np.array([0, 1])                 # spatial footprint of the small cell
img = x1*b1 + x2*b2                   # the two-pixel image we observe

# Build the design matrix with the components as columns and invert it.
xinv = np.linalg.inv(np.vstack((x1, x2)).T)
b1_, b2_ = np.dot(xinv, img)          # recover the activations
print(np.allclose((b1_, b2_), (b1, b2)))
True
In [2]:
# Make Fig 1.1
import figurefirst as fifi
import matplotlib.pyplot as plt
from IPython.display import display,SVG
layout = fifi.FigureLayout('data/two_pix_layout.svg',make_mplfigures = True)
layout.axes['x1'].pcolor([x1*b1],cmap = plt.cm.gray,vmin = -1,vmax = 12,edgecolors='w')
layout.axes['x2'].pcolor([x2*b2],cmap = plt.cm.gray,vmin = -1,vmax = 12,edgecolors='w')
layout.axes['img'].pcolor([img],cmap = plt.cm.gray,vmin = -1,vmax = 12,edgecolors='w')
[fifi.mpl_functions.kill_spines(ax) for ax in layout.axes.values()];plt.close()
layout.save('data/two_pix.svg',hidelayers = ['Layer 1'],targetlayer = 'mpltarget')
#display(SVG('data/two_pix.svg'))

Dimensionality reduction using regression

Of course, in the above example I have set things up so that there is a single, unique solution to our linear system, since the matrix $\mathbf{X} = \begin{bmatrix}\mathbf{x_{1}} & \mathbf{x_{2}}\end{bmatrix}$ (sometimes called the design matrix) is square and invertible. In practice, this is not often (probably never) going to be the case: we will have many more pixels than components (the system is overdetermined), and our image will be contaminated by some noise (making the system inconsistent).

To solve the problem when we have a non-square $\mathbf{X}$ and inconsistent data, we can use the generalized inverse, or pseudoinverse, to find the least-squares solution.

To consider a more realistic example, let's pretend we now have a 3x3-pixel sensor. Again there are two cells, but now cell 1 fills a 2x2-pixel area in the top left of the sensor, and cell 2 is where it was before -- one pixel over in the top row.

Figure 1.2: Two-component 3x3 image

To solve this problem we unravel the image and the components to treat them as vectors, and then solve using $\boldsymbol{\beta} = \mathbf{X}^{+}\mathbf{y}$, where $\mathbf{X}^+$ is the pseudoinverse of $\mathbf{X}$.

It is worth realizing here that if the components of $\mathbf{X}$ don't overlap, this approach simply returns the average pixel intensity within each of the components -- equivalent to using ROIs. Additionally, there is no reason why the elements of the design matrix need to be binary. Since $\mathbf{X}$ essentially constitutes a model of how we believe fluorescence from the underlying biological entities contributes to photons detected on our imaging sensor, we can use fractional values in $\mathbf{X}$ to account for blur from out-of-focus light if we choose. The extraction routine used by the planotaxis program uses a model constructed by blurring high-resolution confocal sections to approximate the view through our wide-field objective.
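As a sketch of what a fractional design matrix might look like, one could blur a binary mask with a Gaussian to mimic out-of-focus light. The blur width here is arbitrary; the planotaxis model is instead built from blurred confocal sections as described above:

import numpy as np
from scipy.ndimage import gaussian_filter

# Start from a binary mask for one cell...
mask = np.zeros((3, 3))
mask[:2, :2] = 1.0                    # 2x2 cell in the top left

# ...and blur it so pixels near the edge receive fractional weights.
mask_blur = gaussian_filter(mask, sigma=0.5)

# The fractional mask enters the design matrix exactly as a binary one would.
x_col = mask_blur.ravel()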

Questions:

How sensitive are the results to noise? How do the results depend on the size of the components? How about the amount of overlap?

In [46]:
import numpy as np

b1, b2 = (3., 8.)                     # true activations
print('true activations:%.2f,%.2f' % (b1, b2))

x1 = np.array([[1,1,0],[1,1,0],[0,0,0]])   # cell 1: 2x2 area, top left
x2 = np.array([[0,1,0],[0,0,0],[0,0,0]])   # cell 2: single pixel, top row
img = x1*b1 + x2*b2
noise = (np.random.rand(3,3) - 0.5)*3      # additive sensor noise
img += noise

# Unravel the components into columns of the design matrix and take
# the pseudoinverse to get the least-squares solution.
xinv = np.linalg.pinv(np.vstack((x1.ravel(), x2.ravel())).T)
b1_, b2_ = np.dot(xinv, img.ravel())
print('estimated activations:%.2f,%.2f' % (b1_, b2_))
true activations:3.00,8.00
estimated activations:2.93,7.60
In [47]:
# Make Fig 1.2
layout = fifi.FigureLayout('data/components_layout.svg',make_mplfigures = True)
for axlbl,dta in zip(('x1','x2','noise','img'),(x1*b1,x2*b2,noise,img)):
    layout.axes[axlbl].pcolor(dta[::-1,:],cmap = plt.cm.gray,
                              vmin = -1,vmax = 13,edgecolors='w')
[fifi.mpl_functions.kill_spines(ax) for ax in layout.axes.values()];plt.close()
layout.save('data/components.svg',hidelayers = ['Layer 1'])
#display(SVG('data/components.svg'))
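
Returning to the questions posed above, one way to get a feel for noise sensitivity is a small simulation: repeat the estimate over many noise draws (reusing x1 and x2 from the cell above, with the same noise amplitude) and look at the spread of the recovered coefficients. This is only a sketch:

# Repeat the noisy estimate many times to gauge the spread of the betas.
X = np.vstack((x1.ravel(), x2.ravel())).T
xinv = np.linalg.pinv(X)
b_true = np.array([3., 8.])
estimates = np.array([xinv.dot(X.dot(b_true) + (np.random.rand(9) - 0.5)*3)
                      for _ in range(1000)])
print('std of estimates: %.2f, %.2f' % tuple(np.std(estimates, axis=0)))

Because the small cell shares its only pixel with cell 1, its coefficient should come back with the larger spread; bigger components average over more pixels and are correspondingly more robust to noise.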

The above equations can be applied sequentially to each acquired image to extract the relative brightness of the set of underlying signals. There are some additional details to consider given the practicalities of real-world imaging. For instance, there will often be sources of fluorescence that we cannot account for in our anatomical model; these may be static sources (those that are present at the same level in each frame, e.g. autofluorescence) as well as dynamic sources. We can deal with the static sources if we have some epoch where we can make the reasonable assumption that fluorescence from our signals of interest is fairly low. In this case we would subtract the image during this baseline epoch from the images collected when the cells are active.
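A minimal sketch of that baseline subtraction, assuming the first few frames of a hypothetical stack contain little signal of interest (all names and shapes here are illustrative):

import numpy as np

stack = np.random.rand(200, 3, 3)     # illustrative stand-in for real frames
n_baseline = 20                       # frames assumed to be near-quiescent

baseline = stack[:n_baseline].mean(axis=0)   # estimate of the static sources
corrected = stack - baseline                 # subtract before extraction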

Things are more challenging if we have unaccounted-for signals that vary in time. This might be because these unidentified sources are diffuse and difficult to pin on any particular cellular locus. In this case, one approach is to collect all the diffuse unidentified signals into a single term in the above equation, essentially adding a column $\mathbf{x_{n}}$ where all the values are 1. In the language of regression, we would refer to this term as an intercept.
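Continuing with the 3x3 example above, a minimal sketch of this; the intercept column is simply appended to the design matrix:

# Append an intercept column of ones to absorb diffuse, unassigned signal
# (illustrative; reuses x1, x2, and img from the cells above).
X = np.vstack((x1.ravel(), x2.ravel(), np.ones(9))).T
b1_, b2_, offset = np.dot(np.linalg.pinv(X), img.ravel())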

Another consideration enforced by biology is that the coefficients we are solving for should not contain negative values. Fortunately, this problem has been previously addressed within the context of regression, and $\mathbf{X}\boldsymbol{\beta} = \mathbf{y}$ can be solved using non-negative least squares. This is available in Python with scipy.optimize.nnls.

In [48]:
from scipy.optimize import nnls

# Solve the same system with a non-negativity constraint on the betas.
betas, rnorm = nnls(np.vstack((x1.ravel(), x2.ravel())).T, img.ravel())
print('estimated activations:%.2f,%.2f' % (betas[0], betas[1]))
estimated activations:2.93,7.60
