Data Preprocessing

Import raw microscopy data

Data can be imported from czi and tiff files. The gbeflow.CziImport wraps the czifile module published by Christoph Gohlke to read the propriatory zeiss file format. The underlying datastructure of czi’s varies based on the collection parameters, so the gbeflow.CziImport may not work reliabily.

import gbeflow

# Load czi file from relative/absolute path
czi = gbeflow.CziImport(filepath)

# Raw array of data is available as an attribute
raw = czi.raw_im

# Data squeezed to minimum dimensions also available
data = czi.data

Alternativly, data can be read from tiff files saved using Fiji. The module tifffile makes this by far the easiest approach with tifffile.imread and tifffile.imsave.

import tifffile

# Import tiff file from relative or absolute path
img = tifffile.imread(filepath)

# Save array to new tiff file
tifffile.imsave(outpath, data=img)

Embryo alignment

During imaging, embryos are not expected to be aligned in any particular orientation in the XY plane. While we will accept only embryos that have an approximate lateral mounting, we need to correct XY positioning in post-processing. In the notebook rotate_embryo, a workflow is proposed that accepts user inputs in order to guide the alignment process. The user input functions rely on bebi103 which is a package written by Justin Bois for the BE/Bi103 course that is still under active development. Below is an example of what processing a single sample might look like.

import os
import glob
import tqdm

import numpy as np
import pandas as pd
import scipy.ndimage

import tifffile
import bebi103

import bokeh.io
bokeh.io.output_notebook()

import gbeflow
Loading BokehJS ...

Import first time point from each original tiff

files = glob.glob(os.path.abspath('../data/original/20181130-gbe_mutants_brt/*.tif'))
files = files

The key parameter for tifffile.imread lets us import just the first timepoint with both channels. This makes it possible so that we can look at a sample of all of the data at once. Since the total dataset is >20Gb we can’t load it directly into memory.

%%time
raw = {}
for f in files:
    raw[f] = tifffile.imread(f,key=(0,1))

Select points to use for alignment

Try selecting two points on the dorsal surface that represent the linear plane of the dorsal side. This approach should hopefully be less sensitive to variability in how the user picks the points.

clicks = []
for f in files:
    clk = bebi103.viz.record_clicks(raw[f][1],flip=False)
    clicks.append(clk)

Extract the points selected for each image into a dataframe.

Ldf = []
for clk in clicks:
    Ldf.append(clk.to_df())

points = pd.concat(Ldf,keys=files)
points
x y
/Users/morganschwartz/code-block/germband-extension/data/original/20181130-gbe_mutants_brt/20181130-gbe_mutants_brt_05_eve.tif 0 583.782051 811.211429
1 819.794872 311.188571
/Users/morganschwartz/code-block/germband-extension/data/original/20181130-gbe_mutants_brt/20181130-gbe_mutants_brt_21_kr.tif 0 182.525641 412.960000
1 380.769231 829.177143
/Users/morganschwartz/code-block/germband-extension/data/original/20181130-gbe_mutants_brt/20181130-gbe_mutants_brt_17_kr.tif 0 368.294872 650.411429
1 550.384615 206.422857
/Users/morganschwartz/code-block/germband-extension/data/original/20181130-gbe_mutants_brt/20181130-gbe_mutants_brt_23_kr.tif 0 334.474359 685.405714
1 352.769231 197.600000

Reshape points array to have one row per sample.

points = points.reset_index(level=1)
points.head()
level_1 x y
/Users/morganschwartz/code-block/germband-extension/data/original/20181130-gbe_mutants_brt/20181130-gbe_mutants_brt_05_eve.tif 0 583.782051 811.211429
/Users/morganschwartz/code-block/germband-extension/data/original/20181130-gbe_mutants_brt/20181130-gbe_mutants_brt_05_eve.tif 1 819.794872 311.188571
/Users/morganschwartz/code-block/germband-extension/data/original/20181130-gbe_mutants_brt/20181130-gbe_mutants_brt_21_kr.tif 0 182.525641 412.960000
/Users/morganschwartz/code-block/germband-extension/data/original/20181130-gbe_mutants_brt/20181130-gbe_mutants_brt_21_kr.tif 1 380.769231 829.177143
/Users/morganschwartz/code-block/germband-extension/data/original/20181130-gbe_mutants_brt/20181130-gbe_mutants_brt_17_kr.tif 0 368.294872 650.411429

Calculate a line for each embryo

\[y - y_1 = m(x - x_1)\]
\[m = \frac{y_2 - y_1}{x_2 - x_1}\]
line = gbeflow.calc_line(points)
line = line.reset_index().rename(columns={'index':'f'})
line.head()
f x1 x2 y1 y2 dx dy m
0 /Users/morganschwartz/code-block/germband-extension/... 583.782051 819.794872 811.211429 311.188571 236.012821 -500.022857 -2.118626
1 /Users/morganschwartz/code-block/germband-extension/... 182.525641 380.769231 412.960000 829.177143 198.243590 416.217143 2.099524
2 /Users/morganschwartz/code-block/germband-extension/... 368.294872 550.384615 650.411429 206.422857 182.089744 -443.988571 -2.438295
3 /Users/morganschwartz/code-block/germband-extension/... 334.474359 352.769231 685.405714 197.600000 18.294872 -487.805714 -26.663522

Plot embryos with line on top

\[y = m(x - x_1) + y_1\]
# Create list to collect plot objects
Lp = []

# X values to compute line on
x = np.linspace(0,1024,100)

for f in files:
    p = bebi103.viz.imshow(raw[f][1,:,:],flip=False)

    x1 = line[line['f']==f]['x1'].values
    y1 = line[line['f']==f]['y1'].values
    m = line[line['f']==f]['m'].values
    y = m*(x-x1) + y1

    p.line(x,y,color='red',line_width=3)
#     p.scatter(line[line['f']==f]['x'], line[line['f']==f]['y'],color='white',size=15)

    Lp.append(p)

bokeh.io.show(bokeh.layouts.gridplot(Lp,ncols=2))

Calculate rotation

The angle of rotation is calculated as follows

\[\theta = \arctan\bigg(\frac{y_2-y_1}{x_2-x_1}\bigg)\]

This calculation can be code-blockd using the np.arctan2, which has two arguments that correspond to \(\Delta y\) and \(\Delta x\).

line = gbeflow.calc_embryo_theta(line)
line.head()
f x1 x2 y1 y2 dx dy m theta
0 /Users/morganschwartz/code-block/germband-extension/... 583.782051 819.794872 811.211429 311.188571 236.012821 -500.022857 -2.118626 -64.732499
1 /Users/morganschwartz/code-block/germband-extension/... 182.525641 380.769231 412.960000 829.177143 198.243590 416.217143 2.099524 64.531611
2 /Users/morganschwartz/code-block/germband-extension/... 368.294872 550.384615 650.411429 206.422857 182.089744 -443.988571 -2.438295 -67.700358
3 /Users/morganschwartz/code-block/germband-extension/... 334.474359 352.769231 685.405714 197.600000 18.294872 -487.805714 -26.663522 -87.852162

Apply rotation based on \(\theta\)

With \(\theta\) calculated, we are now ready to rotate each sample accordingly. Since we cannot load all of the data into memory at the same time, we will currently only rotate the first timepoint to check that it worked. After we have determined all necessary manipulations for each embryo, we will run the actual rotation.

# Dataframe to save first timepoint from each rotate embryo
rot = {}

# List to save bokeh plots
Lp = []

for f in tqdm.tqdm(files):
    # Extract the theta value for this sample
    theta = line[line['f']==f]['theta'].values[0]

    # Rotate single image
    rimg = scipy.ndimage.rotate(raw[f][1],theta)

    # Save and plot first timepoint
    rot[f] = rimg
    p = bebi103.viz.imshow(rimg,title=f)
    Lp.append(p)
100%|██████████| 4/4 [00:00<00:00,  6.63it/s]
bokeh.io.show(bokeh.layouts.gridplot(Lp,ncols=2))
# Import modules
import bebi103
import gbeflow
import tifffile
import scipy.ndimage

# Import image from tiff
raw = tifffile.imread(impath)

# Select first timepoint from `raw` to display
clk = bebi103.viz.record_clicks(raw[0],flip=False)

# Extract points from `clk` to a dataframe
points = clk.to_df().reset_index()

# Calculate line for each embryo
line = gbeflow.calc_line(points
                ).reset_index(
                ).rename(
                    columns={'index','f'}
                )

# Calculate angle of rotation
line = gbeflow.calc_embryo_theta(line)

# Rotate embryo
rimg = scipy.ndimage.rotate(raw,line['theta'])

# Save rotated stack to file
tifffile.imsave(outpath,data=rimg)

Manual curation of orientation

After the rotation based on \(\theta\) has been applied, the embryos should be positioned such that the AP axis is horizontal. However, the rotation does not gurantee that the embryo will be positioned with anterior left and dorsal up. At this point, the user can individually specify any additional rotations to correct the embryo’s orientation. The following examples are typically sufficient to correct most orientation errors:

# Rotate by 180 degrees around center point
img = scipy.ndimage.rotate(img,180)

# Flip horizontally by specifying a specific axis
img = np.flip(img,axis=-1)

# Flip vertically
img = np.flip(img,axis=-1)

The final result of this workflow is that all samples are aligned dorsal up with the anterior end of the embryo to the left. This consistent alignment should facilitate future comparisons of the results of optical flow.

Embryo Masking

Rationale

We want to eliminate the background by segmenting the embryo from the background. Later in our analysis, our job will be easier if we do not need to worry about the background contributing noise.

Basic Approach

Segment the embryo from the background in order to clearly isolate changes in signal associated with germ band extension

import numpy as np
import matplotlib.pyplot as plt

from imp import reload
import sys
sys.path.insert(0, '..')
import utilities as ut

from skimage import img_as_float
from skimage.filters import gaussian
from skimage.segmentation import active_contour
from skimage.measure import grid_points_in_poly

Data Import

Read in hdf5 hyperstack as a 3d numpy array. Convert to float for ease of processing.

hst = img_as_float(ut.read_hyperstack('../data/wt_gbe_20180110.h5'))

Select a single image from the first timepoint to use as a test case for segmentation development

img = hst[0]
ut.imshow(img)
../_images/output_8_0.png

Background Segmentation

In order to analyze any signal variation that is associated with germ band extension, we need to isolate the signal of the embryo from any background. This task is complicated by the nature of brightfield images where the intensity of the background ~1 is similar to the intensity of cells within the image. As a result we cannot use a simple threshold which would remove signal from within the embryo as well as from the background.

However, the embryo does have a clear dark line that separates the embryo itself from the background. We will take advantage of this clear line to separate the embryo from the background using contour based methods.

To start, we will approximate an ellipse that is slightly bigger than the size of the embryo, which will provide a starting point for contour fitting.

# Values to calculate the ellipse over using parametrized trigonometric fxns
s = np.linspace(0, 2*np.pi, 400)

# Define the approximate center of the embryo/ellipse
center_x = 500
center_y = 530

# Define the radius of the ellipse in x and y
radius_x = 550
radius_y = 250

# Calculate the position of the ellipse as a function of s
x = center_x + radius_x*np.cos(s)
y = center_y + radius_y*np.sin(s)
init = np.array([x, y]).T

Next we will use skimage’s active_contour method to fit our approximated ellipse to the contour of the embryo. The kwarg parameters for this function were copied based on the active countour tutorial.

snake = active_contour(gaussian(img, 3),
                       init, alpha=0.015, beta=10, gamma=0.001)
fig,ax = plt.subplots(figsize=(10,10))
ax.imshow(img)
ax.plot(init[:, 0], init[:, 1], '--r', lw=3,label='estimate')
ax.plot(snake[:, 0], snake[:, 1], '-r', lw=3,label='fitted contour')
ax.legend()
<matplotlib.legend.Legend at 0x1c19e6f048>
../_images/output_16_1.png

The plot above shows our image overlaid with the approximated ellipse (dashed line) and the fitted counter (red continuous line). This contour follows the boundary between the embryo and the background.

Create background mask

Now that we have a estimated function snake that defines the boundary of the embryo and the background, we need to define a mask in the shape of the image that defines which points belong in the image. Skimage’s grid_points_in_poly function takes a set of points defining a shape (snake) and identifies which points over a given raster area fall within the input shape.

mask = grid_points_in_poly(img.shape, snake).T
plt.imshow(mask)
<matplotlib.image.AxesImage at 0x1c1aa11c88>
../_images/output_20_1.png
im_masked = img.copy()
im_masked[~mask] = 0
ut.imshow(im_masked)
../_images/output_21_0.png

Write a function to fit a contour to a new embryo

def calc_ellipse(center_x,center_y,radius_x,radius_y):
    '''
    Calculate a parametrized ellipse based on input values
    '''

    # Values to calculate the ellipse over using parametrized trigonometric fxns
    s = np.linspace(0, 2*np.pi, 400)

    # Calculate the position of the ellipse as a function of s
    x = center_x + radius_x*np.cos(s)
    y = center_y + radius_y*np.sin(s)
    init = np.array([x, y]).T

    return(init)
def contour_embryo(img,init):
    '''
    Fit a contour to the embryo to separate the background
    Returns a masked image where all background points = 0
    '''

    # Fit contour based on starting ellipse
    snake = active_contour(gaussian(img, 3),
                       init, alpha=0.015, beta=10, gamma=0.001)

    # Create boolean mask based on contour
    mask = grid_points_in_poly(img.shape, snake).T

    # Apply mask to image and set background to 0
    img[~mask] = 0

    return(img)

Apply mask to hyperstack

Check ellipse approximation on first and last timepoints

center_x,center_y = 500,530
radius_x,radius_y = 550,250
ellipse = calc_ellipse(center_x,center_y,radius_x,radius_y)
fig,ax = plt.subplots(1,2,figsize=(10,8))
ax[0].imshow(hst[0])
ax[0].plot(init[:,0],init[:,1],'--r',lw=3)
ax[1].imshow(hst[-1])
ax[1].plot(init[:,0],init[:,1],'--r',lw=3)
[<matplotlib.lines.Line2D at 0x1c1a65a978>]
../_images/output_28_1.png
# Loop through each timepoint in hyperstack
for t in range(hst.shape[0]):
    hst[t] = contour_embryo(hst[t],ellipse)
fig,ax = plt.subplots(1,2,figsize=(10,8))
ax[0].imshow(hst[0])
ax[1].imshow(hst[-1])
<matplotlib.image.AxesImage at 0x1c1a3349e8>
../_images/output_30_1.png

Following the development of the method above, an object was created within gbeflow to handle the tasks associated with contouring: MaskEmbryo. An example of using this new class is included below.

import gbeflow
import tifffile
import bebi103

# Import sample from tiff
img = tifffile.imread(filepath)

# Select endpoints of embryo for ellipse calculation
clk = bebi103.viz.record_clicks(raw[0],flip=False)

# Save points to dataframe after selection
points = clk.to_df()

# Initialize object with selected points
me = gbeflow.MaskEmbryo(points)

# Contour embryo based on ellipse calculated during initialization
# Input accepts only 2d data
mask = me.contour_embryo(img[0])

# Apply binary mask to the raw image
mask_img = me.mask_image(img,mask)

After running the code above, this workflow can be applied to the remaining timepoints and saved as a tif.

Limitations

While this approach worked well on the test dataset, when applied to other embryos it would make mistakes on ~10% of frames. There does not appear to be a straightforward way to improve performance without manual curation. See 20181106-endpoint_ellipse.ipynb and 20181108-vector_calculation.ipynb for examples of masking mistakes.

Next Steps

The implementation of contour masking shown in 20181108-vector_calculation.ipynb fits an individual contour to create a mask for each timepoint in the timecourse. Simply by repeating the calculation many times, we introduce more opportunities for mistakes. In the future the contour should be fit just to the first timepoint to create a mask that the optical flow algorithm can use on all timepoints (See the paper’s supplement for more information.) To accommodate any minute changes in embryo size, we could try to increase the size of the mask by 10%, but given the large scale of the movements that we are interested in it may not be necessary.