1 Inbred Mouse Strains

We start with over 1000 mouse samples measured at the TOMCAT beamline of the Swiss Light Source at the Paul Scherrer Institute in Villigen, Switzerland. Each sample consists of 14 GB of image data along with 98 genetic tags correlating each sample and phenotype to a specific pattern of inheritance. Altogether this corresponds to tens of terabytes of data, which analyzed conventionally would require dozens of scripts, cluster management tools, and a lot of patience.

With IJSQL from 4Quant you can now run such analyses as easily as a SQL database query (even from Excel if you wish). IJSQL handles loading the data, distributing it evenly, and optimizing queries, making a cluster or even an entire cloud of computers behave like a single very fast machine using the latest generation of Big Data technology.

2 Loading the Data

The first step is creating the cluster. This can be done using public clouds such as Amazon AWS, Google Compute Engine, or Databricks Cloud, or on your own cluster. Once the Spark cluster has been created and you have the SparkContext called sc, the data can be loaded using the Spark Image Layer.

val uctImages = 
    sc.readImage[Float]("s3n://bone-images/f2-study/*/ufilt.tif").cache
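
The Spark Image Layer call above handles distribution and decoding for you. For readers who want to see roughly what the distribution step looks like in plain Spark, here is a minimal sketch using sc.binaryFiles (a standard Spark call); the boneId extraction and the choice to keep raw, undecoded TIFF bytes are illustrative assumptions and not the 4Quant implementation.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Sketch only: distribute the same files over the cluster with plain Spark.
// sc.binaryFiles partitions the (path, bytes) pairs across the workers.
def loadRawImages(sc: SparkContext): RDD[(String, Array[Byte])] = {
  sc.binaryFiles("s3n://bone-images/f2-study/*/ufilt.tif").map {
    case (path, stream) =>
      // use the parent folder name as the sample (bone) id for later joins
      val boneId = path.split("/").dropRight(1).last
      (boneId, stream.toArray()) // raw TIFF bytes; decoding still needs a TIFF reader
  }.cache()
}
```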

We can then register the data in our IJSQL database so that, instead of writing Scala code, we can use simple SQL commands for the rest of the analysis.

uctImages.registerAsImageTable("ImageTable")

Although we execute the commands on only one machine, the data will be evenly loaded over all of the machines in the cluster (or cloud). We can show any of these images at any point by just typing

uctImages.first().show(1)

Figure: Mineralization Density

3 Image Processing

Once the table has been registered we can perform our analysis using the easy IJSQL interface (or use our Python and Java APIs to build your own analysis). The next steps for this bone analysis are extracting the porosity from the images and analyzing the cells (small pores) inside.

3.1 Image Enhancement

Since the measurements have some degree of noise from the detectors, we first clean up the images using a Gaussian Filter.

CREATE TABLE FilteredImages AS
SELECT boneId,GaussianFilter(image) FROM ImageTable
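
For intuition, the sketch below shows what a Gaussian filter does on a single 2D slice in plain Scala: build a normalized kernel and convolve it over the image, clamping at the borders. The actual GaussianFilter in IJSQL runs on the full 3D volumes in parallel; the radius and sigma values here are arbitrary illustrative defaults.

```scala
// Sketch only: a 2D Gaussian filter on an Array[Array[Float]] slice.
def gaussianKernel(radius: Int, sigma: Double): Array[Array[Double]] = {
  val k = Array.tabulate(2 * radius + 1, 2 * radius + 1) { (i, j) =>
    val (dy, dx) = (i - radius, j - radius)
    math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
  }
  val total = k.map(_.sum).sum
  k.map(_.map(_ / total)) // normalize so the kernel sums to one
}

def gaussianFilter(img: Array[Array[Float]], radius: Int = 2, sigma: Double = 1.0): Array[Array[Float]] = {
  val kernel = gaussianKernel(radius, sigma)
  val (h, w) = (img.length, img(0).length)
  Array.tabulate(h, w) { (y, x) =>
    var acc = 0.0
    for (i <- -radius to radius; j <- -radius to radius) {
      val yy = math.min(math.max(y + i, 0), h - 1) // clamp at the image border
      val xx = math.min(math.max(x + j, 0), w - 1)
      acc += img(yy)(xx) * kernel(i + radius)(j + radius)
    }
    acc.toFloat
  }
}
```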

3.1.1 Other filters

Any ImageJ plugin can easily be used inside IJSQL; for example, a median filter with a radius of 3 can be applied as

SELECT boneId,run(image,"Median...","radius=3") FROM ImageTable
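
The median filter works the same way conceptually: each pixel is replaced by the median of its neighbourhood, which suppresses speckle noise while keeping edges sharper than a Gaussian. A minimal single-slice sketch in plain Scala (not the ImageJ plugin itself):

```scala
// Sketch only: median filter with radius r on a 2D slice.
def medianFilter(img: Array[Array[Float]], r: Int = 3): Array[Array[Float]] = {
  val (h, w) = (img.length, img(0).length)
  Array.tabulate(h, w) { (y, x) =>
    val window = for {
      i <- -r to r
      j <- -r to r
      yy = math.min(math.max(y + i, 0), h - 1)
      xx = math.min(math.max(x + j, 0), w - 1)
    } yield img(yy)(xx)
    val sorted = window.sorted
    sorted(sorted.length / 2) // the median of the neighbourhood
  }
}
```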

3.1.2 3D Renderings

We also offer a number of 3D rendering options for when a single slice does not give enough detail. The rendering is performed on the cluster and only the final image is sent to your machine, so even huge volumes can be rendered quickly.

uctImages.render3D(slice=0.2,lut="gray").first()

Figure: 3D Renderings
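
As a rough idea of how a full 3D stack can be reduced to one small preview image before it is shipped to the client, the sketch below computes a maximum-intensity projection (keeping the brightest voxel along z). The real render3D call does considerably more (surfaces, lighting, LUTs); this is only meant to illustrate the data reduction.

```scala
// Sketch only: maximum-intensity projection of a (z, y, x) volume.
def maxIntensityProjection(volume: Array[Array[Array[Float]]]): Array[Array[Float]] = {
  val depth = volume.length
  val (height, width) = (volume(0).length, volume(0)(0).length)
  Array.tabulate(height, width) { (y, x) =>
    var brightest = Float.MinValue
    var z = 0
    while (z < depth) {
      if (volume(z)(y)(x) > brightest) brightest = volume(z)(y)(x)
      z += 1
    }
    brightest
  }
}
```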

3.2 Segmentation

To segment the images into bone and air, we can either manually specify a cut-off or simply use an automated approach like Otsu, IsoData, or Intermodes.

CREATE TABLE ThresholdImages AS
SELECT boneId,ApplyThreshold(image,OTSU) FROM FilteredImages
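
Otsu's method picks the cut-off automatically from the grey-value histogram by maximizing the between-class variance of the two groups (air vs. calcified tissue). A self-contained sketch of the threshold selection (histogram binning and the per-image execution on the cluster are left out):

```scala
// Sketch only: Otsu threshold selection on a grey-value histogram.
def otsuThreshold(hist: Array[Long]): Int = {
  val total = hist.sum.toDouble
  val sumAll = hist.zipWithIndex.map { case (count, value) => count.toDouble * value }.sum
  var weightB = 0.0 // cumulative weight of the "background" class
  var sumB = 0.0    // cumulative intensity sum of the "background" class
  var bestT = 0
  var bestVar = -1.0
  for (t <- hist.indices) {
    weightB += hist(t)
    if (weightB > 0 && weightB < total) {
      sumB += t.toDouble * hist(t)
      val weightF = total - weightB
      val meanB = sumB / weightB
      val meanF = (sumAll - sumB) / weightF
      val betweenVar = weightB * weightF * (meanB - meanF) * (meanB - meanF)
      if (betweenVar > bestVar) { bestVar = betweenVar; bestT = t }
    }
  }
  bestT // grey value separating air from bone
}
```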

As with the last steps, a slice can be immediately inspected for one or more images with

sql("SELECT image FROM ThresholdImages").first().show(1)

Figure: Calcified Tissue

3.3 Mask Creation and Porosity Extraction

From the segmented image, we can extract the cells by first creating a mask with all of the holes filled in.

CREATE TABLE MaskImages AS
SELECT boneId, FillHoles(image) FROM ThresholdImages
sql("SELECT image FROM ThresholdImages").first().show(1) ``` 

![Filled Holes](ext-figures/bone-filled.png)

The cortical bone is then isolated by applying PeelMask to the thresholded image and the filled mask.

CREATE TABLE CorticalImages AS
SELECT boneId, PeelMask(thr.image, mask.image) FROM ThresholdImages thr 
  INNER JOIN MaskImages mask ON thr.boneId = mask.boneId

sql("SELECT image FROM CorticalImages").first().show(1)

Figure: Cortical Image

The porosity image (the pores inside the cortical bone) is obtained by inverting the thresholded image and applying the same mask.

CREATE TABLE PorosityImages AS
SELECT boneId, PeelMask(run(thr.image,"Invert"), mask.image) 
  FROM ThresholdImages thr 
  INNER JOIN MaskImages mask ON thr.boneId = mask.boneId

sql("SELECT image FROM PorosityImages").first().show(1)

Figure: Porosity Image
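
To make the FillHoles and PeelMask steps above less of a black box, here is a minimal 2D sketch of the underlying idea: flood-fill the background from the image border, treat everything the flood fill could not reach as "inside", and then take the set difference between the filled mask and the thresholded bone to obtain the pores. The real operators work in 3D on the cluster.

```scala
import scala.collection.mutable

// Sketch only: fill enclosed holes in a 2D binary mask by flood-filling the
// background from the border; pixels never reached are inside the object.
def fillHoles(mask: Array[Array[Boolean]]): Array[Array[Boolean]] = {
  val (h, w) = (mask.length, mask(0).length)
  val outside = Array.ofDim[Boolean](h, w)
  val queue = mutable.Queue[(Int, Int)]()
  for (y <- 0 until h; x <- 0 until w
       if (y == 0 || y == h - 1 || x == 0 || x == w - 1) && !mask(y)(x)) {
    outside(y)(x) = true
    queue.enqueue((y, x))
  }
  while (queue.nonEmpty) {
    val (y, x) = queue.dequeue()
    for ((dy, dx) <- Seq((1, 0), (-1, 0), (0, 1), (0, -1))) {
      val (ny, nx) = (y + dy, x + dx)
      if (ny >= 0 && ny < h && nx >= 0 && nx < w && !mask(ny)(nx) && !outside(ny)(nx)) {
        outside(ny)(nx) = true
        queue.enqueue((ny, nx))
      }
    }
  }
  Array.tabulate(h, w)((y, x) => mask(y)(x) || !outside(y)(x))
}

// Porosity = inside the filled mask but not part of the thresholded bone.
def porosity(thresh: Array[Array[Boolean]], filled: Array[Array[Boolean]]): Array[Array[Boolean]] =
  Array.tabulate(thresh.length, thresh(0).length)((y, x) => filled(y)(x) && !thresh(y)(x))
```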

3.4 Labeling Objects

We can then identify the individual cells using component labeling.

CREATE TABLE LabelImages AS
SELECT boneId,ComponentLabel(image) FROM PorosityImages
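
Connected component labeling assigns every connected blob of pore voxels its own integer label (0 stays background), which is what makes it possible to treat each cell as an individual object. A single-slice, 4-connected sketch in plain Scala (ComponentLabel itself runs in 3D on the cluster):

```scala
import scala.collection.mutable

// Sketch only: 4-connected component labeling of a 2D binary mask.
def labelComponents(mask: Array[Array[Boolean]]): Array[Array[Int]] = {
  val (h, w) = (mask.length, mask(0).length)
  val labels = Array.ofDim[Int](h, w)
  var nextLabel = 0
  for (y <- 0 until h; x <- 0 until w if mask(y)(x) && labels(y)(x) == 0) {
    nextLabel += 1
    labels(y)(x) = nextLabel
    val stack = mutable.Stack((y, x))
    while (stack.nonEmpty) {
      val (cy, cx) = stack.pop()
      for ((dy, dx) <- Seq((1, 0), (-1, 0), (0, 1), (0, -1))) {
        val (ny, nx) = (cy + dy, cx + dx)
        if (ny >= 0 && ny < h && nx >= 0 && nx < w && mask(ny)(nx) && labels(ny)(nx) == 0) {
          labels(ny)(nx) = nextLabel
          stack.push((ny, nx))
        }
      }
    }
  }
  labels // 0 = background, 1..nextLabel = individual pores
}
```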

3.4.1 Cells from Vessels

We can also use component labeling to help distinguish cells from vessels: vessels form much larger connected objects, so a simple volume cut-off separates the two.

CREATE TABLE VesselImages AS
SELECT * FROM LabelImages WHERE obj.VOLUME > 1000

CREATE TABLE CellImages AS
SELECT * FROM LabelImages WHERE obj.VOLUME < 1000

multicolor3D(red=vesselImages.first, green=cellImages.first)

Figure: Vessels and Cells

3.5 Shape Analysis (Volume, Position, Shape)

Now we can calculate the shape information (volume, position, and shape metrics) for each cell and look at the resulting statistics.

CREATE TABLE CellAnalysis AS
SELECT boneId,AnalyzeShape(image) FROM CellImages GROUP BY boneId

Once this analysis is done, we can register the results as an ordinary table and use standard SQL commands for analyzing and visualizing all of the shapes.

shapeAnalysis.toPointDF().registerTempTable("BoneAnalysis")
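
The most basic statistics behind a shape analysis are easy to state: for each labeled object, the volume is its voxel count and the position is the mean of its voxel coordinates. The sketch below computes just these two for a labeled 2D slice; orientation and anisotropy measures (which need the second-moment tensor) and the actual AnalyzeShape implementation are beyond this illustration.

```scala
// Sketch only: per-object voxel count and centroid from a labeled slice.
case class ShapeStats(label: Int, volume: Long, centerX: Double, centerY: Double)

def analyzeShapes(labels: Array[Array[Int]]): Seq[ShapeStats] = {
  val voxels = for {
    y <- labels.indices
    x <- labels(y).indices
    if labels(y)(x) > 0
  } yield (labels(y)(x), x, y)
  voxels.groupBy(_._1).toSeq.map { case (label, vs) =>
    ShapeStats(
      label,
      vs.size.toLong,                      // volume in voxels
      vs.map(_._2).sum.toDouble / vs.size, // centroid x
      vs.map(_._3).sum.toDouble / vs.size  // centroid y
    )
  }.sortBy(_.label)
}
```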

4 Analyzing 1000s of samples

4.1 Overview of all of the animals

| MGROUP | Animals | Female.Count | Male.Count | Source |
|---|---|---|---|---|
| Group 1 - B6 lit/lit female | 14 | 14 | 0 | PROGENITOR |
| Group 10 - B6xC3.B6F1 lit/lit male | 5 | 0 | 5 | PROGENITOR |
| Group 11 - B6xC3.B6F2 lit/lit | 1960 | 1017 | 933 | F2 |
| Group 2 - B6 lit/lit male | 15 | 0 | 15 | PROGENITOR |
| Group 3 - C3.B6 lit/lit female | 18 | 18 | 0 | PROGENITOR |
| Group 4 - C3.B6 lit/lit male | 16 | 0 | 16 | PROGENITOR |
| Group 5 - B6 lit/+ female | 15 | 15 | 0 | PROGENITOR |
| Group 6 - B6 lit/+ male | 12 | 0 | 12 | PROGENITOR |
| Group 7 - C3.B6 lit/+ female | 15 | 15 | 0 | PROGENITOR |
| Group 8 - C3.B6 lit/+ male | 15 | 0 | 15 | PROGENITOR |
| Group 9 - B6xC3.B6F1 lit/lit female | 11 | 11 | 0 | PROGENITOR |

We can then combine this table with our shape information using a join. In this case the genomic information comes from a text file, but it could just as easily come from an Excel file, a SQL database, an S3 store, or another Spark analysis.

CREATE TABLE GenomicCellAnalysis AS
SELECT * FROM MouseHistory mh JOIN CellAnalysis ca 
  ON mh.boneId = ca.boneId
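
For readers more comfortable with the standard Spark DataFrame API, the sketch below shows the same join with Spark 2.x; the tiny in-memory tables and their values are made up purely for illustration, only the column names follow the text.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("genomic-join").getOrCreate()
import spark.implicits._

// Made-up miniature versions of the two tables, with column names from the text.
val mouseHistory = Seq(("bone_001", "B6", "female"), ("bone_002", "C3H", "male"))
  .toDF("boneId", "strain", "gender")
val cellAnalysis = Seq(("bone_001", 512L, 2.1e-7), ("bone_002", 498L, 1.8e-7))
  .toDF("boneId", "LACUNA_NUMBER", "MEAN_VOLUME")

// Inner join on the shared boneId column, registered for later SQL queries.
val genomicCellAnalysis = mouseHistory.join(cellAnalysis, "boneId")
genomicCellAnalysis.createOrReplaceTempView("GenomicCellAnalysis")
genomicCellAnalysis.show()
```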

4.2 Preview of the raw data

The raw data can be read out as a table for investigating individual samples.

| MGROUP | Gender | SAN | VOLUME | LACUNA_NUMBER | POS_X | POS_Y | POS_Z | GROUP | Strain | Growth |
|---|---|---|---|---|---|---|---|---|---|---|
| Group 1 - B6 lit/lit female | female | 1 | 2.0e-07 | 1 | 0.3181591 | 0.0020761 | 0.0023529 | B6 lit/lit female | B6 | lit/lit |
| Group 1 - B6 lit/lit female | female | 1 | 1.9e-06 | 2 | 0.3435069 | 0.0064215 | 0.0067113 | B6 lit/lit female | B6 | lit/lit |
| Group 1 - B6 lit/lit female | female | 1 | 8.0e-07 | 3 | 0.4145227 | 0.0021684 | 0.0056980 | B6 lit/lit female | B6 | lit/lit |
| Group 1 - B6 lit/lit female | female | 1 | 1.0e-06 | 4 | 0.4562597 | 0.0024831 | 0.0108601 | B6 lit/lit female | B6 | lit/lit |
| Group 1 - B6 lit/lit female | female | 1 | 4.0e-07 | 5 | 0.6334585 | 0.0031225 | 0.0038936 | B6 lit/lit female | B6 | lit/lit |
| Group 1 - B6 lit/lit female | female | 1 | 7.0e-07 | 6 | 0.6546799 | 0.0121322 | 0.0020201 | B6 lit/lit female | B6 | lit/lit |

4.3 Viewing Individual Samples

The shape analysis from a single sample can easily be brought up and rendered in the browser.

SELECT lacuna_points FROM GenomicCellAnalysis WHERE strain = 'B6' AND gender = 'female' LIMIT 1

A number of further analyses can be made in both 2D and 3D plots looking at everything from cell size and shape to density.

SELECT lacuna_points FROM GenomicCellAnalysis WHERE strain = 'C3H' AND gender = 'female' LIMIT 1

4.4 Compare two datasets directly

The two strain-specific queries can also be combined directly and grouped by strain to compare the populations side by side.

{
  sql("SELECT strain, lacuna_points FROM GenomicCellAnalysis WHERE strain = 'B6' AND gender = 'female'")
  ++
  sql("SELECT strain, lacuna_points FROM GenomicCellAnalysis WHERE strain = 'C3H' AND gender = 'female'")
}.groupBy("strain").show
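
The same comparison expressed with the plain DataFrame API, continuing the made-up GenomicCellAnalysis view from the sketch in Section 4.1 (the MEAN_VOLUME column is an assumption from that sketch, not a field defined in the text):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg

val spark = SparkSession.builder().getOrCreate()

// Filter the two strains, union them, and summarize one measure per strain.
val females = spark.table("GenomicCellAnalysis").filter("gender = 'female'")
females.filter("strain = 'B6'")
  .union(females.filter("strain = 'C3H'"))
  .groupBy("strain")
  .agg(avg("MEAN_VOLUME").as("avg_lacuna_volume"))
  .show()
```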

4.5 Run Standard Analyses over millions of cells

Instead of storing the results in a separate table for each sample, we can put every cell as a row into a single new table called AllLacunae (this time with more than 50 million rows).

GenomicCellAnalysis.flattenDF().registerTempTable("AllLacunae")

We can now run SQL commands and get results in seconds, where a standard MySQL instance would take minutes to hours.

SELECT AVG(VOLUME), STDDEV(VOLUME) FROM AllLacunae GROUP BY boneId
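
The equivalent aggregation with the plain DataFrame API, assuming the flattened AllLacunae table has boneId and VOLUME columns as described above:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, stddev}

val spark = SparkSession.builder().getOrCreate()

// Per-bone mean and standard deviation of lacuna volume over all ~50 million rows.
spark.table("AllLacunae")
  .groupBy("boneId")
  .agg(avg("VOLUME").as("mean_volume"), stddev("VOLUME").as("sd_volume"))
  .show()
```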

5 Acknowledgements

Analysis powered by IJSQL and the Spark Image Layer from 4Quant. Visualizations and document generation provided by the following R packages:

- H. Wickham (2009). ggplot2: Elegant Graphics for Data Analysis. Springer New York. ISBN 978-0-387-98140-6. http://had.co.nz/ggplot2/book
- B. W. Lewis (2015). threejs: 3D Graphics using Three.js and Htmlwidgets. R package version 0.2.1. http://bwlewis.github.io/rthreejs
- H. Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. http://www.jstatsoft.org/v40/i01/
- JJ Allaire, J. Cheng, Y. Xie, J. McPherson, W. Chang, J. Allen, H. Wickham and R. Hyndman (2015). rmarkdown: Dynamic Documents for R. R package version 0.5.1. http://CRAN.R-project.org/package=rmarkdown
- Y. Xie (2015). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.10.
- Y. Xie (2013). Dynamic Documents with R and knitr. Chapman and Hall/CRC. ISBN 978-1482203530
- Y. Xie (2014). knitr: A Comprehensive Tool for Reproducible Research in R. In V. Stodden, F. Leisch and R. D. Peng (eds.), Implementing Reproducible Computational Research. Chapman and Hall/CRC. ISBN 978-1466561595