Manual & Documentation
Abundance profiles from metagenomic sequencing data synthesize information from billions of sequenced DNA reads from thousands of microbial genomes. Interpreting these profiles can be a challenge since the data they represent is very complex.
Particularly challenging is their visualization, as existing techniques are inadequate when the taxa number in the thousands. Microbiome Maps are visualizations of abundance profiles using a space-filling curve.
Jasper is a tool for visualizing abundance profiles from metagenomic whole-genome DNA sequencing and 16S sequencing. It creates easy-to-understand images using a Hilbert curve. Jasper is free, and the current version runs on macOS.
Jasper Software
Where do I get it?
The Jasper software is a set of easy-to-use graphical and command-line applications designed for macOS, Python, and R. You can download them from the Mac App Store or from GitHub.
How do I install it?
Installation is simple: just search for "Microbiome Maps" or "Jasper" in the Mac App Store and then install Jasper as you would any other application. For the Python and R scripts, follow the instructions at the repository.
How much is it?
Jasper is completely free.
How do I use it?
Jasper is very easy to use. When you start the macOS version, you can play around with the interface to create sample Hilbert curves and get an idea of how space-filling curves work. The "Hilbert Curve" slider in the app lets you create curves up to level 10. Curves of a higher level partition the image into sections too small to distinguish on current monitors, and the result looks like a uniform grey image. For the Python script, just run it with the applicable parameters for your data.
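To get a feel for why level 10 is a practical ceiling, the following sketch computes the grid size at a few levels. It assumes that the slider's "level" corresponds to the order of the Hilbert curve, so that a level-n curve visits a 2^n by 2^n grid. The function name is hypothetical and is not part of Jasper.

```python
def hilbert_grid_size(level):
    """Return (side, cells) for a Hilbert curve of the given order.

    Assumption: Jasper's "level" equals the curve order, so the curve
    visits a grid of 2**level by 2**level cells.
    """
    if level < 1:
        raise ValueError("level must be at least 1")
    side = 2 ** level
    return side, side * side


for level in (1, 5, 10, 11):
    side, cells = hilbert_grid_size(level)
    print(f"level {level}: {side} x {side} grid, {cells} cells")
```

At level 10 the grid already has more than a million cells, so each cell covers roughly one pixel of a 1024-pixel-wide image.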
Abundance Profile
To get started with Jasper, you'll need a metagenomic abundance profile from either whole-genome sequencing data or 16S sequencing. The same profile format works for both the GUI and CLI versions. The abundance profile is a simple ".txt" file with 5 or 6 tab-delimited fields. These fields are:
Taxa ID
Accession Number (Genome Assembly)
Kingdom
Taxa Name
Abundance
Condition Label
The last field is optional if you are using the "Taxonomic" scheme, but required for the "Labeled" scheme. After Jasper loads your profile, you can click on any part of the image to get a popover with more information about the taxon you just clicked on.
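As an illustration of the file layout described above, here is a minimal Python sketch that reads a profile with 5 or 6 tab-delimited fields. It is not Jasper's own parser. The dictionary keys, the function name, the absence of a header row, and the sample rows (including the "Healthy" label and the abundance values) are all assumptions made up for this example.

```python
import csv
import io

# Hypothetical key names for the six fields, in the order listed above.
FIELDS = ["taxa_id", "accession", "kingdom", "taxa_name", "abundance", "condition"]


def read_profile(handle):
    """Parse a tab-delimited abundance profile into a list of dicts.

    Assumption: the file has no header row.
    """
    rows = []
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_no, record in enumerate(reader, start=1):
        if not record:
            continue  # skip blank lines
        if len(record) not in (5, 6):
            raise ValueError(
                f"line {line_no}: expected 5 or 6 fields, got {len(record)}"
            )
        row = dict(zip(FIELDS, record))
        row.setdefault("condition", None)  # the sixth field is optional
        row["abundance"] = float(row["abundance"])
        rows.append(row)
    return rows


# Made-up rows that use the "00" and "NA" placeholders for missing identifiers.
SAMPLE = (
    "00\tNA\tBacteria\tEscherichia coli K-12\t0.25\tHealthy\n"
    "00\tNA\tFungi\tCandida albicans SC5314\t0.05\tHealthy\n"
)

profile = read_profile(io.StringIO(SAMPLE))
for row in profile:
    print(row["taxa_name"], row["abundance"], row["condition"])
```

To read a real file, pass an open file handle instead of the in-memory sample, for example `open("profile.txt", newline="")`, where the file name is a placeholder.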
Fields & Integrations
Jasper integrates with several annotation authorities such as Ensembl, NCBI, and UniProt. After you load your profile, you can click on any region of the image to get a popover with links that open a website with more information about the taxon you just clicked on. Below is a discussion of the specifics of each field.
⚠️ Note: If you do not have a "Taxa ID", just add "00" (that's a double zero) in its place. For the remaining fields, if you do not have data, substitute "NA" in its place.
Fields
• Taxa ID : Numeric
A numeric identifier of up to seven digits. If you do not have a Taxa ID, substitute a double zero ("00").
• Accession Number : Alphanumeric
An Ensembl genome assembly identifier. It should be prefixed with "GCA_". If you do not have an Accession Number, substitute "NA".
• Kingdom : String
A plain-text string that denotes the top-most taxonomic level. Three are supported: "Bacteria", "Fungi", and "Virus". When the "Taxonomic" ordering scheme is selected, Jasper places all the taxa of a given group within the same region. If you do not have a Kingdom label, substitute "NA".
• Taxa Name : String
A plain-text string containing the full name of the taxon. The taxa name should contain at least three parts: genus, species, and strain. Jasper parses this field and uses the first token as the genus identifier, the second token as the species identifier, and the remaining tokens as the strain name.
• Abundance : Floating-Point
A floating-point scalar value that represents the relative abundance of the given taxon.
• Condition Label : String
A user-defined plain-text label that defines a biological condition or biological interpretation. Jasper groups taxa around these labels and then orders them taxonomically within each region.
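The field rules above can be checked before loading a profile. The sketch below is a hypothetical pre-flight check, not part of Jasper: the function names and messages are invented for illustration, and the name splitting assumes that tokens are separated by whitespace, which the manual does not state explicitly.

```python
import re

# Kingdom values named in the manual, plus the "NA" placeholder.
KINGDOMS = {"Bacteria", "Fungi", "Virus", "NA"}


def split_taxa_name(name):
    """Split a taxa name into (genus, species, strain).

    Assumption: tokens are separated by whitespace.
    """
    tokens = name.split()
    genus = tokens[0] if tokens else ""
    species = tokens[1] if len(tokens) > 1 else ""
    strain = " ".join(tokens[2:])
    return genus, species, strain


def check_fields(taxa_id, accession, kingdom, taxa_name, abundance):
    """Return a list of problems found in one row (empty if none)."""
    problems = []
    if not re.fullmatch(r"[0-9]{1,7}", taxa_id):
        problems.append("Taxa ID should be numeric, up to seven digits (use 00 if unknown)")
    if accession != "NA" and not accession.startswith("GCA_"):
        problems.append("Accession Number should start with GCA_ (use NA if unknown)")
    if kingdom not in KINGDOMS:
        problems.append("Kingdom should be Bacteria, Fungi, Virus, or NA")
    if taxa_name != "NA" and len(taxa_name.split()) < 3:
        problems.append("Taxa Name should contain genus, species, and strain")
    try:
        float(abundance)
    except ValueError:
        problems.append("Abundance should be a floating-point number")
    return problems


print(split_taxa_name("Escherichia coli K-12 MG1655"))
print(check_fields("00", "NA", "Bacteria", "Escherichia coli K-12", "0.25"))
```

Returning a list of messages rather than raising on the first problem makes it easier to report every issue in a row at once.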
Example Profiles
To get you started with formatting your profiles, you can download these examples.
By fixing the orderings of the taxa, a microbiome map can be used to present groups of metagenomic samples that can be partitioned temporally (longitudinal studies), spatially (body or environmental sites), by disease type (and subtype), by disease stage, and by developmental stages.
Additionally, with a fixed ordering it is straightforward to create average, aggregate, and differential maps, which show average, aggregate, or differential abundances, respectively.
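Because every sample shares the same fixed ordering, such maps reduce to element-wise operations on abundance vectors. The sketch below is only one possible reading: it assumes that "aggregate" means a per-taxon sum and that "differential" means a per-taxon difference between two samples. The function names and the sample values are made up, and none of this is taken from Jasper's code.

```python
def average_map(samples):
    """Per-taxon mean across samples that share the same taxon ordering."""
    count = len(samples)
    return [sum(values) / count for values in zip(*samples)]


def aggregate_map(samples):
    """Per-taxon sum across samples (assumed meaning of "aggregate")."""
    return [sum(values) for values in zip(*samples)]


def differential_map(sample_a, sample_b):
    """Per-taxon difference between two samples (assumed meaning of "differential")."""
    return [a - b for a, b in zip(sample_a, sample_b)]


# Made-up relative abundances for three taxa in the same fixed order.
sample_a = [0.5, 0.25, 0.25]
sample_b = [0.25, 0.25, 0.5]

print(average_map([sample_a, sample_b]))
print(aggregate_map([sample_a, sample_b]))
print(differential_map(sample_a, sample_b))
```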
Differential Abundances
To visualize differential abundances, you will need to format the profile so that it reflects not the abundances of a single sample, but the processed results of a differential analysis of many samples across multiple biological conditions. How to analyze microbiome data for differential abundance is outside the scope of this manual, but the paper by Quinn et al., "A Field Guide for the Compositional Analysis of Any-Omics Data", is a good place to start.
Visualizing a compositional analysis of two conditions with a microbiome map requires that the input profile contain the taxa found to be differentially abundant. The abundance field then holds not a raw abundance value but a derived statistic, such as the clr-transformed (centered log-ratio) values of the sample or the adjusted p-value of a statistical test. The file format is the same as before.
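For readers unfamiliar with the clr transform, the sketch below shows the standard definition: the logarithm of each value minus the mean of the logarithms across all taxa in the sample. The pseudocount used to avoid taking the log of zero is an assumption of this example, as are the function name and the sample values. This is not a substitute for a full differential abundance analysis.

```python
import math


def clr(abundances, pseudocount=1e-6):
    """Centered log-ratio transform of one sample.

    Each value becomes log(x_i) minus the mean of log(x) over all taxa.
    The pseudocount is an assumption made here to handle zero abundances.
    """
    logs = [math.log(value + pseudocount) for value in abundances]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Made-up relative abundances for three taxa.
transformed = clr([0.5, 0.3, 0.2])
print(transformed)
```

The transformed values of a sample sum to zero (up to floating-point error), and taxa above the geometric mean of the sample get positive values.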
Microbiome Maps
We use a technique called the Hilbert curve visualization (HCV) to visualize the microbial community abundance profiles of a large number of genomes. These profiles contain the relative abundance measurements of thousands of genomes, and they are ordered along a space-filling curve in a 2D square using the Hilbert curve, making it possible to visualize the profile of a single metagenomic sample. In the resulting Hilbert image, each position is a genome from the reference database, and the intensity of the position's color value represents the abundance of a genome in the sample.
Depending on the ordering of the genomes that is selected, different microbial neighborhoods are created, allowing for different interpretations of the clusters of bright segments, i.e., hotspots, of abundant genomes in the images. Fixing the position of a genome results in visualizations that allow for quick comparisons of the abundance of the same genome or sets of genomes in multiple microbiome samples.
The color intensity of each position in the image represents the abundance of one microbial genome. Groups of segments are labeled by the common taxonomic groups induced by the ordering of the taxa with the Hilbert curve.
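To make the mapping concrete, the following sketch places an ordered list of abundances onto a grid along a Hilbert curve. It uses the widely known iterative index-to-coordinate conversion for the Hilbert curve. It is an illustration under stated assumptions, not Jasper's implementation: the function names are hypothetical, and Jasper may orient the curve differently or scale abundances before coloring.

```python
def d2xy(side, d):
    """Convert position d along a Hilbert curve to (x, y) on a side x side grid.

    side must be a power of two.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the quadrant before swapping the axes.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_image(abundances, level):
    """Fill a 2**level square grid with abundances in curve order."""
    side = 2 ** level
    if len(abundances) > side * side:
        raise ValueError("too many taxa for this curve level")
    grid = [[0.0] * side for _ in range(side)]
    for d, value in enumerate(abundances):
        x, y = d2xy(side, d)
        grid[y][x] = value  # the cell value stands in for color intensity
    return grid


# Made-up abundances for four taxa on a level-1 (2 x 2) grid.
for row in hilbert_image([0.5, 0.25, 0.125, 0.0625], level=1):
    print(row)
```

Consecutive positions along the curve always land in adjacent cells, which is why taxa that are neighbors in the linear ordering also end up as neighbors in the image.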
Microbial Neighborhoods
Different linear orderings produce different Hilbert visualizations, with each resulting in clusters of related microbes along neighboring regions in the 2D plane. The clustering creates distinct areas that resemble community neighborhoods, and they represent microbes belonging to either the same taxonomic group or the same biological condition. The idea is that they cluster around a common scheme (taxonomic or biological).
Multiple 1D linear orderings can exist, but the current version of the software uses two: 1) a taxonomic ordering, and 2) a user-defined biological condition ordering. The taxonomic ordering is based on each genome's taxonomic lineage. In this ordering, pairs of taxa belonging to the same taxonomic group are placed close to each other along the curve and, consequently, close to each other in the Hilbert image. The user-defined ordering lets users define their own orderings based on their experimental conditions.
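The two orderings can be approximated with sort keys. The sketch below is a simplification, not Jasper's algorithm: it assumes the lineage can be approximated by the kingdom, genus, species, and strain available in the profile, and that the labeled scheme sorts by condition label first and taxonomically within each label. The key names, function names, and the sample rows with their "Healthy" and "Disease" labels are made up for illustration.

```python
def taxonomic_key(row):
    """Sort key approximating a taxonomic ordering from the profile fields."""
    tokens = row["taxa_name"].split()
    genus = tokens[0] if tokens else ""
    species = tokens[1] if len(tokens) > 1 else ""
    strain = " ".join(tokens[2:])
    return (row["kingdom"], genus, species, strain)


def labeled_key(row):
    """Sort key grouping by condition label, then taxonomically within it."""
    return (row["condition"],) + taxonomic_key(row)


# Made-up rows for illustration.
rows = [
    {"kingdom": "Fungi", "taxa_name": "Candida albicans SC5314", "condition": "Healthy"},
    {"kingdom": "Bacteria", "taxa_name": "Escherichia coli K-12", "condition": "Disease"},
    {"kingdom": "Bacteria", "taxa_name": "Bacteroides fragilis NCTC 9343", "condition": "Healthy"},
]

taxonomic_order = sorted(rows, key=taxonomic_key)
labeled_order = sorted(rows, key=labeled_key)

print([row["taxa_name"] for row in taxonomic_order])
print([row["taxa_name"] for row in labeled_order])
```

The position of each row in the sorted list is the position along the curve, so rows that sort next to each other become neighbors in the image.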