scRNA-seq Trajectory Inference

Introduction

This dataset contains single-cell RNA sequencing (scRNA-seq) data from whole Caenorhabditis elegans embryos, published by Packer & Zhu et al. The study analyzes a time series of whole developing C. elegans embryos to identify the temporal lineage of cell development. This tutorial is adapted from the original Monocle3 tutorial by the Trapnell Lab. We use a small subset of 4,591 cells for this analysis.

Dataset description

  • Organism: Caenorhabditis elegans

  • Instrument: Illumina NextSeq 500

  • Library construction: 10x Genomics v2 3' chemistry

  • Number of cells: 4,591

Publication

Packer JS, Zhu Q, Huynh C, Sivaramakrishnan P, Preston E, Dueck H, Stefanik D, Tan K, Trapnell C, Kim J, Waterston RH, Murray JI. A lineage-resolved molecular atlas of C. elegans embryogenesis at single-cell resolution. Science. 2019 Sep 20;365(6459):eaax1971. DOI: 10.1126/science.aax1971. Epub 2019 Sep 5. PMID: 31488706; PMCID: PMC7428862.

Abstract

Caenorhabditis elegans is an animal with few cells but a wide diversity of cell types. In this study, we characterize the molecular basis for their specification by profiling the transcriptomes of 86,024 single embryonic cells. We identify 502 terminal and preterminal cell types, mapping most single-cell transcriptomes to their exact position in C. elegans' invariant lineage. Using these annotations, we find that (i) the correlation between a cell's lineage and its transcriptome increases from middle to late gastrulation, then falls substantially as cells in the nervous system and pharynx adopt their terminal fates; (ii) multilineage priming contributes to the differentiation of sister cells at dozens of lineage branches; and (iii) most distinct lineages that produce the same anatomical cell type converge to a homogenous transcriptomic state.

Original Data From Tutorial

  • Original Tutorial: Constructing single-cell trajectories

  • Bioinformatics Analysis: scRNA-Seq Trajectory Inference

  • Application: scRNA-Seq Trajectory Inference

Input

Since the original datasets are available in RDS format, we have pre-processed them into individual tables, which are stored in text files in a tab-separated format. Please download the files before following the analysis.

  1. Waterson_counts.tsv (Download) or (Download a box file): This file contains the raw counts. The first column has no heading and contains the gene/feature IDs (here, WormBase gene IDs, because this dataset is from C. elegans). Every other column is headed by a unique cell name identifier that corresponds to a cell in the experimental design file.

  2. Waterson_cell_metadata.tsv (Download): This file contains attributes of the experiment, such as the number of UMIs, batch, and collection time points. Its first column holds the unique cell name identifiers, which should correspond to the cells in the count table (i.e., waterson_counts.tsv).

  3. Waterson_gene_annotations.tsv (Download): This file contains information about the gene IDs (here, from WormBase). Its first column holds the gene IDs, which correspond to the genes in the count table. If this file is supplied, it must contain a column headed “gene_short_name” (holding gene symbols or user-defined short names), as required by the Monocle3 algorithm.

  4. Alternatively: In this tutorial, we use Waterson_cell_metadata.tsv for root node selection, but a list of progenitor root cells can be supplied instead. Click to download the list of progenitor cells. This file contains the names of the progenitor cells, one per line.
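
The correspondence rules described above (cells match between the count table and the metadata, gene IDs match between the count table and the annotations, and a “gene_short_name” column is present) can be checked before starting the analysis. The following is a minimal sketch using only the Python standard library. It is not part of the application; the function names, the toy identifiers, and every column name other than “batch”, “raw.embryo.time.bin”, and “gene_short_name” are assumptions made for illustration.

```python
import csv
import io


def read_tsv(handle):
    """Return (header, data_rows) from a tab-separated file handle."""
    rows = [row for row in csv.reader(handle, delimiter="\t") if row]
    return rows[0], rows[1:]


def check_inputs(counts_fh, cells_fh, genes_fh):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    counts_header, counts_rows = read_tsv(counts_fh)
    _, cells_rows = read_tsv(cells_fh)
    genes_header, genes_rows = read_tsv(genes_fh)

    # The first column of the count table has no heading. Depending on how the
    # file was written, the header either starts with an empty field or is one
    # field shorter than the data rows; both layouts are handled here.
    if counts_rows and len(counts_header) == len(counts_rows[0]):
        count_cells = counts_header[1:]
    else:
        count_cells = counts_header
    count_genes = [row[0] for row in counts_rows]

    meta_cells = [row[0] for row in cells_rows]
    annotated_genes = {row[0] for row in genes_rows}

    if len(set(count_cells)) != len(count_cells):
        problems.append("cell identifiers in the count table are not unique")
    if set(count_cells) != set(meta_cells):
        problems.append("cells differ between count table and cell metadata")
    missing = [gene for gene in count_genes if gene not in annotated_genes]
    if missing:
        problems.append(f"{len(missing)} gene ID(s) lack an annotation row")
    if "gene_short_name" not in genes_header:
        problems.append("gene annotations lack a 'gene_short_name' column")
    return problems


# Toy stand-ins for the three files (hypothetical IDs and values).
COUNTS = "\tcellA\tcellB\nWBGene_toy1\t0\t3\nWBGene_toy2\t1\t0\n"
CELLS = (
    "cell\tn_umi\tbatch\traw.embryo.time.bin\n"
    "cellA\t120\tbatch1\t170-210\n"
    "cellB\t95\tbatch2\t210-270\n"
)
GENES = "id\tgene_short_name\nWBGene_toy1\ttoy-1\nWBGene_toy2\ttoy-2\n"

print(check_inputs(io.StringIO(COUNTS), io.StringIO(CELLS), io.StringIO(GENES)))
```

To check the real files, the `io.StringIO(...)` objects would be replaced with file handles opened on the downloaded tables, for example `open("Waterson_counts.tsv", newline="")`.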

Parameters

  • Count table file: Select “waterson_counts.tsv”

  • Experimental Design File: Select “waterson_cell_metadata.tsv”

  • Normalization Method: Log Normalization

  • PCA-Dimensions: 50

  • PCA-Scaling: Checked

  • Add Gene Subset File: Unchecked

  • Add optional Gene Information: Checked

  • Gene Information: Select “waterson_gene_annotations.tsv”

  • Remove batch effects: Checked

  • Batch-Column with batch information: Choose “batch”

  • Mutual Nearest Neighbor: 20

  • UMAP-Min. Distance: 0.1

  • UMAP-Neighbours: 15

  • Clustering-Nearest Neighbor: 20

  • Disjoint Graphs: Unchecked

  • Loops: Unchecked

  • Select Starting Point: Choose Experimental Design File

  • Select Column: Choose “raw.embryo.time.bin”

  • Starting Point: Choose “170-210”
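
The starting-point settings above pick the root from the cells whose “raw.embryo.time.bin” value is “170-210”. As an illustration of what that selection amounts to, the sketch below lists those cells from a metadata table and formats them one name per line, the layout described for the optional progenitor cell file. This is an assumed, simplified view and not the application's actual implementation; the function name and the toy metadata are hypothetical.

```python
import csv
import io


def cells_in_time_bin(metadata_fh, column, value):
    """Return the cell identifiers whose `column` equals `value`."""
    reader = csv.DictReader(metadata_fh, delimiter="\t")
    cell_column = reader.fieldnames[0]  # first column holds the cell identifier
    return [row[cell_column] for row in reader if row[column] == value]


# Toy metadata (hypothetical cells); the column names follow the parameters above.
TOY_METADATA = (
    "cell\tbatch\traw.embryo.time.bin\n"
    "cellA\tbatch1\t170-210\n"
    "cellB\tbatch1\t210-270\n"
    "cellC\tbatch2\t170-210\n"
)

roots = cells_in_time_bin(io.StringIO(TOY_METADATA), "raw.embryo.time.bin", "170-210")

# One cell name per line, as in the progenitor cell list.
progenitor_list = "\n".join(roots) + "\n"
print(progenitor_list, end="")
```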

Execution Time

5 Minutes

Output

Interpretation of the Results

In this section, we walk through some important figures and assess whether the results are intuitively correct.

Distribution of Cells in Pseudotime

The distribution of cells across pseudotime shows how many cells fall at each stage of the trajectory, which helps when interpreting the results.
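
To make the idea concrete, the sketch below bins a handful of made-up pseudotime values into equal-width intervals and prints a text histogram. The values, the function name, and the number of bins are all hypothetical. The sketch also assumes that cells which cannot be reached from the chosen root may carry a non-finite pseudotime, and counts those separately rather than binning them.

```python
import math


def pseudotime_histogram(values, n_bins=5):
    """Count cells per equal-width pseudotime bin.

    Returns (bin_edges, counts, n_skipped), where n_skipped is the number of
    non-finite values that were left out of the bins.
    """
    finite = [v for v in values if math.isfinite(v)]
    skipped = len(values) - len(finite)
    if not finite:
        return [], [], skipped
    lo, hi = min(finite), max(finite)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * n_bins
    for v in finite:
        idx = min(int((v - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts, skipped


# Made-up pseudotime values for ten cells; the last one is unreachable.
pseudotime = [0.0, 0.5, 1.2, 1.3, 2.8, 3.1, 3.3, 4.9, 5.0, float("inf")]

edges, counts, skipped = pseudotime_histogram(pseudotime, n_bins=5)
for left, right, n in zip(edges, edges[1:], counts):
    print(f"{left:4.1f}-{right:4.1f} | {'#' * n} ({n})")
print(f"cells without finite pseudotime: {skipped}")
```

A distribution that is heavily skewed toward one end, or a large number of cells without a finite pseudotime, is a cue to revisit the chosen starting point.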

Trajectory UMAP

In a trajectory UMAP, each dot represents a cell, colored according to its assigned pseudotime. In Figure 5, cells earlier in the trajectory are colored orange, and cells later in the trajectory are colored green. A line traces the global structure of the trajectory.