Single Cell RNA-Seq Trajectory Inference Analysis
Introduction
This tool implements Monocle3, a scRNA-Seq data analysis toolkit developed by the Trapnell lab. The minimal input is a table of raw counts (also known as an expression count table) and an experimental design file (also known as cell metadata), both as tab-separated text files. Trajectory inference assigns each cell a pseudotime value. Cells can then be grouped by pseudotime range, and these groups can be used for differential expression analysis. This allows the discovery of differentially expressed genes that may be responsible for cell fate decisions.
This tool is based on the R package Monocle3. Please cite Monocle3 as:
To open the Monocle3 Trajectory Inference wizard: Transcriptomics Module → Single Cell RNA-Seq → Trajectory Inference
Input
The input data required for trajectory inference are a count table and an experimental design file (cell metadata). The count table, also called the expression count file, contains gene/feature expression values across cells/samples. It can be supplied as a tab-separated text file or as an OmicsBox count table project.
Count Table File: The count table file must be a tab-separated text file with a .txt or .tsv extension. The first column should contain the gene/feature IDs and must not contain duplicates. The remaining columns each correspond to one cell/sample and hold its gene counts.
Count Table Project: The count table project file can be obtained after performing reference alignment with STAR or RSEM. It is an OmicsBox count table, usually stored as a box object with a .box extension.
Experimental Design File: The experimental design file describes the cells/samples of the experiment. It must be a tab-separated text file with a .txt or .tsv extension. The first column should contain the cell/sample IDs and must not contain duplicates.
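The file requirements above can be checked before starting the wizard. The following is a minimal sketch in Python using only the standard library; it is not part of OmicsBox. The function names and the example data are hypothetical, and the cross-check between the two files assumes that the cell IDs in the count table header are expected to appear in the first column of the experimental design file.

```python
import csv
import io


def find_duplicate_ids(tsv_text):
    """Return the IDs that occur more than once in the first column.

    The first row is treated as a header and skipped.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row
    seen, duplicates = set(), []
    for row in reader:
        if not row:
            continue  # ignore empty lines
        identifier = row[0]
        if identifier in seen and identifier not in duplicates:
            duplicates.append(identifier)
        seen.add(identifier)
    return duplicates


def cells_missing_from_design(count_text, design_text):
    """Return cell IDs from the count table header that the design file lacks.

    Assumption: count table columns (after the first) are cell IDs, and the
    design file lists the same IDs in its first column.
    """
    count_header = next(csv.reader(io.StringIO(count_text), delimiter="\t"))
    design_rows = list(csv.reader(io.StringIO(design_text), delimiter="\t"))[1:]
    design_cells = {row[0] for row in design_rows if row}
    return [cell for cell in count_header[1:] if cell not in design_cells]


# Hypothetical example data: GeneA is duplicated, cell_3 has no metadata.
count_table = (
    "gene_id\tcell_1\tcell_2\tcell_3\n"
    "GeneA\t5\t0\t2\n"
    "GeneB\t1\t3\t0\n"
    "GeneA\t0\t1\t1\n"
)
design_file = "cell\ttime\ncell_1\t0h\ncell_2\t12h\n"

duplicate_genes = find_duplicate_ids(count_table)
duplicate_cells = find_duplicate_ids(design_file)
missing_cells = cells_missing_from_design(count_table, design_file)
print(duplicate_genes, duplicate_cells, missing_cells)
```

For real files, the text can be read with `open(path, newline="").read()` before calling the functions.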
Please Supply Syntactically Correct Names
Monocle3 is written in R and requires syntactically correct naming. Therefore, please refrain from supplying illegal characters in the above files. Visit the last section of the manual to read about syntactically correct naming.

Figure 1. Input Data Page
Configuration 1. Data Pre-processing
Normalization Method: The purpose of normalization is to remove as much non-biological variation as possible. Currently, this option offers log-normalization and size-factor normalization. Log normalization applies a logarithmic transformation to the counts, which compresses the range of values with high variance. The size factor of a cell is an estimate of the relative bias in that cell (for example, its sequencing depth), so dividing the cell's counts by its size factor should remove that bias and give size-factor normalized counts. Normalization can also be skipped by selecting "none".
PCA: Principal component analysis (PCA) is a classic dimension reduction approach. It constructs linear combinations of gene expression values called principal components (PCs). The PCs are orthogonal to each other, can effectively explain the variation in gene expression, and may have a much lower dimensionality than the original data.
Dimensions: Number of principal components to retain after PCA. For datasets with more than 5,000 cells, the top 50 principal components are recommended.
Scaling: Scaling brings all variables to a comparable range so that variables with large values do not dominate the principal components. If the variables are in different units, it is good practice to scale before calculating principal components.
Gene Subset: A subset of genes from the experiment can be supplied to perform PCA. The file should contain a single column with all the gene/feature IDs.
Gene Annotations: Gene annotations are helpful for downstream analysis. They usually contain information such as the gene symbol and location. This information can be supplied in a tab-separated text file. If supplied, the file must contain a column named "gene_short_name" holding either the gene IDs or the gene symbols (required by Monocle3).
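The Normalization Method option described above can be illustrated with a small example. This is a generic sketch in Python, not the exact computation performed by Monocle3 or OmicsBox. It assumes that a cell's size factor is its total count divided by the geometric mean of the totals of all cells, and that log-normalization is log2(x + 1) applied to the size-factor normalized counts. The function names and the counts are hypothetical.

```python
import math


def size_factors(counts):
    """Size factor per cell: total count / geometric mean of all totals.

    Assumption: every cell has a total count greater than zero.
    """
    totals = {cell: sum(values) for cell, values in counts.items()}
    mean_log_total = sum(math.log(t) for t in totals.values()) / len(totals)
    geometric_mean = math.exp(mean_log_total)
    return {cell: total / geometric_mean for cell, total in totals.items()}


def normalize(counts, method="size_factor"):
    """Normalize counts with 'size_factor', 'log' or 'none'."""
    if method == "none":
        return {cell: list(values) for cell, values in counts.items()}
    factors = size_factors(counts)
    # Dividing by the size factor removes the per-cell bias.
    scaled = {
        cell: [value / factors[cell] for value in values]
        for cell, values in counts.items()
    }
    if method == "log":
        # Assumed log transform: log2 with a pseudocount of 1.
        return {
            cell: [math.log2(value + 1.0) for value in values]
            for cell, values in scaled.items()
        }
    return scaled


# Hypothetical counts: cell_1 was sequenced four times deeper than cell_2.
counts = {"cell_1": [10, 0, 30], "cell_2": [5, 5, 0]}

factors = size_factors(counts)            # approximately 2.0 and 0.5
size_normalized = normalize(counts, "size_factor")
log_normalized = normalize(counts, "log")
```

After size-factor normalization both cells have the same total count, so differences between them no longer reflect sequencing depth.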
Please Supply Syntactically Correct Names
Monocle3 is written in R and requires syntactically correct naming. Therefore, please refrain from supplying illegal characters in the above files. Visit the last section of the manual to read about syntactically correct naming.

Figure 2. Data Pre-processing Page
Configuration 2. Parameter Tuning
Batch Correction: Batch effect correction removes variability from the data that is not due to variables of interest. Batch effects are due to technical differences between your samples, such as the type of sequencing machine or even the technician that ran the sample.
Remove Batch effects: Check yes if you want to remove batch effects.
Column with batch information: The experimental design file should have a column containing information about the batches. Select the column having batch information using the drop-down list.
Mutual Nearest Neighbor: If two cells, one from each batch, are contained in each other's set of nearest neighbors, those cells are considered mutual nearest neighbors. This option sets the number of nearest neighbors used to find such pairs for batch alignment.
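The definition of mutual nearest neighbors given above can be made concrete with a short example. This is a minimal sketch in Python that only illustrates the definition; it is not the batch correction procedure itself. The function names, the Euclidean distance and the two-dimensional coordinates are assumptions chosen for illustration.

```python
import math


def nearest_neighbors(source, target, k):
    """For each point in source, the indices of its k nearest points in target."""
    result = []
    for point in source:
        order = sorted(
            range(len(target)), key=lambda j: math.dist(point, target[j])
        )
        result.append(set(order[:k]))
    return result


def mutual_nearest_neighbors(batch_a, batch_b, k):
    """Pairs (i, j) where a[i] and b[j] are in each other's k nearest neighbors."""
    a_to_b = nearest_neighbors(batch_a, batch_b, k)
    b_to_a = nearest_neighbors(batch_b, batch_a, k)
    return [
        (i, j)
        for i in range(len(batch_a))
        for j in sorted(a_to_b[i])
        if i in b_to_a[j]
    ]


# Hypothetical cells from two batches in a 2-D reduced space.
batch_a = [(0.0, 0.0), (10.0, 10.0), (1.0, 0.0)]
batch_b = [(0.2, 0.0), (10.0, 9.0)]

pairs_k1 = mutual_nearest_neighbors(batch_a, batch_b, k=1)
pairs_k2 = mutual_nearest_neighbors(batch_a, batch_b, k=2)
print(pairs_k1)  # [(0, 0), (1, 1)]
```

With k=1, the third cell of batch A has the first cell of batch B as its nearest neighbor, but not the other way round, so it forms no mutual pair. Increasing the number of neighbors yields more pairs.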
UMAP: Uniform Manifold Approximation and Projection (UMAP) is a dimension reduction technique that can be used for visualization similarly to t-SNE, but also for general non-linear dimension reduction.
Minimum Distance: Controls how tightly UMAP clumps cells together, with low values leading to more tightly packed cells. Larger values will make UMAP pack cells together more loosely, focusing instead on the preservation of the broad topological structure.
Neighbors: Controls how UMAP balances local versus global structure. Low values push UMAP to focus on local structure by constraining the number of neighboring points considered when analyzing the data in high dimensions. High values push UMAP towards representing the big-picture structure at the cost of fine detail.
Clustering
Nearest Neighbors: Number of nearest neighbors (k) used to build the nearest-neighbor graph on which the cells are clustered.
Allow disjoint graph: Enabling this will allow the joining of different partitions into one trajectory.
Allow loops: Enabling this will allow the discovery of cyclic trajectories if present in the data.

Figure 3. Parameter Tuning Page
Configuration 3. Trajectory Inference
Select Start point: Monocle3 needs a root node (starting point), which is used as the reference point for the trajectory construction. OmicsBox offers two ways to provide this information: supplying a list of progenitor cells, or selecting a column of the experimental design file.
Experimental Design File
Select Column: The experimental design file contains information about the cells in the count table. In this step, select the column that holds the experimental time (real time) or the cell type (progenitor or precursor cells).
Starting Point: Once the column with the potential starting points is selected, choose the starting point itself. For experimental time, this should be the first capture time (0h or similar). For cell types, it should be a stem or progenitor cell type, such as hematopoietic stem cells.
Select a list of progenitor cells: This option allows the submission of progenitor cells, which are used as starting points for the trajectory. This list should be supplied as a text file (.tsv or .txt) with cell names (one per line).
Please Supply Syntactically Correct Names
Monocle3 is written in R and requires syntactically correct naming. Therefore, please refrain from supplying illegal characters in the above files. Visit the last section of the manual to read about syntactically correct naming.

Figure 4. Trajectory Inference
Side Panel Options
Once the analysis is complete and results are generated, the side-panel actions are visible. Currently, the following side panel options are available.
Summary Report
Extract Cluster Counts
Differential Expression
UMAP
Bar Plot
Export Table
Actions: Summary Report
Generates Summary Report

Actions: Extract Cluster Counts
Extract Ranges: Here, the clusters are the pseudotime ranges. The pseudotime ranges of interest can be selected using this option, and the corresponding cells and their gene/feature counts will be extracted accordingly.
Export Raw Counts: Requires a directory and a file name to which the raw counts are written. Note that raw counts are extracted only for the selected pseudotime ranges, as a tab-separated table.
Actions: Differential Expression
Please follow the single-cell differential expression tutorial. Pseudotime range labels replace the Cluster labels.

Charts: UMAP
Select Colour Style: The coloring style of the UMAP can be changed using the drop-down list. OmicsBox offers three styles: color by cluster, color by pseudotime, and color by partition.

Export Table
Export Table File: Select a file for extracting the main results. The table will be extracted in a tab-separated format in a text file.
Include Header: Check if the table header is required in the file.

Output
Monocle3 in OmicsBox returns one main output table and two plots, in accordance with a standard trajectory analysis. A brief report is also generated per run, showing the parameters used and the returned results.
Main output table with pseudotime information
Trajectory UMAP
Distribution of cells over pseudotime ranges
Results (Main output table)
Cell: Names of the cells supplied in the count table and experimental design file.
Pseudotime: Pseudotime assigned to the cell by Monocle3. If a cell is not assigned a pseudotime, no value is provided.
Pseudotime Range: Pseudotime range (cluster of pseudotime values) that the cell falls into. If a cell is not assigned a pseudotime, this is stated explicitly. The ranges are left-closed, right-open intervals.
Cluster: Assigned clusters
Partition: Assigned super-cluster or partition
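The left-closed, right-open convention of the Pseudotime Range column can be illustrated as follows. This is a minimal sketch in Python, not the OmicsBox implementation. The fixed range width, the label text for unassigned cells and the treatment of missing, NaN and infinite values as "not assigned" are all assumptions made for the example.

```python
import math
from collections import Counter


def assign_pseudotime_range(pseudotime, width, unassigned_label="Not assigned"):
    """Return the left-closed, right-open range [lower, upper) for a pseudotime.

    Assumption: missing, NaN and infinite values mean no pseudotime was assigned.
    """
    if pseudotime is None or math.isnan(pseudotime) or math.isinf(pseudotime):
        return unassigned_label
    lower = math.floor(pseudotime / width) * width
    return f"[{lower:g}, {lower + width:g})"


# Hypothetical pseudotime values; cell_5 has no pseudotime.
pseudotimes = {
    "cell_1": 0.0,
    "cell_2": 4.99,
    "cell_3": 5.0,
    "cell_4": 12.3,
    "cell_5": None,
}

ranges = {
    cell: assign_pseudotime_range(value, width=5.0)
    for cell, value in pseudotimes.items()
}
cells_per_range = Counter(ranges.values())
print(ranges["cell_3"])  # [5, 10)
```

A value exactly on a boundary (5.0 in the example) belongs to the range that starts at that value, not to the one that ends there. Counting the cells per range, as done with `Counter`, gives the kind of data shown in the distribution plot.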

Results (Trajectory UMAP)
The Trajectory UMAP is a UMAP colored by the continuum of pseudotime, with a line graph superimposed on it that depicts the global structure of progression among cells. The pseudotime slider can be adjusted to show only the cells belonging to a specific pseudotime range. If no pseudotime is assigned, only the trajectory graph is shown, without colored cells.

Results (Distribution of cells across pseudotime range)
The plot shows the distribution of the number of cells per pseudotime range. This can be correlated with cell type annotations to elucidate the progenitor cell types or intermediate cell states, as they tend to have lower pseudotime values. The values of ranges are left-closed (right-open) intervals.

Syntactically Correct Naming
Please refer to the following table while preparing the data for analysis. Following these rules will make the analysis more robust and will enable integration with other tools (both by BioBam and open source, such as R, Python, and Excel).
Issue | Incorrect Naming | Correct Naming |
---|---|---|
Spaces | Embryo Time | Embryo.Time or Embryo_Time |
Quotes | Embryo'Time | Embryo.Time or Embryo_Time |
Mathematical Operators | Embryo+Time | Embryo.Time or Embryo_Time |
Backslash | Embryo\Time | Embryo.Time or Embryo_Time |
Preceded by a numeric | 2Embryo | Embryo2 or Embryo.2 or Embryo_2 |
Symbols ($, @, # etc) | Embryo$Time or $EmbryoTime | Embryo.Time or Embryo_Time |
Web links | Not supported | |
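The naming rules in the table can be checked automatically before the files are loaded. The following is a minimal sketch in Python and is not part of OmicsBox. The check is deliberately conservative: it accepts only names that start with a letter and contain letters, digits, dots and underscores, which is stricter than what R itself allows. The function names are hypothetical, and a suggested name should always be re-checked with `is_safe_name`.

```python
import re

# Conservative pattern: a letter, followed by letters, digits, "." or "_".
_SAFE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._]*")


def is_safe_name(name):
    """True if the whole name matches the conservative pattern."""
    return _SAFE_NAME.fullmatch(name) is not None


def suggest_name(name):
    """Suggest a replacement: illegal characters become "_", leading digits move to the end."""
    cleaned = re.sub(r"[^A-Za-z0-9._]+", "_", name).strip("_")
    match = re.match(r"(\d+)(.*)", cleaned)
    if match:
        # Move leading digits behind the rest of the name, e.g. 2Embryo -> Embryo2.
        cleaned = match.group(2).lstrip("._") + match.group(1)
    return cleaned


# Names taken from the table above.
examples = ["Embryo Time", "Embryo'Time", "Embryo+Time", "Embryo\\Time",
            "2Embryo", "Embryo$Time", "Embryo_Time"]

for name in examples:
    if not is_safe_name(name):
        print(f"{name!r} -> {suggest_name(name)!r}")
```

Names that are already safe, such as Embryo_Time, are left untouched by the loop. A name consisting only of digits or symbols cannot be repaired this way and has to be renamed by hand.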