Long Reads Transcriptome Analysis with SQANTI3
SQANTI3 is a bioinformatics tool for the quality control and filtering of full-length transcripts sequenced with PacBio’s long-read technology. It is designed as the next step after the IsoSeq pipeline. Interest in this tool stems from the usefulness of long-read transcriptome sequencing for describing eukaryotic transcriptomes, where it can replace second-generation sequencing: Illumina short reads cannot span a whole transcript and therefore cannot characterize eukaryotic transcriptomes well.
The input consists of consensus transcripts in FASTA format, obtained after running IsoSeq3 in OmicsBox. This FASTA file contains the full-length transcriptome of the COLO829T melanoma cell line, obtained with long-read sequencing.
Organism: Homo sapiens
Layout: PacBio Single Molecule, Real-Time (SMRT) Sequencing
Tseng, E., Galvin, B., Hon, T., Kloosterman, W. P., & Ashby, M. (2019). Full length transcriptome sequencing of melanoma cell line complements long read sequencing assessment of genomic rearrangements.
Transcriptome sequencing has proven to be an important tool for understanding the biological changes in cancer genomes including the consequences of structural rearrangements. Short-read sequencing has been the method of choice, as the high throughput at low cost allows for transcript quantitation and the detection of even rare transcripts. However, the reads are generally too short to reconstruct complete isoforms. Conversely, long-read sequencing can provide unambiguous full-length isoforms, but lower throughput has complicated quantitation and high RNA input requirements have made working with cancer samples challenging.
Recently, the COLO 829 cell line was sequenced to 50-fold coverage with PacBio Single Molecule, Real-Time (SMRT) Sequencing. To validate and extend the findings from this effort, we have generated long-read transcriptome data using an updated PacBio Iso-Seq method, the results of which will be shared at the AACR 2019 General Meeting. With this complementary transcriptome data, we demonstrate how recent innovations in the PacBio Iso-Seq method sample preparation and sequencing chemistry have made long-read sequencing of cancer transcriptomes more practical. In particular, library preparation has been simplified and throughput has increased. The improved protocol has reduced sample prep time from several days to one day while reducing the sample input requirements ten-fold. In addition, the incorporation of unique molecular identifier (UMI) tags into the workflow has improved the bioinformatics analysis. Yield has also increased, with 3.0 sequencing chemistry typically delivering >30 Gb per SMRT Cell 1M. By integrating long and short read data, we demonstrate that the Iso-Seq method is a practical tool for annotating cancer genomes with high-quality transcript information.
Unprocessed long-read data can be obtained from:
SQANTI3, however, takes the IsoSeq output as its input; that file can be downloaded from this link.
1- Analysis Step
File with PacBio HQ Long Reads:
Transcription Start Site Annotation File:
File with PolyA Motifs:
Quality Control Parameters
Ignore Transcript ID Nomenclature: False
Min. Length of Reference Transcript: 200
Skip ORF Prediction: False
Set of Splice Sites: ATAC,GCAG,GTAG
Adenine Percentage: 0.6
Adenines in a Row: 6
Distance to Annotated TTS: 50
Minimum Short-Read Coverage: 3
Filter Mono Exonic Transcripts: False
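To make three of the parameters above more concrete, the following is a minimal Python sketch of how a splice-site set and the two adenine thresholds could be applied to a sequence. It is not SQANTI3's or OmicsBox's code: the function names are hypothetical, the idea that the adenine checks look at a short genomic window downstream of the transcript end is an assumption, and so is the choice to flag a transcript when either threshold is met.

```python
# Illustrative sketch only; names and logic are assumptions, not the tool's code.

# "Set of Splice Sites" value from the parameter list: donor + acceptor dinucleotides.
CANONICAL_SITES = {"ATAC", "GCAG", "GTAG"}


def is_canonical_junction(intron_seq: str) -> bool:
    """Return True if the intron's first two and last two bases form a listed pair."""
    seq = intron_seq.upper()
    if len(seq) < 4:
        return False
    return seq[:2] + seq[-2:] in CANONICAL_SITES


def longest_a_run(seq: str) -> int:
    """Length of the longest stretch of consecutive adenines in seq."""
    longest = current = 0
    for base in seq.upper():
        current = current + 1 if base == "A" else 0
        longest = max(longest, current)
    return longest


def looks_intraprimed(downstream_seq: str,
                      min_a_fraction: float = 0.6,  # "Adenine Percentage"
                      min_a_run: int = 6) -> bool:  # "Adenines in a Row"
    """Flag a window that is A-rich overall or contains a long run of A.

    Combining the two thresholds with OR is an assumption made for this sketch.
    """
    seq = downstream_seq.upper()
    if not seq:
        return False
    a_fraction = seq.count("A") / len(seq)
    return a_fraction >= min_a_fraction or longest_a_run(seq) >= min_a_run
```

For example, an intron starting with GT and ending with AG would count as canonical under the set listed above, while a downstream window such as AAAAAAGCTTGCATGCCTGC would be flagged because it contains six adenines in a row.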
Approx. 90 minutes.
example_dataset.box: classification table with a sidebar to make a summary report and different charts.
example.dataset_classification.txt: file with all the information that SQANTI3 can return for each isoform.
example.dataset_junctions.txt: file with information at splice-junction level.
example.dataset_isoforms.fasta: FASTA file with the curated transcriptome.
example.dataset_transcriptome.gtf: annotation file of the curated transcriptome.
example.dataset_isoforms_aminoacids.faa: FASTA file with the translated and curated transcriptome.
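As a quick way to inspect the per-isoform classification file outside OmicsBox, the sketch below counts isoforms per structural category. It assumes the file is tab-separated with a header row and contains columns named isoform and structural_category; those column names, the function name, and the three demo rows are illustrative and not taken from this dataset.

```python
import csv
import io
from collections import Counter


def count_structural_categories(handle, column="structural_category"):
    """Count rows per value of `column` in a tab-separated table with a header."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row[column] for row in reader)


# Made-up rows standing in for the real classification file.
demo = io.StringIO(
    "isoform\tstructural_category\n"
    "PB.1.1\tfull-splice_match\n"
    "PB.1.2\tnovel_in_catalog\n"
    "PB.2.1\tfull-splice_match\n"
)
counts = count_structural_categories(demo)

for category, n in counts.most_common():
    print(f"{category}\t{n}")
```

With a real file, the same function can be called on an open handle, for instance `with open("example.dataset_classification.txt", newline="") as fh: counts = count_structural_categories(fh)`.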
The long-read transcriptomics submodule lets the user take the subreads or CCS BAM files from PacBio sequencing as input, transform them into consensus transcripts, and then analyze and quality-control the resulting transcriptome.