With technological advances and falling sequencing costs, single-cell sequencing is being widely adopted by research scientists. This document discusses various aspects of single-cell sequencing data, with a particular focus on data analysis.
Isolating individual cells
Obtaining the required amount of RNA
High amplification can affect downstream analysis
Working with high-dimensional data
Commercial service providers have overcome many of the laboratory challenges. However, data analysis remains the biggest bottleneck to successful adoption of the technique. Because cells vary so widely, it is often challenging to distinguish signal from noise. Over-amplification of RNA can introduce artifacts that mask true differences.
Quality Control: The raw data needs to be checked for quality. Low-quality data is far more likely in single-cell RNA sequencing than in traditional RNA-Seq. Reads may be removed or trimmed before further analysis, and quite often cells with poor sequencing quality need to be discarded altogether. Although many tools are available, we recommend FastQC for checking quality and the FASTX-Toolkit for trimming or removing reads.
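To make the read-removal idea concrete, here is a minimal Python sketch that drops reads whose mean base quality falls below a threshold. It is an illustration only, not a replacement for FastQC or the FASTX-Toolkit: the function names (`parse_fastq`, `mean_phred`, `filter_reads`), the Phred+33 encoding, and the cutoff of 20 are all assumptions chosen for the example.

```python
import io
from typing import Iterable, Iterator, List, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(handle) -> Iterator[Read]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding is assumed)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(reads: Iterable[Read], min_mean_quality: float = 20.0) -> List[Read]:
    """Keep reads whose mean quality meets the (hypothetical) threshold."""
    return [r for r in reads if mean_phred(r[2]) >= min_mean_quality]


# Two toy reads: 'I' encodes Phred 40, '!' encodes Phred 0.
example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGT\n+\n!!!!\n"
)
kept = filter_reads(parse_fastq(example))
print([header for header, _, _ in kept])  # ['@read1']
```

In practice the threshold would be chosen after inspecting the quality report for the data set rather than fixed in advance.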
Alignment: Alignment is pretty straightforward. A better option is to try a few aligners and see which works best for the data, rather than relying on published results. Quite often we are surprised by how much the results from different aligners can differ. Aligners to try include STAR, BWA, and Bowtie. Our recommendation is to use the BWA-MEM algorithm.
- Sample Filtration: Cells with a low number of reads or a low alignment rate can be removed, as they may interfere with downstream analysis.
- Normalization: Many normalization methods have been developed, but the choice remains critical given the variability in single-cell protocols and lab methods. It is best to try a few normalization techniques and see what works best for the data. Methods specific to single-cell RNA sequencing include SCnorm and bayNorm. It is also worth trying basic scaling and RPKM to see how the results vary.
- Feature Counts: This is an important step in the analysis. It requires a GTF file, which provides the genome annotation. One can use a traditional GTF or the GENCODE GTF file. Using the traditional GTF streamlines the analysis, as it works only with well-established annotation. The GENCODE GTF allows a deeper dive into uncharted territory, and we would not recommend it unless one has a good amount of bioinformatics resources. Software for creating the count matrix includes Cufflinks and Subread.
- Downstream Analysis: Since the number of data points (cells) is very high, traditional clustering analysis will not work well. Principal component analysis followed by clustering seems to be an ideal choice for grouping samples and genes. These clusters can then be analysed for overrepresented ontologies, pathways and networks.
Network Analysis: Cytoscape
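The filtering, normalization and PCA-plus-clustering steps above can be sketched in a few lines of Python with NumPy. This is a toy illustration under stated assumptions, not a recommended pipeline: the thresholds (`min_reads`, `min_aligned`) are hypothetical, log counts-per-million stands in for the "basic scaling" option, and k-means with a deterministic farthest-first start is our own choice of clustering algorithm, since the text does not name one. The count matrix is simulated, with cells as rows and genes as columns.

```python
import numpy as np


def filter_cells(counts, aligned_fraction, min_reads=1000, min_aligned=0.5):
    """Boolean mask of cells (rows) passing both hypothetical thresholds."""
    total_reads = counts.sum(axis=1)
    return (total_reads >= min_reads) & (aligned_fraction >= min_aligned)


def log_cpm(counts):
    """Scale each cell to counts per million, then apply log2(x + 1)."""
    totals = counts.sum(axis=1, keepdims=True)
    return np.log2(counts / totals * 1e6 + 1.0)


def pca(matrix, n_components=2):
    """Project cells onto the top principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def kmeans(points, k, n_iter=20):
    """Lloyd's k-means with a deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Distance from every point to its nearest already-chosen center.
        dists = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[dists.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


# Simulated data: two groups of five cells plus one low-quality cell.
rng = np.random.default_rng(42)
group_a = rng.poisson(lam=[400, 400, 20, 20], size=(5, 4))
group_b = rng.poisson(lam=[20, 20, 400, 400], size=(5, 4))
low_quality = np.array([[1, 0, 2, 0]])
counts = np.vstack([group_a, group_b, low_quality]).astype(float)
aligned_fraction = np.array([0.9] * 10 + [0.2])

keep = filter_cells(counts, aligned_fraction, min_reads=100, min_aligned=0.5)
scores = pca(log_cpm(counts[keep]), n_components=2)
labels = kmeans(scores, k=2)
print("cells kept:", int(keep.sum()))
print("cluster labels:", labels)
```

On real data the number of components and clusters would need to be chosen from the data, and a dedicated single-cell normalization method may be preferable to plain scaling.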
- cSNP and eQTL: Variations among the cells can be studied with reference to expression.
- Fusion Genes: Fusion genes can be detected from single-cell RNA sequencing data.
- Alternative Splicing: Alternative splicing events can be studied.
- Novel Genes and Exons: Novel genes and exons can be identified.
Contact us if you are planning a single-cell sequencing experiment.