Summary:
Single-cell gene expression profiling. The focus is shifting towards understanding similarities and differences between individual cells at the transcriptional and translational level. Measured gene expression levels may be distorted by sampling effects that arise when copying and amplifying the mRNA pool. This is particularly problematic for low copy number transcripts in single-cell samples, where low abundance transcripts can drop out at random from the amplified single-cell cDNA population. The magnitude of the distortion also depends on the transcript abundance distribution: many genes have transcript abundances lower than 10-20, and relatively few genes have high transcript abundances. The study finds that the majority (44%) of genes are represented by a limited number of mRNA copies (fewer than 25), which may account for the large cell-to-cell variations in mRNA copy number that the authors observed. They also conclude that sampling effects do not impede the ability to extract reliable gene expression profiles from single cells and that significant differences in gene expression levels exist between phenotypically identical cells. (Published: 03/06/08)
Notes:
- Single-cell gene expression profiling provides a powerful tool to analyze the composition of complex cell populations
- many contexts in which the focus is shifting towards understanding the cellular networks of individual cells and the similarities and differences between individual cells at the transcriptional and translational level
- Limitations to the sensitivity and resolution of current technologies for studying gene expression mean that when using samples as small as those generated from single cells we are inevitably faced with amplifying cellular mRNA.
- amplification stage may introduce significant distortions in the measured gene expression levels
- especially for genes with small numbers of transcripts in the material under study
- this distortion is introduced by sampling effects that arise from inefficiencies in the processes of copying and amplifying the original mRNA pool.
- In a complex mRNA population with small absolute numbers of individual transcripts, such as that from a single eukaryotic cell, sampling effects can result in only a subset of the population of starting RNA molecules being represented in the final amplified population
- particularly problematic for low copy number transcripts in single cell samples:
- in the first step of the process, reverse transcription may fail for a small proportion of the original mRNA molecules
- eliminated from subsequent amplification and detection
- For genes with only a small number of transcripts in the starting material, this will create a variable (assuming the failures are random) distortion in the relative representation of transcript abundances in the final experimental sample
- potentially leading to the absence of such low abundance transcripts in the final amplified population.
- first round of PCR amplification will have a similar effect, and subsequent rounds will have effects of diminishing importance, in terms of complete dropout of low abundance transcripts
- overall effect of random dropouts of low abundance transcripts from amplified single cell cDNA populations would be that random sets of transcripts would be called as absent in different cells
- one estimate is that there is a lower limit of 80 copies of a single mRNA per cell for detection of two-fold differences between samples
- magnitude of the overall sampling effect will, in theory, depend on two factors:
- the transcript abundance distribution which is the variation of transcript number among genes being expressed in a cell (and in particular the relative numbers of genes with low transcript numbers);
- and the copying and amplification efficiencies for conversion of the original population of mRNA molecules into DNA or RNA detectable by the expression profiling platform in use
- The copying and amplification efficiencies can be estimated from experimental data. However, the estimation of the transcript abundance distribution poses two distinct problems: knowing the form of the distribution; and evaluating the shape and scale parameters for the distribution.
- conflicting reports of the transcript abundance distribution in a typical eukaryotic cell
- ranging from a distribution with a median value for mRNA transcript copies per gene of less than one
- to a distribution with a median of approximately 100 copies
- difficulty is that, in general, the transcript abundance distributions of real single cells are not known but are inferred from population measurements
- Based on published data, a simple approximation is that the transcript abundance distribution is log-log-normal, as this distribution captures certain key features of our current understanding of the single cell transcript abundance distribution:
- there is a high number of genes with transcript abundances lower than 10-20 and relatively few genes with high transcript abundances
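
The dropout mechanism described in these notes can be sketched numerically. This is a minimal illustration, not the paper's model: it assumes that each mRNA molecule is copied independently with a fixed success probability (`efficiency`), so the number of surviving molecules for a gene is binomial and the chance of complete dropout in one copying step is (1 - efficiency) ** copies. The function names, the 0.9 efficiency, and the copy numbers are hypothetical choices made for illustration.

```python
import random


def dropout_probability(copies: int, efficiency: float) -> float:
    """Chance that none of `copies` molecules survives one copying step.

    Assumption (not from the paper): each molecule succeeds independently
    with probability `efficiency`.
    """
    if copies < 0:
        raise ValueError("copies must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie in [0, 1]")
    return (1.0 - efficiency) ** copies


def simulate_survivors(copies: int, efficiency: float, rng: random.Random) -> int:
    """Draw the number of molecules that survive one copying step (binomial)."""
    return sum(1 for _ in range(copies) if rng.random() < efficiency)


def simulated_dropout_rate(copies, efficiency, cells=10_000, seed=0):
    """Fraction of simulated cells in which the gene drops out entirely."""
    rng = random.Random(seed)
    dropped = sum(
        1 for _ in range(cells)
        if simulate_survivors(copies, efficiency, rng) == 0
    )
    return dropped / cells


if __name__ == "__main__":
    # Hypothetical copy numbers and a hypothetical 90% copying efficiency.
    for copies in (1, 2, 5, 20):
        exact = dropout_probability(copies, 0.9)
        approx = simulated_dropout_rate(copies, 0.9)
        print(f"copies={copies:>2}  exact={exact:.5f}  simulated={approx:.5f}")
```

Under these assumptions the dropout chance falls geometrically with copy number, which is why the notes single out low copy number transcripts: a gene with one transcript is lost in a tenth of cells at 90% efficiency, while a gene with several transcripts is almost never lost completely.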
Discussion:
- The main findings of this study are that the contribution of sampling effects to observed single cell expression data is likely to be minor and that substantial transcriptional differences exist between phenotypically identical cells.
- indicates that one can generate reliable gene expression profiles from single cells using microarrays to interrogate globally amplified RNA populations
- However, the considerable variation in gene expression levels between similar cells is likely to dictate that relatively high numbers of cells would need to be analysed to robustly identify significant and consistent differences in gene expression between cell populations.
- Alternatively, these findings argue that single cell expression profiling will be particularly useful for identifying absolute differences in gene expression between cell types.
- A second implication of this study is that one important limit on the use of amplification techniques for single cell expression profiling is that if amplification efficiency drops significantly below 90% then the sampling effect may considerably distort the measured expression profile
- One promising technique for mRNA amplification from individual cells, which combines global exponential and linear amplification, has been shown to produce very low levels of noise and highly reproducible data and may limit the significance of sampling effects when profiling rare transcripts [22].
- Our results demonstrate that the actual transcript abundance distribution for the tested cell type has a peak at approximately 5-20 copies per gene.
- We recognize that our experiments are based on a particular type of mouse neural stem cell, but in the absence of any reason to suppose that the transcript distributions of most other cell types are radically different from this, we believe the result should generally apply to expression experiments performed on a wide range of cell types.
- Although our method did not allow us to discriminate between different models of overall gene and transcript numbers in the cell, we believe it strongly suggests that more than 85% of transcripts are present in relatively low copy numbers (less than 100 copies per cell).
- Insight into the variability of the gene expression profiles of single cells has been obtained using a number of technical approaches, including microarray analysis following linear T7-based amplification [16, 25], multiplexed FISH (fluorescence in situ hybridization) [26] and quantitative PCR [27].
- Transcriptional bursting has been observed in Escherichia coli, in which protein levels have very little correlation with mRNA levels, particularly for younger cells [28], as well as Dictyostelium [29] and mammalian cells [30].
- Overall, those findings are consistent with a model for cellular phenotypes that are underwritten by transcriptional programs that appear inherently noisy when total cellular transcript levels are measured at the single cell level.
- It has been suggested that because in the individual cell the transcriptional machinery is controlled by a relatively small number of transcription factors, it may result in stochastic behavior in gene activity.
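
The point above about amplification efficiency dropping below 90% can be explored with a rough sketch. It assumes a plain log-normal transcript abundance distribution as a stand-in (not necessarily the distribution used in the paper) and independent per-molecule copying in a single step. The `median_copies` and `sigma` values are hypothetical; the median is only chosen to sit inside the 5-20 copies per gene range mentioned in these notes.

```python
import math
import random


def sample_abundances(n_genes, median_copies=10.0, sigma=1.5, seed=0):
    """Draw integer transcript counts per gene from a log-normal shape.

    `median_copies` and `sigma` are hypothetical illustration values,
    not parameters estimated in the paper.
    """
    rng = random.Random(seed)
    mu = math.log(median_copies)  # the median of a log-normal is exp(mu)
    # Round to whole molecules and keep only expressed genes (>= 1 copy).
    counts = (round(rng.lognormvariate(mu, sigma)) for _ in range(n_genes))
    return [c for c in counts if c >= 1]


def expected_dropout_fraction(abundances, efficiency):
    """Mean over genes of (1 - efficiency) ** copies.

    This is the expected share of expressed genes lost entirely in one
    copying step, assuming independent per-molecule success.
    """
    if not abundances:
        raise ValueError("abundances must not be empty")
    return sum((1.0 - efficiency) ** c for c in abundances) / len(abundances)


if __name__ == "__main__":
    genes = sample_abundances(5_000)
    # Sweep hypothetical efficiencies on either side of the 90% mark.
    for efficiency in (0.99, 0.95, 0.90, 0.70, 0.50):
        lost = expected_dropout_fraction(genes, efficiency)
        print(f"efficiency={efficiency:.2f}  expected genes lost={lost:.2%}")
```

The sweep only shows the direction of the effect under these assumptions: as efficiency falls, the expected share of genes lost entirely rises, and the loss is concentrated in the low copy number genes that dominate the assumed distribution.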
Conclusions:
- Our current results revealed that the majority (44%) of genes are represented by a limited number of mRNA copies (fewer than 25), and this may account for the large cell-to-cell variations in mRNA copy number that we have observed.
- also concluded that sampling effects do not impede our ability to extract reliable gene expression profiles from single cells and that significant differences in gene expression levels exist between phenotypically identical cells