ABSTRACT
Results of massively parallel sequencing-by-synthesis vary depending on the sequencing approach. CoolMPS™ is a new sequencing chemistry that incorporates bases by labeled antibodies. To evaluate its performance, we sequenced 240 human non-coding RNA samples (dementia patients and controls) with and without CoolMPS. The Q30 value, an indicator of per-base sequencing quality, increased from 91.8% to 94%. The higher quality was reached across the whole read length. Likewise, the percentage of reads mapping to the human genome increased from 84.9% to 86.2%. For both technologies, we computed similar distributions between different RNA classes (miRNA, piRNA, tRNA, snoRNA and yRNA) and within the classes. While standard sequencing-by-synthesis recovered more annotated miRNAs, CoolMPS yielded more novel miRNAs. The correlation between the two methods was 0.97. Evaluating the diagnostic performance, we observed lower minimal P-values for CoolMPS (adjusted P-value of 0.0006 versus 0.0004) and larger effect sizes (Cohen's d of 0.878 versus 0.9). Validating 19 miRNAs resulted in a correlation of 0.852 between CoolMPS and reverse transcriptase-quantitative polymerase chain reaction. Comparison to data generated with Illumina technology confirmed a known shift in the overall RNA composition. With CoolMPS we evaluated a novel sequencing-by-synthesis technology showing high performance for the analysis of non-coding RNAs.
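As a reading aid for the Q30 figures above: a Phred score Q corresponds to an error probability of 10^(-Q/10), so Q30 means an expected error of 1 in 1,000 base calls, and the Q30 value is the share of bases scoring at least 30. The sketch below assumes FASTQ quality strings in Phred+33 encoding; the function names are illustrative and are not taken from the study.

```python
def phred_to_error_probability(q):
    """Convert a Phred quality score to the implied base-call error probability."""
    return 10 ** (-q / 10)


def q30_fraction(quality_strings, offset=33):
    """Return the fraction of bases with Phred score >= 30.

    quality_strings: iterable of FASTQ quality lines (assumed Phred+33).
    """
    total = 0
    passing = 0
    for qual in quality_strings:
        for ch in qual:
            total += 1
            if ord(ch) - offset >= 30:
                passing += 1
    return passing / total if total else 0.0


if __name__ == "__main__":
    # 'I' encodes Q40, '?' encodes Q30, '5' encodes Q20 under Phred+33.
    reads = ["IIII", "II5?"]
    print(q30_fraction(reads))
    print(phred_to_error_probability(30))
```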
Subjects
High-Throughput Nucleotide Sequencing/methods , RNA, Untranslated/chemistry , Sequence Analysis, RNA/methods , Antibody Specificity , Biomarkers , Computational Biology , DNA, Complementary/genetics , Databases, Genetic , Datasets as Topic , Dementia/blood , Dementia/genetics , Fluorescent Antibody Technique, Direct , Gene Library , Humans , Liquid Biopsy , MicroRNAs/chemistry , MicroRNAs/genetics , Nucleotides/immunology , RNA, Untranslated/chemical synthesis , RNA, Untranslated/genetics , Reproducibility of Results , Reverse Transcriptase Polymerase Chain Reaction
ABSTRACT
Here, we describe single-tube long fragment read (stLFR), a technology that enables sequencing of data from long DNA molecules using economical second-generation sequencing technology. It is based on adding the same barcode sequence to subfragments of the original long DNA molecule (DNA cobarcoding). To achieve this efficiently, stLFR uses the surface of microbeads to create millions of miniaturized barcoding reactions in a single tube. Using a combinatorial process, up to 3.6 billion unique barcode sequences were generated on beads, enabling practically nonredundant cobarcoding with 50 million barcodes per sample. Using stLFR, we demonstrate efficient unique cobarcoding of more than 8 million 20- to 300-kb genomic DNA fragments. Analysis of the human genome NA12878 with stLFR demonstrated high-quality variant calling and phase block lengths up to N50 34 Mb. We also demonstrate detection of complex structural variants and complete diploid de novo assembly of NA12878. These analyses were all performed using single stLFR libraries, and their construction did not significantly add to the time or cost of whole-genome sequencing (WGS) library preparation. stLFR represents an easily automatable solution that enables high-quality sequencing, phasing, SV detection, scaffolding, cost-effective diploid de novo genome assembly, and other long DNA sequencing applications.
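To give a feel for why 50 million barcodes drawn from a pool of 3.6 billion can be described as practically nonredundant, the sketch below estimates how often two beads would carry the same barcode. It assumes each bead's barcode is drawn independently and uniformly from the pool, which is a simplification made here and is not stated in the abstract.

```python
import math


def shared_barcode_probability(n_beads, n_barcodes):
    """Probability that a given bead shares its barcode with at least one other bead.

    Assumes independent, uniform draws with replacement from n_barcodes
    possible barcodes (an idealized model, not the authors' analysis).
    """
    # (1 - 1/N)^(n-1) is the chance that none of the other beads match;
    # log1p keeps the calculation accurate for very large N.
    log_no_match = (n_beads - 1) * math.log1p(-1.0 / n_barcodes)
    return 1.0 - math.exp(log_no_match)


if __name__ == "__main__":
    p = shared_barcode_probability(50_000_000, 3_600_000_000)
    print(f"{p:.4%} of beads expected to share a barcode under this model")
```

Under this model only on the order of 1% of beads share a barcode, and a shared barcode matters only if both beads also capture fragments from the same genomic region.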
Subjects
High-Throughput Nucleotide Sequencing/methods , Whole Genome Sequencing/methods , Cost-Benefit Analysis , Diploidy , Gene Library , Genome, Human , Genomics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing/economics , Humans , Whole Genome Sequencing/economics
ABSTRACT
To fully understand human genetic variation and its functional consequences, the specific distribution of variants between the two chromosomal homologues of genes must be known. The 'phase' of variants can significantly impact gene function and phenotype. To assess patterns of phase at large scale, we have analyzed 18 121 autosomal genes in 1092 statistically phased genomes from the 1000 Genomes Project and 184 experimentally phased genomes from the Personal Genome Project. Here we show that genes with cis-configurations of coding variants are more frequent than genes with trans-configurations in a genome, with global cis/trans ratios of ~60:40. Significant cis-abundance was observed in virtually all genomes in all populations. Moreover, we identified a large group of genes exhibiting cis-configurations of protein-changing variants in excess, so-called 'cis-abundant genes', and a smaller group of 'trans-abundant genes'. These two gene categories were functionally distinguishable, and exhibited strikingly different distributional patterns of protein-changing variants. Underlying these phenomena was a shared set of phase-sensitive genes of importance for adaptation and evolution. This work establishes common patterns of phase as key characteristics of diploid human exomes and provides evidence for their functional significance, highlighting the importance of phase for the interpretation of protein-coding genetic variation and gene function.
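The cis/trans distinction can be illustrated with a small sketch: a gene with two or more heterozygous variants is in cis when all variant alleles sit on the same homologue and in trans when they are split across both. The input format (VCF-style phased genotype strings such as `0|1`) and the function name are assumptions made for this example; the study's exact criteria, such as which variant classes were counted, are not reproduced here.

```python
def classify_gene_phase(phased_genotypes):
    """Classify one gene as 'cis' or 'trans' from its phased variant genotypes.

    phased_genotypes: list of strings like '0|1' or '1|0' (left allele is
    homologue 1, right allele is homologue 2; '0' is the reference allele).
    Returns None when fewer than two heterozygous variants are present,
    because phase is then uninformative.
    """
    het_count = 0
    on_hap1 = False
    on_hap2 = False
    for gt in phased_genotypes:
        left, right = gt.split("|")
        if left == right:
            continue  # homozygous sites do not distinguish the homologues
        het_count += 1
        if left != "0":
            on_hap1 = True
        if right != "0":
            on_hap2 = True
    if het_count < 2:
        return None
    return "trans" if (on_hap1 and on_hap2) else "cis"


if __name__ == "__main__":
    print(classify_gene_phase(["0|1", "0|1"]))  # both variants on homologue 2
    print(classify_gene_phase(["0|1", "1|0"]))  # one variant on each homologue
```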
Subjects
Diploidy , Genome, Human/genetics , Open Reading Frames/genetics , Quantitative Trait Loci/genetics , Exome/genetics , Genetic Variation , Haplotypes/genetics , Humans , Polymorphism, Single Nucleotide/genetics
ABSTRACT
BACKGROUND: RNA-Seq data is inherently nonuniform for different transcripts because of differences in gene expression. This makes it challenging to decide how much data should be generated from each sample. How much should one spend to recover the less expressed transcripts? The sequencing technology used is another consideration, as there are inevitably always biases against certain sequences. To investigate these effects, we first looked at high-depth libraries from a set of well-annotated organisms to ascertain the impact of sequencing depth on de novo assembly. We then looked at libraries sequenced from the Universal Human Reference RNA (UHRR) to compare the performance of Illumina HiSeq and MGI DNBseq™ technologies. RESULTS: On the issue of sequencing depth, the amount of exomic sequence assembled plateaued using data sets of approximately 2 to 8 Gbp. However, the amount of genomic sequence assembled did not plateau for many of the analyzed organisms. Most of the unannotated genomic sequences are single-exon transcripts whose biological significance will be questionable for some users. On the issue of sequencing technology, both of the analyzed platforms recovered a similar number of full-length transcripts. The missing "gap" regions in the HiSeq assemblies were often attributed to higher GC contents, but this may be an artefact of library preparation and not of sequencing technology. CONCLUSIONS: Increasing sequencing depth beyond modest data sets of less than 10 Gbp recovers a plethora of single-exon transcripts undocumented in genome annotations. DNBseq™ is a viable alternative to HiSeq for de novo RNA-Seq assembly.
Subjects
RNA-Seq/methods , Animals , Arabidopsis , Exons , Gene Library , Humans , Molecular Sequence Annotation , Open Reading Frames , Oryza
ABSTRACT
BACKGROUND: Massively parallel sequencing, coupled with sample multiplexing, has made genetic tests broadly affordable. However, intractable index mis-assignments (commonly exceeding 1%) have been repeatedly reported on some widely used sequencing platforms. RESULTS: Here, we investigated this quality issue on BGI sequencers using three library preparation methods: whole genome sequencing (WGS) with PCR, PCR-free WGS, and two-step targeted PCR. BGI's sequencers utilize a unique DNA nanoball (DNB) technology which uses rolling circle replication for DNA-nanoball preparation; this linear amplification is PCR free and can avoid error accumulation. We demonstrated that single index mis-assignment from free indexed oligos occurs at a rate of one in 36 million reads, suggesting virtually no index hopping during DNB creation and arraying. Furthermore, the DNB-based NGS libraries achieved an unprecedentedly low sample-to-sample mis-assignment rate of 0.0001 to 0.0004% under recommended procedures. CONCLUSIONS: Single indexing with DNB technology provides a simple but effective method for sensitive genetic assays with large sample numbers.
Subjects
High-Throughput Nucleotide Sequencing/methods , Bacteria/genetics , Humans , Whole Genome Sequencing , Workflow
ABSTRACT
Currently, the methods available for preimplantation genetic diagnosis (PGD) of in vitro fertilized (IVF) embryos do not detect de novo single-nucleotide and short indel mutations, which have been shown to cause a large fraction of genetic diseases. Detection of all these types of mutations requires whole-genome sequencing (WGS). In this study, advanced massively parallel WGS was performed on three 5- to 10-cell biopsies from two blastocyst-stage embryos. Both parents and paternal grandparents were also analyzed to allow for accurate measurements of false-positive and false-negative error rates. Overall, >95% of each genome was called. In the embryos, experimentally derived haplotypes and barcoded read data were used to detect and phase up to 82% of de novo single base mutations with a false-positive rate of about one error per Gb, resulting in fewer than 10 such errors per embryo. This represents a ~100-fold lower error rate than previously published from 10 cells, and it is the first demonstration that advanced WGS can be used to accurately identify these de novo mutations in spite of the thousands of false-positive errors introduced by the extensive DNA amplification required for deep sequencing. Using haplotype information, we also demonstrate how small de novo deletions could be detected. These results suggest that phased WGS using barcoded DNA could be used in the future as part of the PGD process to maximize comprehensiveness in detecting disease-causing mutations and to reduce the incidence of genetic diseases.
Subjects
Embryo, Mammalian , Fertilization in Vitro , Genome, Human , High-Throughput Nucleotide Sequencing , Point Mutation , Blastocyst/metabolism , Exons , Haplotypes , Heterozygote , Humans , Polymorphism, Single Nucleotide , Sequence Deletion
ABSTRACT
PURPOSE: We describe a novel syndrome in seven female patients with extreme developmental delay and neoteny. METHODS: All patients in this study were female, aged 4 to 23 years, were well below the fifth percentile in height and weight, had failed to develop sexually, and lacked the use of language. Karyotype and array chromosome genomic hybridization analysis failed to identify large-scale structural variations. To further understand the underlying cause of disease in these patients, whole-genome sequencing was performed. RESULTS: In five patients, coding de novo mutations (DNMs) were found in five different genes. These genes fell into similar functional categories of transcription regulation and chromatin modification. Comparison to a control population suggested that individuals with neotenic complex syndrome (NCS), a name that we propose herein, could have an excess of rare inherited variants in genes associated with developmental delay and autism, although the difference was not significant. CONCLUSION: We describe an extreme form of developmental delay, with the defining characteristic of neoteny. In most patients we identified coding DNMs in a set of genes intolerant of haploinsufficiency; however, it is not clear whether these contributed to NCS. Rare inherited variants may also be associated with NCS, but more samples need to be analyzed to achieve statistical significance.
Subjects
Abnormalities, Multiple/diagnosis , Abnormalities, Multiple/genetics , Genetic Association Studies , Genetic Predisposition to Disease , Genetic Testing , Phenotype , Adolescent , Adult , Alleles , Amino Acid Substitution , Child , Child, Preschool , Facies , Female , Gene Frequency , Genetic Testing/methods , Genotype , Humans , Male , Syndrome , Whole Genome Sequencing , Young Adult
ABSTRACT
BACKGROUND: Amniocentesis is a common procedure, the primary purpose of which is to collect cells from the fetus to allow testing for abnormal chromosomes, altered chromosomal copy number, or a small number of genes that have small single- to multibase defects. Here we demonstrate the feasibility of generating an accurate whole-genome sequence of a fetus from either the cellular or cell-free DNA (cfDNA) of an amniotic sample. METHODS: cfDNA and DNA isolated from the cell pellet of 31 amniocenteses were sequenced to approximately 50× genome coverage by use of the Complete Genomics nanoarray platform. In a subset of the samples, long fragment read libraries were generated from DNA isolated from cells and sequenced to approximately 100× genome coverage. RESULTS: Concordance of variant calls between the 2 DNA sources and with parental libraries was >96%. Two fetal genomes were found to harbor potentially detrimental variants in chromodomain helicase DNA binding protein 8 (CHD8) and LDL receptor-related protein 1 (LRP1), variations of which have been associated with autism spectrum disorder and keratosis pilaris atrophicans, respectively. We also discovered drug sensitivities and carrier information of fetuses for a variety of diseases. CONCLUSIONS: We were able to elucidate the complete genome sequence of 31 fetuses from amniotic fluid and demonstrate that the cfDNA or DNA from the cell pellet can be analyzed with little difference in quality. We believe that current technologies could analyze this material in a highly accurate and complete manner and that analyses like these should be considered for addition to current amniocentesis procedures.
Subjects
Amniotic Fluid/metabolism , Fetus/metabolism , Genome, Human , Whole Genome Sequencing , Abnormalities, Multiple/genetics , Adult , Amniocentesis , Autism Spectrum Disorder/genetics , Cohort Studies , DNA Copy Number Variations , Darier Disease/genetics , Eyebrows/abnormalities , Feasibility Studies , Female , Genetic Predisposition to Disease , Humans , Male , Mutation
ABSTRACT
BACKGROUND: Amyotrophic lateral sclerosis (ALS) is a devastating disease whose complex pathology has been associated with a strong genetic component in the context of both familial and sporadic disease. Herein, we adopted a next-generation sequencing approach to Greek patients suffering from sporadic ALS (together with their healthy counterparts) in order to explore further the genetic basis of sporadic ALS (sALS). RESULTS: Whole-genome sequencing analysis of Greek sALS patients revealed a positive association between FTO and TBC1D1 gene variants and sALS. Further, linkage disequilibrium analyses were suggestive of a specific disease-associated haplotype for FTO gene variants. Genotyping for these variants was performed in Greek, Sardinian, and Turkish sALS patients. A lack of association between FTO and TBC1D1 variants and sALS in patients of Sardinian and Turkish descent may suggest a founder effect in the Greek population. FTO was found to be highly expressed in motor neurons, while in silico analyses predicted an impact on FTO and TBC1D1 mRNA splicing for the genomic variants in question. CONCLUSIONS: To our knowledge, this is the first study to present a possible association between FTO gene variants and the genetic etiology of sALS. In addition, the next-generation sequencing-based genomics approach coupled with the two-step validation strategy described herein has the potential to be applied to other types of human complex genetic disorders in order to identify variants of clinical significance.
Subjects
Alpha-Ketoglutarate-Dependent Dioxygenase FTO/genetics , Amyotrophic Lateral Sclerosis/genetics , Alpha-Ketoglutarate-Dependent Dioxygenase FTO/metabolism , Case-Control Studies , Computer Simulation , Founder Effect , GTPase-Activating Proteins/genetics , Greece , Haplotypes , Humans , Linkage Disequilibrium , Motor Neurons/pathology , Motor Neurons/physiology , Polymorphism, Single Nucleotide
ABSTRACT
Recent advances in whole-genome sequencing have brought the vision of personal genomics and genomic medicine closer to reality. However, current methods lack clinical accuracy and the ability to describe the context (haplotypes) in which genome variants co-occur in a cost-effective manner. Here we describe a low-cost DNA sequencing and haplotyping process, long fragment read (LFR) technology, which is similar to sequencing long single DNA molecules without cloning or separation of metaphase chromosomes. In this study, ten LFR libraries were made using only ~100 picograms of human DNA per sample. Up to 97% of the heterozygous single nucleotide variants were assembled into long haplotype contigs. Removal of false positive single nucleotide variants not phased by multiple LFR haplotypes resulted in a final genome error rate of 1 in 10 megabases. Cost-effective and accurate genome sequencing and haplotyping from 10-20 human cells, as demonstrated here, will enable comprehensive genetic studies and diverse clinical applications.
Subjects
Genome, Human , Genomics/methods , Sequence Analysis, DNA/methods , Alleles , Cell Line , Female , Gene Silencing , Genetic Variation , Haplotypes , Humans , Mutation , Reproducibility of Results , Sequence Analysis, DNA/economics , Sequence Analysis, DNA/standards
ABSTRACT
BACKGROUND: Celiac disease is a complex chronic immune-mediated disorder of the small intestine. The pathobiology of the disease remains unclear, which complicates differential diagnosis, patient stratification, and clinical decision-making. METHODS: Herein, we adopted a next-generation sequencing approach in a celiac disease trio of Greek descent to identify all genomic variants with the potential of celiac disease predisposition. RESULTS: Analysis revealed six genomic variants of prime interest: SLC9A4 c.1919G>A, KIAA1109 c.2933T>C and c.4268_4269delCCinsTA, HoxB6 c.668C>A, HoxD12 c.418G>A, and NCK2 c.745_746delAAinsG, of which NCK2 c.745_746delAAinsG is novel. Data validation in pediatric celiac disease patients of Greek (n = 109) and Serbian (n = 73) descent and their healthy counterparts (n = 111 and n = 32, respectively) indicated that HoxD12 c.418G>A is more prevalent in celiac disease patients in the Serbian population (P < 0.01), while NCK2 c.745_746delAAinsG is less prevalent in celiac disease patients than in healthy individuals of Greek descent (P = 0.03). SLC9A4 c.1919G>A and KIAA1109 c.2933T>C and c.4268_4269delCCinsTA were more abundant in patients; nevertheless, they failed to show statistical significance. CONCLUSIONS: The next-generation sequencing-based family genomics approach described herein may serve as a paradigm towards the identification of novel functional variants with the aim of understanding complex disease pathobiology.
Subjects
Celiac Disease/genetics , Binding Sites , Child , Gene Frequency , Genetic Association Studies , Genetic Predisposition to Disease , High-Throughput Nucleotide Sequencing , Humans , Models, Molecular , Mutation , Polymorphism, Single Nucleotide , Risk Factors
ABSTRACT
OBJECTIVE: The purpose of this study was to develop a method for isolating fetal cells from maternal blood and to use deep sequencing to demonstrate its promise for complete and accurate genetic screening compared with other non-invasive prenatal tests. METHODS: We developed a double negative selection (DNS) procedure to enrich fetal cells without bias. After validation by short tandem repeat (STR) analysis, the isolated circulating fetal cells (CFCs) were subjected to deep whole genome sequencing analysis. RESULTS: Our DNS protocol significantly increased the purity of the mimic fetal cells from 1 in 1 million nucleated cells in whole blood to between 1:8 and 1:30 (12.5%-3.33%) after 2 steps of enrichment. An isolated single fetal cell achieved a coverage rate (86.8%) and an allelic dropout rate (24.90%) comparable to reported results for human cell lines. Several disease-associated variants were identified in the whole genome sequencing data of isolated CFCs and further confirmed in the sequencing data of unamplified gDNA. CONCLUSION: The robustness of DNS and STR in collecting CFCs from peripheral maternal blood, coupled for the first time with deep sequencing, demonstrates the possibility of comprehensive non-invasive prenatal testing of genetic disorders using isolated CFCs.
Subjects
Cell Separation/methods , Maternal Serum Screening Tests/methods , Whole Genome Sequencing , Feasibility Studies , Female , Humans , Microsatellite Repeats , Paternity , Pregnancy
ABSTRACT
Cancer, like many common disorders, has a complex etiology, often with a strong genetic component and with multiple environmental factors contributing to susceptibility. A considerable number of genomic variants have been previously reported to be causative of, or associated with, an increased risk for various types of cancer. Here, we adopted a next-generation sequencing approach in 11 members of two families of Greek descent to identify all genomic variants with the potential to predispose family members to cancer. Cross-comparison with data from the Human Gene Mutation Database identified a total of 571 variants, of which 47% were disease-associated polymorphisms, 26% disease-associated polymorphisms with additional supporting functional evidence, 19% functional polymorphisms with in vitro/laboratory or in vivo supporting evidence but no known disease association, 4% putative disease-causing mutations but with some residual doubt as to their pathological significance, and 3% disease-causing mutations. Subsequent analysis, focused on the latter variant class most likely to be involved in cancer predisposition, revealed two variants of prime interest, namely MSH2 c.2732T>A (p.L911R) and BRCA1 c.2955delC, the first of which is novel. KMT2D c.13895delC and c.1940C>A variants are additionally reported as incidental findings. The next-generation sequencing-based family genomics approach described herein has the potential to be applied to other types of complex genetic disorder in order to identify variants of potential pathological significance.
Subjects
Genetic Predisposition to Disease , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Neoplasms/genetics , BRCA1 Protein/genetics , DNA-Binding Proteins/genetics , Humans , MutS Homolog 2 Protein/genetics , Mutation , Neoplasm Proteins/genetics , Neoplasms/pathology , Pedigree , Polymorphism, Single Nucleotide
ABSTRACT
Lung cancer is the leading cause of cancer-related mortality worldwide, with non-small-cell lung carcinomas in smokers being the predominant form of the disease. Although previous studies have identified important common somatic mutations in lung cancers, they have primarily focused on a limited set of genes and have thus provided a constrained view of the mutational spectrum. Recent cancer sequencing efforts have used next-generation sequencing technologies to provide a genome-wide view of mutations in leukaemia, breast cancer and cancer cell lines. Here we present the complete sequences of a primary lung tumour (60x coverage) and adjacent normal tissue (46x). Comparing the two genomes, we identify a wide variety of somatic variations, including >50,000 high-confidence single nucleotide variants. We validated 530 somatic single nucleotide variants in this tumour, including one in the KRAS proto-oncogene and 391 others in coding regions, as well as 43 large-scale structural variations. These constitute a large set of new somatic mutations and yield an estimated 17.7 per megabase genome-wide somatic mutation rate. Notably, we observe a distinct pattern of selection against mutations within expressed genes compared to non-expressed genes and in promoter regions up to 5 kilobases upstream of all protein-coding genes. Furthermore, we observe a higher rate of amino acid-changing mutations in kinase genes. We present a comprehensive view of somatic alterations in a single lung tumour, and provide the first evidence, to our knowledge, of distinct selective pressures present within the tumour environment.
Subjects
Carcinoma, Non-Small-Cell Lung/genetics , Genome, Human/genetics , Lung Neoplasms/genetics , Point Mutation/genetics , DNA Mutational Analysis , Humans , Male , Middle Aged , Models, Biological , Proto-Oncogene Mas , Selection, Genetic/genetics
ABSTRACT
Rapid advances in DNA sequencing promise to enable new diagnostics and individualized therapies. Achieving personalized medicine, however, will require extensive research on highly reidentifiable, integrated datasets of genomic and health information. To assist with this, participants in the Personal Genome Project choose to forgo privacy via our institutional review board-approved "open consent" process. The contribution of public data and samples facilitates both scientific discovery and standardization of methods. We present our findings after enrollment of more than 1,800 participants, including whole-genome sequencing of 10 pilot participant genomes (the PGP-10). We introduce the Genome-Environment-Trait Evidence (GET-Evidence) system. This tool automatically processes genomes and prioritizes both published and novel variants for interpretation. In the process of reviewing the presumed healthy PGP-10 genomes, we find numerous literature references implying serious disease. Although it is sometimes impossible to rule out a late-onset effect, stringent evidence requirements can address the high rate of incidental findings. To that end we develop a peer production system for recording and organizing variant evaluations according to standard evidence guidelines, creating a public forum for reaching consensus on interpretation of clinically relevant variants. Genome analysis becomes a two-step process: using a prioritized list to record variant evaluations, then automatically sorting reviewed variants using these annotations. Genome data, health and trait information, participant samples, and variant interpretations are all shared in the public domain; we invite others to review our results using our participant samples and contribute to our interpretations. We offer our public resource and methods to further personalized medical research.
Subjects
Databases, Genetic , Genetic Variation , Genome, Human/genetics , Phenotype , Precision Medicine/methods , Software , Cell Line , Data Collection , Humans , Precision Medicine/trends , Sequence Analysis, DNA
ABSTRACT
Tanoak (Notholithocarpus densiflorus) is an evergreen tree in the Fagaceae family found in California and southern Oregon. Historically, tanoak acorns were an important food source for Native American tribes, and the bark was used extensively in the leather tanning process. Long considered a disjunct relictual element of the Asian stone oaks (Lithocarpus spp.), phylogenetic analysis has determined that the tanoak is an example of convergent evolution. Tanoaks are deeply divergent from oaks (Quercus) of the Pacific Northwest and comprise a new genus with a single species. These trees are highly susceptible to "sudden oak death" (SOD), a plant pathogen (Phytophthora ramorum) that has caused widespread deaths of tanoaks. In this study, we set out to assemble the genome and perform comparative studies among a number of individuals that demonstrated varying levels of susceptibility to SOD. First, we sequenced and de novo assembled a draft reference genome of N. densiflorus using cobarcoded library processing methods and an MGI DNBSEQ-G400 sequencer. To increase the contiguity of the final assembly, we also sequenced Oxford Nanopore long reads to 30× coverage. To our knowledge, the draft genome reported here is one of the more contiguous and complete genomes of a tree species published to date, with a contig N50 of ~1.2 Mb, a scaffold N50 of ~2.1 Mb, and a complete gene score of 95.5% through BUSCO analysis. In addition, we sequenced 11 genetically distinct individuals and mapped these onto the draft reference genome, enabling the discovery of almost 25 million single nucleotide polymorphisms and ~4.4 million small insertions and deletions. Finally, using cobarcoded data, we were able to generate a complete haplotype coverage of all 11 genomes.
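For readers unfamiliar with the N50 statistic quoted for contigs and scaffolds: it is the length L such that sequences of length L or longer together contain at least half of the total assembled bases. A minimal sketch of that standard definition follows; the function name and the toy lengths are illustrative and unrelated to the tanoak assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sorts lengths from longest to shortest and returns the length at which
    the running total first reaches half of the total assembly size.
    """
    if not lengths:
        raise ValueError("n50() requires at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Total is 20, so half is 10; 6 + 5 = 11 reaches it at the contig of length 5.
    print(n50([2, 3, 4, 5, 6]))
```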
Subjects
Fagaceae , Genome, Plant , Fagaceae/genetics , Phylogeny , Molecular Sequence Annotation , Genomics/methods , Polymorphism, Single Nucleotide
ABSTRACT
In this chapter, we describe single-tube long fragment read (stLFR), a simple preparation method for whole-genome sequencing and physical haplotyping based on the DNA co-barcoding strategy. Similar to LFR, stLFR applies the concept of adding the same barcode to subfragments derived from the same long DNA molecule. However, instead of a 384-well plate, stLFR uses the surface of micron-sized magnetic beads to create millions of virtual compartments in a single reaction tube. This is enabled by a split and pool barcoded bead preparation process capable of generating ~500,000 copies of the same unique barcode, from a library of ~3.6 billion unique barcodes, on each bead. The instruments and devices used in the stLFR process are easily accessible in nearly all standard molecular biology laboratories, and the cost of reagents can be as low as 30 dollars per sample. stLFR libraries can be sequenced by standard second-generation sequencing instruments (e.g., MGI or Illumina devices), and the barcode sharing information enables detection and phasing of all variations, including large structural variations. In addition, stLFR data can be used to scaffold contigs and de novo assemble genomes.
Subjects
High-Throughput Nucleotide Sequencing , Cost-Benefit Analysis , Haplotypes , Whole Genome Sequencing , High-Throughput Nucleotide Sequencing/methods , Gene Library , Sequence Analysis, DNA
ABSTRACT
In this chapter, we describe a simple, low-cost method for making many copies of a single DNA molecule (1-10 kb in length) as a concatemer on a long DNA strand. This can enable applications requiring high-quality contiguous sequence and haplotype data from long single DNA molecules at large scale.
Subjects
DNA , High-Throughput Nucleotide Sequencing , Haplotypes/genetics , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , DNA/genetics
ABSTRACT
Sequencing of hypervariable regions as well as internal transcribed spacer regions of ribosomal RNA genes (rDNA) is broadly used to identify bacteria and fungi, but taxonomic and phylogenetic resolution is hampered by insufficient sequencing length using high throughput, cost-efficient second-generation sequencing. We developed a method to obtain nearly full-length rDNA by assembling single DNA molecules combining DNA co-barcoding with single-tube long fragment read technology and second-generation sequencing. Benchmarking was performed using mock bacterial and fungal communities as well as two forest soil samples. All mock species rDNA were successfully recovered with identities above 99.5% compared to the reference sequences. From the soil samples we obtained good coverage with identification of more than 20,000 unknown species, as well as high abundance correlation between replicates. This approach provides a cost-effective method for obtaining extensive and accurate information on complex environmental microbial communities.
Subjects
Eukaryota , Microbiota , Phylogeny , Eukaryota/genetics , Genes, rRNA , Sequence Analysis, DNA/methods , RNA, Ribosomal/genetics , Bacteria/genetics , Microbiota/genetics , DNA, Ribosomal/genetics , Soil
ABSTRACT
Rapid technological advances are decreasing DNA sequencing costs and making it practical to undertake complete human genome sequencing on a large scale for the first time. Disease studies that involve sequencing hundreds of patient genomes are underway. The all-inclusive sequencing price per genome is expected to reach $1000 over the next few years and will likely decline further in the following years. This dramatic price decline will herald widespread personal genome sequencing and lead to significant improvements in human health and reduced health care costs. Key to realizing these benefits will be medical genomics' and systems biology's success in providing increasing contextual interpretation of biological and medical effects of the detected sequence variants in a genome. Given the substantial potential benefits and the manageability of the health and discrimination risks involved with the possible misuse of this information, we propose that governments and insurance companies support or even require personal genome sequencing. Critical to the widespread acceptance of personal genome sequencing, however, will be the need to educate physicians and the public about the realistic benefits and risks of such an analysis to prevent overinterpretation and misuse of this valuable information.