Search | Portal Regional da BVS

1.

Closing the gap: Solving complex medically relevant genes at scale.

Mahmoud, Medhat; Harting, John; Corbitt, Holly; Chen, Xiao; Jhangiani, Shalini N; Doddapaneni, Harsha; Meng, Qingchang; Han, Tina; Lambert, Christine; Zhang, Siyuan; Baybayan, Primo; Henno, Geoff; Shen, Hua; Hu, Jianhong; Han, Yi; Riegler, Casey; Metcalf, Ginger; Henno, Geoff; Chinn, Ivan K; Eberle, Michael A; Kingan, Sarah; Farinholt, Tim; Carvalho, Claudia M B; Gibbs, Richard A; Kronenberg, Zev; Muzny, Donna; Sedlazeck, Fritz J.

medRxiv ; 2024 Mar 18.

Article in English | MEDLINE | ID: mdl-38562723

ABSTRACT

Comprehending the mechanism behind human diseases with an established heritable component represents the forefront of personalized medicine. Nevertheless, numerous medically important genes are inaccurately represented in short-read sequencing data analysis due to their complexity and repetitiveness or the so-called 'dark regions' of the human genome. The advent of PacBio as a long-read platform has provided new insights, yet HiFi whole-genome sequencing (WGS) cost remains frequently prohibitive. We introduce a targeted sequencing and analysis framework, Twist Alliance Dark Genes Panel (TADGP), designed to offer phased variants across 389 medically important yet complex autosomal genes. We highlight TADGP accuracy across eleven control samples and compare it to WGS. This demonstrates that TADGP achieves variant calling accuracy comparable to HiFi-WGS data at a fraction of the cost, thus enabling scalability and broad applicability for studying rare diseases or complementing previously sequenced samples to gain insights into these complex genes. TADGP revealed several candidate variants across all cases and provided insight into LPA diversity when tested on samples from rare disease and cardiovascular disease cohorts. In both cohorts, we identified novel variants affecting individual disease-associated genes (e.g., IKZF1, KCNE1). Nevertheless, the annotation of the variants across these 389 medically important genes remains challenging due to their underrepresentation in ClinVar and gnomAD. Consequently, we also offer an annotation resource to enhance the evaluation and prioritization of these variants. Overall, we can demonstrate that TADGP offers a cost-efficient and scalable approach to routinely assess the dark regions of the human genome with clinical relevance.

2.

Comprehensive de novo mutation discovery with HiFi long-read sequencing.

Kucuk, Erdi; van der Sanden, Bart P G H; O'Gorman, Luke; Kwint, Michael; Derks, Ronny; Wenger, Aaron M; Lambert, Christine; Chakraborty, Shreyasee; Baybayan, Primo; Rowell, William J; Brunner, Han G; Vissers, Lisenka E L M; Hoischen, Alexander; Gilissen, Christian.

Genome Med ; 15(1): 34, 2023 05 08.

Article in English | MEDLINE | ID: mdl-37158973

ABSTRACT

BACKGROUND: Long-read sequencing (LRS) techniques have been very successful in identifying structural variants (SVs). However, the high error rate of LRS made the detection of small variants (substitutions and short indels < 20 bp) more challenging. The introduction of PacBio HiFi sequencing makes LRS also suited for detecting small variation. Here we evaluate the ability of HiFi reads to detect de novo mutations (DNMs) of all types, which are technically challenging variant types and a major cause of sporadic, severe, early-onset disease. METHODS: We sequenced the genomes of eight parent-child trios using high coverage PacBio HiFi LRS (~30-fold coverage) and Illumina short-read sequencing (SRS) (~50-fold coverage). De novo substitutions, small indels, short tandem repeats (STRs) and SVs were called in both datasets and compared to each other to assess the accuracy of HiFi LRS. In addition, we determined the parent-of-origin of the small DNMs using phasing. RESULTS: We identified a total of 672 and 859 de novo substitutions/indels, 28 and 126 de novo STRs, and 24 and 1 de novo SVs in LRS and SRS respectively. For the small variants, there was a 92 and 85% concordance between the platforms. For the STRs and SVs, the concordance was 3.6 and 0.8%, and 4 and 100% respectively. We successfully validated 27/54 LRS-unique small variants, of which 11 (41%) were confirmed as true de novo events. For the SRS-unique small variants, we validated 42/133 DNMs and 8 (19%) were confirmed as true de novo events. Validation of 18 LRS-unique de novo STR calls confirmed none of the repeat expansions as true DNM. Confirmation of the 23 LRS-unique SVs was possible for 19 candidate SVs of which 10 (52.6%) were true de novo events. Furthermore, we were able to assign 96% of DNMs to their parental allele with LRS data, as opposed to just 20% with SRS data. CONCLUSIONS: HiFi LRS can now produce the most comprehensive variant dataset obtainable by a single technology in a single laboratory, allowing accurate calling of substitutions, indels, STRs and SVs. The accuracy even allows sensitive calling of DNMs on all variant levels, and also allows for phasing, which helps to distinguish true positive from false positive DNMs.
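
The confirmation rates quoted in this abstract follow directly from the reported counts. A minimal Python sketch of that arithmetic is given below; it uses only the counts stated in the abstract, and the dictionary keys and function name are illustrative, not taken from the paper.

```python
# Counts quoted in the abstract: (confirmed as true de novo, candidates validated).
# The labels and names here are illustrative, not from the paper.
validation = {
    "LRS-unique small variants": (11, 27),
    "SRS-unique small variants": (8, 42),
    "LRS-unique SVs": (10, 19),
}


def confirmation_rate(confirmed: int, validated: int) -> float:
    """Percentage of validated candidates confirmed as true de novo events."""
    return 100.0 * confirmed / validated


rates = {
    name: round(confirmation_rate(confirmed, validated), 1)
    for name, (confirmed, validated) in validation.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate}%")
```

The denominators are the candidates for which validation succeeded (27, 42 and 19), not all platform-unique calls (54, 133 and 23), which is how the abstract reports the percentages.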

Subjects

High-Throughput Nucleotide Sequencing , INDEL Mutation , Humans , Alleles , Microsatellite Repeats

3.

Approaches to long-read sequencing in a clinical setting to improve diagnostic rate.

Sanford Kobayashi, Erica; Batalov, Serge; Wenger, Aaron M; Lambert, Christine; Dhillon, Harsharan; Hall, Richard J; Baybayan, Primo; Ding, Yan; Rego, Seema; Wigby, Kristen; Friedman, Jennifer; Hobbs, Charlotte; Bainbridge, Matthew N.

Sci Rep ; 12(1): 16945, 2022 10 09.

Article in English | MEDLINE | ID: mdl-36210382

ABSTRACT

Over the past decade, advances in genetic testing, particularly the advent of next-generation sequencing, have led to a paradigm shift in the diagnosis of molecular diseases and disorders. Despite our present collective ability to interrogate more than 90% of the human genome, portions of the genome have eluded us, resulting in stagnation of diagnostic yield with existing methodologies. Here we show how application of a new technology, long-read sequencing, has the potential to improve molecular diagnostic rates. Whole genome sequencing by long reads was able to cover 98% of next-generation sequencing dead zones, which are areas of the genome that are not interpretable by conventional industry-standard short-read sequencing. Through the ability of long-read sequencing to unambiguously call variants in these regions, we discovered an immunodeficiency due to a variant in IKBKG in a subject who had previously received a negative genome sequencing result. Additionally, we demonstrate the ability of long-read sequencing to detect small variants on par with short-read sequencing, its superior performance in identifying structural variants, and thirdly, its capacity to determine genomic methylation defects in native DNA. Though the latter technical abilities have been demonstrated, we demonstrate the clinical application of this technology to successfully identify multiple types of variants using a single test.

Subjects

Human Genome , High-Throughput Nucleotide Sequencing , Base Sequence , Genomics , High-Throughput Nucleotide Sequencing/methods , Humans , I-kappa B Kinase , DNA Sequence Analysis/methods

4.

Long-read HiFi sequencing of NUDT15: Phased full-gene haplotyping and pharmacogenomic allele discovery.

Scott, Erick R; Yang, Yao; Botton, Mariana R; Seki, Yoshinori; Hoshitsuki, Keito; Harting, John; Baybayan, Primo; Cody, Neal; Nicoletti, Paola; Moriyama, Takaya; Chakraborty, Shreyasee; Yang, Jun J; Edelmann, Lisa; Schadt, Eric E; Korlach, Jonas; Scott, Stuart A.

Hum Mutat ; 43(11): 1557-1566, 2022 11.

Article in English | MEDLINE | ID: mdl-36057977

ABSTRACT

To determine the phase of NUDT15 sequence variants for more comprehensive star (*) allele diplotyping, we developed a novel long-read single-molecule real-time HiFi amplicon sequencing method. A 10.5 kb NUDT15 amplicon assay was validated using reference material positive controls and additional samples for specimen type and blinded accuracy assessment. Triplicate NUDT15 HiFi sequencing of two reference material samples had nonreference genotype concordances of >99.9%, indicating that the assay is robust. Notably, short-read genome sequencing of a subset of samples was unable to determine the phase of star (*) allele-defining NUDT15 variants, resulting in ambiguous diplotype results. In contrast, long-read HiFi sequencing phased all variants across the NUDT15 amplicons, including a *2/*9 diplotype that previously was characterized as *1/*2 in the 1000 Genomes Project v3 data set. Assay throughput was also tested using 8.5 kb amplicons from 100 Ashkenazi Jewish individuals, which identified a novel NUDT15 *1 suballele (c.-121G>A) and a rare likely deleterious coding variant (p.Pro129Arg). Both novel alleles were Sanger confirmed and assigned as *1.007 and *20, respectively, by the PharmVar Consortium. Taken together, NUDT15 HiFi amplicon sequencing is an innovative method for phased full-gene characterization and novel allele discovery, which could improve NUDT15 pharmacogenomic testing and subsequent phenotype prediction.

Subjects

Pharmacogenetics , Alleles , Genotype , Haplotypes , Humans , DNA Sequence Analysis/methods

5.

Genomic answers for children: Dynamic analyses of >1000 pediatric rare disease genomes.

Cohen, Ana S A; Farrow, Emily G; Abdelmoity, Ahmed T; Alaimo, Joseph T; Amudhavalli, Shivarajan M; Anderson, John T; Bansal, Lalit; Bartik, Lauren; Baybayan, Primo; Belden, Bradley; Berrios, Courtney D; Biswell, Rebecca L; Buczkowicz, Pawel; Buske, Orion; Chakraborty, Shreyasee; Cheung, Warren A; Coffman, Keith A; Cooper, Ashley M; Cross, Laura A; Curran, Tom; Dang, Thuy Tien T; Elfrink, Mary M; Engleman, Kendra L; Fecske, Erin D; Fieser, Cynthia; Fitzgerald, Keely; Fleming, Emily A; Gadea, Randi N; Gannon, Jennifer L; Gelineau-Morel, Rose N; Gibson, Margaret; Goldstein, Jeffrey; Grundberg, Elin; Halpin, Kelsee; Harvey, Brian S; Heese, Bryce A; Hein, Wendy; Herd, Suzanne M; Hughes, Susan S; Ilyas, Mohammed; Jacobson, Jill; Jenkins, Janda L; Jiang, Shao; Johnston, Jeffrey J; Keeler, Kathryn; Korlach, Jonas; Kussmann, Jennifer; Lambert, Christine; Lawson, Caitlin; Le Pichon, Jean-Baptiste.

Genet Med ; 24(6): 1336-1348, 2022 06.

Article in English | MEDLINE | ID: mdl-35305867

ABSTRACT

PURPOSE: This study aimed to provide comprehensive diagnostic and candidate analyses in a pediatric rare disease cohort through the Genomic Answers for Kids program. METHODS: Extensive analyses of 960 families with suspected genetic disorders included short-read exome sequencing and short-read genome sequencing (srGS); PacBio HiFi long-read genome sequencing (HiFi-GS); variant calling for single nucleotide variants (SNVs), structural variants (SVs), and repeat variants; and machine-learning variant prioritization. Structured phenotypes, prioritized variants, and pedigrees were stored in a PhenoTips database, with data sharing through controlled access via the database of Genotypes and Phenotypes. RESULTS: Diagnostic rates ranged from 11% in patients with prior negative genetic testing to 34.5% in naive patients. Incorporating SVs from genome sequencing added up to 13% of new diagnoses in previously unsolved cases. HiFi-GS yielded increased discovery rate with >4-fold more rare coding SVs compared with srGS. Variants and genes of unknown significance remain the most common finding (58% of nondiagnostic cases). CONCLUSION: Computational prioritization is efficient for diagnostic SNVs. Thorough identification of non-SNVs remains challenging and is partly mitigated using HiFi-GS sequencing. Importantly, community research is supported by sharing real-time data to accelerate gene validation and by providing HiFi variant (SNV/SV) resources from >1000 human alleles to facilitate implementation of new sequencing platforms for rare disease diagnoses.

Subjects

Genomics , Rare Diseases , Child , Genome , High-Throughput Nucleotide Sequencing , Humans , Pedigree , Rare Diseases/diagnosis , Rare Diseases/genetics , DNA Sequence Analysis

6.

Correction: Long-read trio sequencing of individuals with unsolved intellectual disability.

Pauper, Marc; Kucuk, Erdi; Wenger, Aaron M; Chakraborty, Shreyasee; Baybayan, Primo; Kwint, Michael; van der Sanden, Bart; Nelen, Marcel R; Derks, Ronny; Brunner, Han G; Hoischen, Alexander; Vissers, Lisenka E L M; Gilissen, Christian.

Eur J Hum Genet ; 29(4): 720, 2021 Apr.

Article in English | MEDLINE | ID: mdl-33772160

7.

Long-read trio sequencing of individuals with unsolved intellectual disability.

Pauper, Marc; Kucuk, Erdi; Wenger, Aaron M; Chakraborty, Shreyasee; Baybayan, Primo; Kwint, Michael; van der Sanden, Bart; Nelen, Marcel R; Derks, Ronny; Brunner, Han G; Hoischen, Alexander; Vissers, Lisenka E L M; Gilissen, Christian.

Eur J Hum Genet ; 29(4): 637-648, 2021 04.

Article in English | MEDLINE | ID: mdl-33257779

ABSTRACT

Long-read sequencing (LRS) has the potential to comprehensively identify all medically relevant genome variation, including variation commonly missed by short-read sequencing (SRS) approaches. To determine this potential, we performed LRS at around 15×-40× genome coverage using the Pacific Biosciences Sequel I System for five trios. The respective probands were diagnosed with intellectual disability (ID) whose etiology remained unresolved after SRS exomes and genomes. Systematic assessment of LRS coverage showed that ~35 Mb of the human reference genome was only accessible by LRS and not SRS. Genome-wide structural variant (SV) calling yielded on average 28,292 SV calls per individual, totaling 12.9 Mb of sequence. Trio-based analyses, which allowed us to study segregation, showed concordance for up to 95% of these SV calls across the genome, and 80% of the LRS SV calls were not identified by SRS. De novo mutation analysis did not identify any de novo SVs, confirming that these are rare events. Because of high sequence coverage, we were also able to call single nucleotide substitutions. On average, we identified 3 million substitutions per genome, with a Mendelian inheritance concordance of up to 97%. Of these, ~100,000 were located in the ~35 Mb of the genome that was only captured by LRS. Moreover, these variants affected the coding sequence of 64 genes, including 32 known Mendelian disease genes. Our data show the potential added value of LRS compared to SRS for identifying medically relevant genome variation.
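
The abstract reports a Mendelian inheritance concordance for substitution calls without describing how it was computed. As a hedged illustration only, the sketch below uses the common definition for autosomal diploid sites: a child genotype is consistent if it can be formed from one maternal and one paternal allele. The function names and the toy genotypes are assumptions made for illustration; this is not the authors' pipeline, and it ignores missing genotypes and sex chromosomes.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """True if the child's genotype can be built from one allele of each parent.

    Genotypes are tuples of two allele codes, e.g. (0, 1) for a heterozygote.
    """
    return any(
        sorted((m, f)) == sorted(child) for m, f in product(mother, father)
    )


def mendelian_concordance(trios):
    """Fraction of (child, mother, father) genotype triples that are consistent."""
    consistent = sum(is_mendelian_consistent(*trio) for trio in trios)
    return consistent / len(trios)


# Toy data (hypothetical): four sites, one of which violates Mendelian inheritance.
toy_trios = [
    ((0, 1), (0, 0), (0, 1)),  # consistent
    ((1, 1), (0, 1), (0, 1)),  # consistent
    ((1, 1), (0, 0), (0, 1)),  # inconsistent: mother cannot transmit allele 1
    ((0, 0), (0, 0), (0, 0)),  # consistent
]

print(mendelian_concordance(toy_trios))
```

In real trio data an inconsistent site is either a genotyping error or a candidate de novo event, which is why concordance is often used as a proxy for call accuracy.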

Subjects

Genetic Testing/methods , Intellectual Disability/genetics , DNA Sequence Analysis/methods , Humans , Intellectual Disability/diagnosis , Mutation , Pedigree , Genetic Polymorphism

8.

Variant phasing and haplotypic expression from long-read sequencing in maize.

Wang, Bo; Tseng, Elizabeth; Baybayan, Primo; Eng, Kevin; Regulski, Michael; Jiao, Yinping; Wang, Liya; Olson, Andrew; Chougule, Kapeel; Buren, Peter Van; Ware, Doreen.

Commun Biol ; 3(1): 78, 2020 02 18.

Article in English | MEDLINE | ID: mdl-32071408

ABSTRACT

Haplotype phasing of maize genetic variants is important for genome interpretation, population genetic analysis and functional analysis of allelic activity. We performed an isoform-level phasing study using two maize inbred lines and their reciprocal crosses, based on single-molecule, full-length cDNA sequencing. To phase and analyze transcripts between hybrids and parents, we developed IsoPhase. Using this tool, we validated the majority of SNPs called against matching short-read data from embryo, endosperm and root tissues, and identified allele-specific, gene-level and isoform-level differential expression between the inbred parental lines and hybrid offspring. After phasing 6907 genes in the reciprocal hybrids, we annotated the SNPs and identified large-effect genes. In addition, we identified parent-of-origin isoforms, distinct novel isoforms in maize parent and hybrid lines, and imprinted genes from different tissues. Finally, we characterized variation in cis- and trans-regulatory effects. Our study provides measures of haplotypic expression that could increase accuracy in studies of allelic expression.

Subjects

RNA Sequence Analysis/methods , Zea mays/genetics , Alleles , Endosperm/genetics , Gene Expression Profiling/methods , Plant Gene Expression Regulation , Plant Genes , Plant Genome , Haplotypes , Mutation , Plant Proteins/genetics , Genetically Modified Plants , Messenger RNA/analysis , Messenger RNA/genetics , Zea mays/physiology

9.

Structural variation and its potential impact on genome instability: Novel discoveries in the EGFR landscape by long-read sequencing.

Cook, George W; Benton, Michael G; Akerley, Wallace; Mayhew, George F; Moehlenkamp, Cynthia; Raterman, Denise; Burgess, Daniel L; Rowell, William J; Lambert, Christine; Eng, Kevin; Gu, Jenny; Baybayan, Primo; Fussell, John T; Herbold, Heath D; O'Shea, John M; Varghese, Thomas K; Emerson, Lyska L.

PLoS One ; 15(1): e0226340, 2020.

Article in English | MEDLINE | ID: mdl-31940362

ABSTRACT

Structural variation (SV) is typically defined as variation within the human genome that exceeds 50 base pairs (bp). SV may be copy number neutral or it may involve duplications, deletions, and complex rearrangements. Recent studies have shown SV to be associated with many human diseases. However, studies of SV have been challenging due to technological constraints. With the advent of third generation (long-read) sequencing technology, exploration of longer stretches of DNA not easily examined previously has been made possible. In the present study, we utilized third generation (long-read) sequencing techniques to examine SV in the EGFR landscape of four haplotypes derived from two human samples. We analyzed the EGFR gene and its landscape (+/- 500,000 base pairs) using this approach and were able to identify a region of non-coding DNA with over 90% similarity to the most common activating EGFR mutation in non-small cell lung cancer. Based on previously published Alu-element genome instability algorithms, we propose a molecular mechanism to explain how this non-coding region of DNA may be interacting with and impacting the stability of the EGFR gene and potentially generating this cancer-driver gene. By these techniques, we were also able to identify previously hidden structural variation in the four haplotypes and in the human reference genome (hg38). We applied previously published algorithms to compare the relative stabilities of these five different EGFR gene landscape haplotypes to estimate their relative potentials to generate the EGFR exon 19, 15 bp canonical deletion. To our knowledge, the present study is the first to use the differences in genomic architecture between targeted cancer-linked phased haplotypes to estimate their relative potentials to form a common cancer-linked driver mutation.

Subjects

erbB-1 Genes/genetics , Genetic Variation , Human Genome/genetics , Genomic Instability , High-Throughput Nucleotide Sequencing , Non-Small-Cell Lung Carcinoma/genetics , Computer Simulation , Haplotypes , Humans , Lung Neoplasms/genetics , DNA Sequence Analysis

10.

A high-quality genome assembly from a single, field-collected spotted lanternfly (Lycorma delicatula) using the PacBio Sequel II system.

Kingan, Sarah B; Urban, Julie; Lambert, Christine C; Baybayan, Primo; Childers, Anna K; Coates, Brad; Scheffler, Brian; Hackett, Kevin; Korlach, Jonas; Geib, Scott M.

Gigascience ; 8(10)2019 10 01.

Article in English | MEDLINE | ID: mdl-31609423

ABSTRACT

BACKGROUND: A high-quality reference genome is an essential tool for applied and basic research on arthropods. Long-read sequencing technologies may be used to generate more complete and contiguous genome assemblies than alternate technologies; however, long-read methods have historically had greater input DNA requirements and higher costs than next-generation sequencing, which are barriers to their use on many samples. Here, we present a 2.3 Gb de novo genome assembly of a field-collected adult female spotted lanternfly (Lycorma delicatula) using a single Pacific Biosciences SMRT Cell. The spotted lanternfly is an invasive species recently discovered in the northeastern United States that threatens to damage economically important crop plants in the region. RESULTS: The DNA from 1 individual was used to make 1 standard, size-selected library with an average DNA fragment size of ~20 kb. The library was run on 1 Sequel II SMRT Cell 8M, generating a total of 132 Gb of long-read sequences, of which 82 Gb were from unique library molecules, representing ~36× coverage of the genome. The assembly had high contiguity (contig N50 length = 1.5 Mb), completeness, and sequence level accuracy as estimated by conserved gene set analysis (96.8% of conserved genes both complete and without frame shift errors). Furthermore, it was possible to segregate more than half of the diploid genome into the 2 separate haplotypes. The assembly also recovered 2 microbial symbiont genomes known to be associated with L. delicatula, each microbial genome being assembled into a single contig. CONCLUSIONS: We demonstrate that field-collected arthropods can be used for the rapid generation of high-quality genome assemblies, an attractive approach for projects on emerging invasive species, disease vectors, or conservation efforts of endangered species.
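
Two of the summary figures in this abstract, fold coverage and contig N50, are simple to compute. The Python sketch below is an illustration under stated assumptions: the coverage call uses the 82 Gb and 2.3 Gb values quoted in the abstract, while the contig lengths passed to the N50 function are made-up toy values, and the function names are hypothetical rather than taken from any assembler.

```python
def fold_coverage(unique_bases_gb: float, genome_size_gb: float) -> float:
    """Average depth: sequenced bases divided by genome size (same units)."""
    return unique_bases_gb / genome_size_gb


def contig_n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig list is empty")


# Values quoted in the abstract: 82 Gb of unique reads over a 2.3 Gb assembly.
coverage = fold_coverage(82, 2.3)
print(f"~{coverage:.0f}x coverage")

# Toy contig lengths (hypothetical, arbitrary units) to show the N50 definition.
toy_contigs = [5, 4, 3, 2, 1]
print(contig_n50(toy_contigs))
```

With the quoted values the coverage comes to roughly 35.7, which rounds to the ~36× reported. For the toy contigs, the two longest (5 and 4) already hold 9 of 15 bases, so the N50 is 4.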

Subjects

Diptera/genetics , Insect Genome , Genomics/methods , Animals , Female , Gene Library , Introduced Species , DNA Sequence Analysis

11.

Single-Molecule Real-Time (SMRT) Full-Length RNA-Sequencing Reveals Novel and Distinct mRNA Isoforms in Human Bone Marrow Cell Subpopulations.

Deslattes Mays, Anne; Schmidt, Marcel; Graham, Garrett; Tseng, Elizabeth; Baybayan, Primo; Sebra, Robert; Sanda, Miloslav; Mazarati, Jean-Baptiste; Riegel, Anna; Wellstein, Anton.

Genes (Basel) ; 10(4)2019 03 27.

Article in English | MEDLINE | ID: mdl-30934798

ABSTRACT

Hematopoietic cells are continuously replenished from progenitor cells that reside in the bone marrow. To evaluate molecular changes during this process, we analyzed the transcriptomes of freshly harvested human bone marrow progenitor (lineage-negative) and differentiated (lineage-positive) cells by single-molecule real-time (SMRT) full-length RNA-sequencing. This analysis revealed a ~5-fold higher number of transcript isoforms than previously detected and showed a distinct composition of individual transcript isoforms characteristic for bone marrow subpopulations. A detailed analysis of messenger RNA (mRNA) isoforms transcribed from the ANXA1 and EEF1A1 loci confirmed their distinct composition. The expression of proteins predicted from the transcriptome analysis was evaluated by mass spectrometry and validated previously unknown protein isoforms predicted e.g., for EEF1A1. These protein isoforms distinguished the lineage negative cell population from the lineage positive cell population. Finally, transcript isoforms expressed from paralogous gene loci (e.g., CFD, GATA2, HLA-A, B, and C) also distinguished cell subpopulations but were only detectable by full-length RNA sequencing. Thus, qualitatively distinct transcript isoforms from individual genomic loci separate bone marrow cell subpopulations indicating complex transcriptional regulation and protein isoform generation during hematopoiesis.

Subjects

Cell Lineage/genetics , High-Throughput Nucleotide Sequencing , Messenger RNA/genetics , Transcriptome/genetics , Alternative Splicing/genetics , Bone Marrow Cells/metabolism , Genomics/methods , Humans , Single Molecule Imaging/methods , Exome Sequencing/methods

12.

Comparative Genome Analysis of an Extensively Drug-Resistant Isolate of Avian Sequence Type 167 Escherichia coli Strain Sanji with Novel In Silico Serotype O89b:H9.

Zeng, Xiancheng; Chi, Xuelin; Ho, Brian T; Moon, Damee; Lambert, Christine; Hall, Richard J; Baybayan, Primo; Wang, Shihua; Wilson, Brenda A; Ho, Mengfei.

mSystems ; 4(1)2019.

Article in English | MEDLINE | ID: mdl-30834329

ABSTRACT

Extensive drug resistance (XDR) is an escalating global problem. Escherichia coli strain Sanji was isolated from an outbreak of pheasant colibacillosis in Fujian province, China, in 2011. This strain has XDR properties, exhibiting sensitivity to carbapenems but no other classes of known antibiotics. Whole-genome sequencing revealed a total of 32 known antibiotic resistance genes, many associated with insertion sequence 26 (IS26) elements. These were found on the Sanji chromosome and 2 of its 6 plasmids, pSJ_255 and pSJ_82. The Sanji chromosome also harbors a type 2 secretion system (T2SS), a type 3 secretion system (T3SS), a type 6 secretion system (T6SS), and several putative prophages. Sanji and other ST167 strains have a previously uncharacterized O-antigen (O89b) that is most closely related to serotype O89 as determined on the basis of analysis of the wzm-wzt genes and in silico serotyping. This O89b-antigen gene cluster was also found in the genomes of a few other pathogenic sequence type 617 (ST617) and ST10 complex strains. A time-scaled phylogeny inferred from comparative single nucleotide variant analysis indicated that these O89b-containing lineages emerged about 30 years ago. Comparative sequence analysis revealed that the core genome of Sanji is nearly identical to that of several recently sequenced strains of pathogenic XDR E. coli belonging to the ST167 group. Comparison of the mobile elements among the different ST167 genomes revealed that each genome carries a distinct set of multidrug resistance genes on different types of plasmids, indicating that there are multiple paths toward the emergence of XDR in E. coli. IMPORTANCE: E. coli strain Sanji is the first sequenced and analyzed genome of the recently emerged pathogenic XDR strains with sequence type ST167 and novel in silico serotype O89b:H9. Comparison of the genomes of Sanji with other ST167 strains revealed distinct sets of different plasmids, mobile IS elements, and antibiotic resistance genes in each genome, indicating that there exist multiple paths toward achieving XDR. The emergence of these pathogenic ST167 E. coli strains with diverse XDR capabilities highlights the difficulty of preventing or mitigating the development of XDR properties in bacteria and points to the importance of better understanding of the shared underlying virulence mechanisms and physiology of pathogenic bacteria.

13.

A High-Quality De novo Genome Assembly from a Single Mosquito Using PacBio Sequencing.

Kingan, Sarah B; Heaton, Haynes; Cudini, Juliana; Lambert, Christine C; Baybayan, Primo; Galvin, Brendan D; Durbin, Richard; Korlach, Jonas; Lawniczak, Mara K N.

Genes (Basel) ; 10(1)2019 01 18.

Article in English | MEDLINE | ID: mdl-30669388

ABSTRACT

A high-quality reference genome is a fundamental resource for functional genetics, comparative genomics, and population genomics, and is increasingly important for conservation biology. PacBio Single Molecule, Real-Time (SMRT) sequencing generates long reads with uniform coverage and high consensus accuracy, making it a powerful technology for de novo genome assembly. Improvements in throughput and concomitant reductions in cost have made PacBio an attractive core technology for many large genome initiatives; however, relatively high DNA input requirements (~5 µg for standard library protocol) have placed PacBio out of reach for many projects on small organisms that have lower DNA content, or on projects with limited input DNA for other reasons. Here we present a high-quality de novo genome assembly from a single Anopheles coluzzii mosquito. A modified SMRTbell library construction protocol without DNA shearing and size selection was used to generate a SMRTbell library from just 100 ng of starting genomic DNA. The sample was run on the Sequel System with chemistry 3.0 and software v6.0, generating, on average, 25 Gb of sequence per SMRT Cell with 20 h movies, followed by diploid de novo genome assembly with FALCON-Unzip. The resulting curated assembly had high contiguity (contig N50 3.5 Mb) and completeness (more than 98% of conserved genes were present and full-length). In addition, this single-insect assembly now places 667 (>90%) of formerly unplaced genes into their appropriate chromosomal contexts in the AgamP4 PEST reference. We were also able to resolve maternal and paternal haplotypes for over 1/3 of the genome. By sequencing and assembling material from a single diploid individual, only two haplotypes were present, simplifying the assembly process compared to samples from multiple pooled individuals. The method presented here can be applied to samples with starting DNA amounts as low as 100 ng per 1 Gb genome size. This new low-input approach puts PacBio-based assemblies in reach for small highly heterozygous organisms that comprise much of the diversity of life.

Subjects

Anopheles/genetics , Insect Genome , DNA Sequence Analysis/methods , Animals , Contig Mapping/methods , Contig Mapping/standards , Ploidies , Genetic Polymorphism , DNA Sequence Analysis/standards

14.

Reference Grade Characterization of Polymorphisms in Full-Length HLA Class I and II Genes With Short-Read Sequencing on the ION PGM System and Long-Reads Generated by Single Molecule, Real-Time Sequencing on the PacBio Platform.

Suzuki, Shingo; Ranade, Swati; Osaki, Ken; Ito, Sayaka; Shigenari, Atsuko; Ohnuki, Yuko; Oka, Akira; Masuya, Anri; Harting, John; Baybayan, Primo; Kitazume, Miwako; Sunaga, Junichi; Morishima, Satoko; Morishima, Yasuo; Inoko, Hidetoshi; Kulski, Jerzy K; Shiina, Takashi.

Front Immunol ; 9: 2294, 2018.

Article in English | MEDLINE | ID: mdl-30337930

ABSTRACT

Although NGS technologies fuel advances in high-throughput HLA genotyping methods for identification and classification of HLA genes to assist with precision medicine efforts in disease and transplantation, the efficiency of these methods is impeded by the absence of adequately-characterized high-frequency HLA allele reference sequence databases for the highly polymorphic HLA gene system. Here, we report on producing a comprehensive collection of full-length HLA allele sequences for eight classical HLA loci found in the Japanese population. We augmented the second-generation short read data generated by the Ion Torrent technology with long, amplicon-spanning consensus reads delivered by the third-generation SMRT sequencing method to create reference grade high-quality sequences of HLA class I and II gene alleles resolved at the genomic coding and non-coding level. Forty-six DNAs were obtained from a reference set used previously to establish the HLA allele frequency data in Japanese subjects. The samples included alleles with a collective allele frequency in the Japanese population of more than 99.2%. The HLA loci were independently amplified by long-range PCR using previously designed HLA-locus specific primers and subsequently sequenced using SMRT and Ion PGM sequencers. The mapped long and short-reads were used to produce a reference library of consensus HLA allelic sequences with the help of the reference-aware software tool LAA for SMRT Sequencing. A total of 253 distinct alleles were determined for 46 healthy subjects. Of them, 137 were novel alleles: 101 SNVs and/or indels and 36 extended alleles at a partial or full-length level. Comparing the HLA sequences from the perspective of nucleotide diversity revealed that HLA-DRB1 was the most divergent among the eight HLA genes, and that the HLA-DPB1 gene sequences diverged into two distinct groups, DP2 and DP5, with evidence of independent polymorphisms generated in exon 2. We also identified two specific intronic variations in HLA-DRB1 that might be involved in rheumatoid arthritis. In conclusion, full-length HLA allele sequencing by third-generation and second-generation technologies has provided polymorphic gene reference sequences at a genomic allelic resolution including allelic variations assigned up to the field-4 level for a stronger foundation in precision medicine and HLA-related disease and transplantation studies.

Subjects

Computational Biology/methods , MHC Class II Genes , MHC Class I Genes , High-Throughput Nucleotide Sequencing , DNA Sequence Analysis , Software , Adult , Aged , Aged 80 and Over , Alleles , Rheumatoid Arthritis/genetics , Female , Gene Frequency , Genetic Association Studies , Genetic Predisposition to Disease , Genomics/methods , Genotype , Genotyping Techniques , Humans , Male , Middle Aged , Phylogeny , Genetic Polymorphism

15.

A High-Quality, Long-Read De Novo Genome Assembly to Aid Conservation of Hawaii's Last Remaining Crow Species.

Sutton, Jolene T; Helmkampf, Martin; Steiner, Cynthia C; Bellinger, M Renee; Korlach, Jonas; Hall, Richard; Baybayan, Primo; Muehling, Jill; Gu, Jenny; Kingan, Sarah; Masuda, Bryce M; Ryder, Oliver A.

Genes (Basel) ; 9(8)2018 Aug 01.

Article in English | MEDLINE | ID: mdl-30071683

ABSTRACT

Genome-level data can provide researchers with unprecedented precision to examine the causes and genetic consequences of population declines, which can inform conservation management. Here, we present a high-quality, long-read, de novo genome assembly for one of the world's most endangered bird species, the 'Alala (Corvus hawaiiensis; Hawaiian crow). As the only remaining native crow species in Hawai'i, the 'Alala survived solely in a captive-breeding program from 2002 until 2016, at which point a long-term reintroduction program was initiated. The high-quality genome assembly was generated to lay the foundation for both comparative genomics studies and the development of population-level genomic tools that will aid conservation and recovery efforts. We illustrate how the quality of this assembly places it amongst the very best avian genomes assembled to date, comparable to intensively studied model systems. We describe the genome architecture in terms of repetitive elements and runs of homozygosity, and we show that compared with more outbred species, the 'Alala genome is substantially more homozygous. We also provide annotations for a subset of immunity genes that are likely to be important in conservation management, and we discuss how this genome is currently being used as a roadmap for downstream conservation applications.

16.

Complete telomere-to-telomere de novo assembly of the Plasmodium falciparum genome through long-read (>11 kb), single molecule, real-time sequencing.

Vembar, Shruthi Sridhar; Seetin, Matthew; Lambert, Christine; Nattestad, Maria; Schatz, Michael C; Baybayan, Primo; Scherf, Artur; Smith, Melissa Laird.

DNA Res ; 23(4): 339-51, 2016 Aug.

Article in English | MEDLINE | ID: mdl-27345719

ABSTRACT

The application of next-generation sequencing to estimate genetic diversity of Plasmodium falciparum, the most lethal malaria parasite, has proved challenging due to the skewed AT-richness [~80.6% (A + T)] of its genome and the lack of technology to assemble highly polymorphic subtelomeric regions that contain clonally variant, multigene virulence families (e.g., var and rifin). To address this, we performed amplification-free, single molecule, real-time sequencing of P. falciparum genomic DNA and generated reads of average length 12 kb, with 50% of the reads between 15.5 and 50 kb in length. Next, using the Hierarchical Genome Assembly Process, we assembled the P. falciparum genome de novo and successfully compiled all 14 nuclear chromosomes telomere-to-telomere. We also accurately resolved centromeres [~90-99% (A + T)] and subtelomeric regions and identified large insertions and duplications that add extra var and rifin genes to the genome, along with smaller structural variants such as homopolymer tract expansions. Overall, we show that amplification-free, long-read sequencing combined with de novo assembly overcomes major challenges inherent to studying the P. falciparum genome. Indeed, this technology may not only identify the polymorphic and repetitive subtelomeric sequences of parasite populations from endemic areas but may also evaluate structural variation linked to virulence, drug resistance and disease transmission.

Subjects

Protozoan Genome , Plasmodium falciparum/genetics , Telomere/genetics , Contig Mapping , Genetic Polymorphism , DNA Sequence Analysis

17.

Complete genome sequence of Streptomyces sp. strain CFMR 7, a natural rubber degrading actinomycete isolated from Penang, Malaysia.

Nanthini, Jayaram; Chia, Kim-Hou; Thottathil, Gincy P; Taylor, Todd D; Kondo, Shinji; Najimudin, Nazalan; Baybayan, Primo; Singh, Siddharth; Sudesh, Kumar.

J Biotechnol ; 214: 47-8, 2015 Nov 20.

Article in English | MEDLINE | ID: mdl-26376470

ABSTRACT

Streptomyces sp. strain CFMR 7, which naturally degrades rubber, was isolated from a rubber plantation. Whole genome sequencing and assembly resulted in 2 contigs with total genome size of 8.248 Mb. Two latex clearing protein (lcp) genes which are responsible for rubber degrading activities were identified.

Subjects

Bacterial Proteins/genetics , Bacterial Genome/genetics , Latex/metabolism , Streptomyces/genetics , Streptomyces/metabolism , Bacterial DNA/analysis , Bacterial DNA/genetics , Malaysia , DNA Sequence Analysis

18.

The complete methylome of Helicobacter pylori UM032.

Lee, Woon Ching; Anton, Brian P; Wang, Susana; Baybayan, Primo; Singh, Siddarth; Ashby, Meredith; Chua, Eng Guan; Tay, Chin Yen; Thirriot, Fanny; Loke, Mun Fai; Goh, Khean Lee; Marshall, Barry J; Roberts, Richard J; Vadivelu, Jamuna.

BMC Genomics ; 16: 424, 2015 Jun 02.

Article in English | MEDLINE | ID: mdl-26031894

ABSTRACT

BACKGROUND: The genome of the human gastric pathogen Helicobacter pylori encodes a large number of DNA methyltransferases (MTases), some of which are shared among many strains, and others of which are unique to a given strain. The MTases have potential roles in the survival of the bacterium. In this study, we sequenced a Malaysian H. pylori clinical strain, designated UM032, by using a combination of PacBio Single Molecule, Real-Time (SMRT) and Illumina MiSeq next generation sequencing platforms, and used the SMRT data to characterize the set of methylated bases (the methylome). RESULTS: The N4-methylcytosine and N6-methyladenine modifications detected at single-base resolution using SMRT technology revealed 17 methylated sequence motifs corresponding to one Type I and 16 Type II restriction-modification (R-M) systems. Previously unassigned methylation motifs were now assigned to their respective MTase-coding genes. Furthermore, one gene that appears to be inactive in the H. pylori UM032 genome during normal growth was characterized by cloning. CONCLUSION: Consistent with previously studied H. pylori strains, we show that strain UM032 contains a relatively large number of R-M systems, including some MTase activities with novel specificities. Additional studies are underway to further elucidate the biological significance of the R-M systems in the physiology and pathogenesis of H. pylori.

Subjects

DNA Methylation , Bacterial Genome , Helicobacter pylori/genetics , Bacterial Proteins/metabolism , Base Sequence , DNA Restriction Enzymes/metabolism , High-Throughput Nucleotide Sequencing , Internet , Methyltransferases/metabolism , DNA Sequence Analysis , User-Computer Interface

19.

Burkholderia pseudomallei sequencing identifies genomic clades with distinct recombination, accessory, and epigenetic profiles.

Nandi, Tannistha; Holden, Matthew T G; Didelot, Xavier; Mehershahi, Kurosh; Boddey, Justin A; Beacham, Ifor; Peak, Ian; Harting, John; Baybayan, Primo; Guo, Yan; Wang, Susana; How, Lee Chee; Sim, Bernice; Essex-Lopresti, Angela; Sarkar-Tyson, Mitali; Nelson, Michelle; Smither, Sophie; Ong, Catherine; Aw, Lay Tin; Hoon, Chua Hui; Michell, Stephen; Studholme, David J; Titball, Richard; Chen, Swaine L; Parkhill, Julian; Tan, Patrick.

Genome Res ; 25(4): 608, 2015 Apr.

Article in English | MEDLINE | ID: mdl-25834186

20.

Complete Genome Sequence of the Hypervirulent Bacterium Clostridium difficile Strain G46, Ribotype 027.

Gaulton, Tom; Misra, Raju; Rose, Graham; Baybayan, Primo; Hall, Richard; Freeman, Jane; Turton, Jane; Picton, Steve; Korlach, Jonas; Gharbia, Saheer; Shah, Haroun.

Genome Announc ; 3(2)2015 Mar 26.

Article in English | MEDLINE | ID: mdl-25814591

ABSTRACT

Clostridium difficile is one of the leading causes of antibiotic-associated diarrhea in health care facilities worldwide. Here, we report the genome sequence of C. difficile strain G46, ribotype 027, isolated from an outbreak in Glamorgan, Wales, in 2006.
