Results 1 - 20 of 24
1.
Hum Mutat ; 43(11): 1557-1566, 2022 11.
Article in English | MEDLINE | ID: mdl-36057977

ABSTRACT

To determine the phase of NUDT15 sequence variants for more comprehensive star (*) allele diplotyping, we developed a novel long-read single-molecule real-time HiFi amplicon sequencing method. A 10.5 kb NUDT15 amplicon assay was validated using reference material positive controls and additional samples for specimen type and blinded accuracy assessment. Triplicate NUDT15 HiFi sequencing of two reference material samples had nonreference genotype concordances of >99.9%, indicating that the assay is robust. Notably, short-read genome sequencing of a subset of samples was unable to determine the phase of star (*) allele-defining NUDT15 variants, resulting in ambiguous diplotype results. In contrast, long-read HiFi sequencing phased all variants across the NUDT15 amplicons, including a *2/*9 diplotype that previously was characterized as *1/*2 in the 1000 Genomes Project v3 data set. Assay throughput was also tested using 8.5 kb amplicons from 100 Ashkenazi Jewish individuals, which identified a novel NUDT15 *1 suballele (c.-121G>A) and a rare likely deleterious coding variant (p.Pro129Arg). Both novel alleles were Sanger confirmed and assigned as *1.007 and *20, respectively, by the PharmVar Consortium. Taken together, NUDT15 HiFi amplicon sequencing is an innovative method for phased full-gene characterization and novel allele discovery, which could improve NUDT15 pharmacogenomic testing and subsequent phenotype prediction.


Subject(s)
Pharmacogenetics , Alleles , Genotype , Haplotypes , Humans , Sequence Analysis, DNA/methods
2.
Genet Med ; 24(6): 1336-1348, 2022 06.
Article in English | MEDLINE | ID: mdl-35305867

ABSTRACT

PURPOSE: This study aimed to provide comprehensive diagnostic and candidate analyses in a pediatric rare disease cohort through the Genomic Answers for Kids program. METHODS: Extensive analyses of 960 families with suspected genetic disorders included short-read exome sequencing and short-read genome sequencing (srGS); PacBio HiFi long-read genome sequencing (HiFi-GS); variant calling for single nucleotide variants (SNVs), structural variants (SVs), and repeat variants; and machine-learning variant prioritization. Structured phenotypes, prioritized variants, and pedigrees were stored in the PhenoTips database, with data sharing through controlled access in the database of Genotypes and Phenotypes. RESULTS: Diagnostic rates ranged from 11% in patients with prior negative genetic testing to 34.5% in naive patients. Incorporating SVs from genome sequencing added up to 13% of new diagnoses in previously unsolved cases. HiFi-GS yielded an increased discovery rate, with >4-fold more rare coding SVs compared with srGS. Variants and genes of unknown significance remain the most common finding (58% of nondiagnostic cases). CONCLUSION: Computational prioritization is efficient for diagnostic SNVs. Thorough identification of non-SNVs remains challenging and is partly mitigated using HiFi-GS sequencing. Importantly, community research is supported by sharing real-time data to accelerate gene validation and by providing HiFi variant (SNV/SV) resources from >1000 human alleles to facilitate implementation of new sequencing platforms for rare disease diagnoses.


Subject(s)
Genomics , Rare Diseases , Child , Genome , High-Throughput Nucleotide Sequencing , Humans , Pedigree , Rare Diseases/diagnosis , Rare Diseases/genetics , Sequence Analysis, DNA
3.
Genome Res ; 25(1): 129-41, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25236617

ABSTRACT

Burkholderia pseudomallei (Bp) is the causative agent of the infectious disease melioidosis. To investigate population diversity, recombination, and horizontal gene transfer in closely related Bp isolates, we performed whole-genome sequencing (WGS) on 106 clinical, animal, and environmental strains from a restricted Asian locale. Whole-genome phylogenies resolved multiple genomic clades of Bp, largely congruent with multilocus sequence typing (MLST). We discovered widespread recombination in the Bp core genome, involving hundreds of regions associated with multiple haplotypes. Highly recombinant regions exhibited functional enrichments that may contribute to virulence. We observed clade-specific patterns of recombination and accessory gene exchange, and provide evidence that this is likely due to ongoing recombination between clade members. Reciprocally, interclade exchanges were rarely observed, suggesting mechanisms restricting gene flow between clades. Interrogation of accessory elements revealed that each clade harbored a distinct complement of restriction-modification (RM) systems, predicted to cause clade-specific patterns of DNA methylation. Using methylome sequencing, we confirmed that representative strains from separate clades indeed exhibit distinct methylation profiles. Finally, using an E. coli system, we demonstrate that Bp RM systems can inhibit uptake of non-self DNA. Our data suggest that RM systems borne on mobile elements, besides preventing foreign DNA invasion, may also contribute to limiting exchanges of genetic material between individuals of the same species. Genomic clades may thus represent functional units of genetic isolation in Bp, modulating intraspecies genetic diversity.


Subject(s)
Burkholderia pseudomallei/genetics , Epigenesis, Genetic , Genome, Bacterial , Recombination, Genetic , Transcriptome , Animals , DNA Primers , DNA, Bacterial/genetics , Escherichia coli/genetics , Female , Gene Deletion , Genetic Association Studies , Genomics , Haplotypes , Humans , Melioidosis/microbiology , Mice , Mice, Inbred BALB C , Multilocus Sequence Typing , Phylogeny , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
4.
BMC Genomics ; 16: 424, 2015 Jun 02.
Article in English | MEDLINE | ID: mdl-26031894

ABSTRACT

BACKGROUND: The genome of the human gastric pathogen Helicobacter pylori encodes a large number of DNA methyltransferases (MTases), some of which are shared among many strains, and others of which are unique to a given strain. The MTases have potential roles in the survival of the bacterium. In this study, we sequenced a Malaysian H. pylori clinical strain, designated UM032, by using a combination of PacBio Single Molecule, Real-Time (SMRT) and Illumina MiSeq next generation sequencing platforms, and used the SMRT data to characterize the set of methylated bases (the methylome). RESULTS: The N4-methylcytosine and N6-methyladenine modifications detected at single-base resolution using SMRT technology revealed 17 methylated sequence motifs corresponding to one Type I and 16 Type II restriction-modification (R-M) systems. Previously unassigned methylation motifs were now assigned to their respective MTase-coding genes. Furthermore, one gene that appears to be inactive in the H. pylori UM032 genome during normal growth was characterized by cloning. CONCLUSION: Consistent with previously studied H. pylori strains, we show that strain UM032 contains a relatively large number of R-M systems, including some MTase activities with novel specificities. Additional studies are under way to further elucidate the biological significance of the R-M systems in the physiology and pathogenesis of H. pylori.


Subject(s)
DNA Methylation , Genome, Bacterial , Helicobacter pylori/genetics , Bacterial Proteins/metabolism , Base Sequence , DNA Restriction Enzymes/metabolism , High-Throughput Nucleotide Sequencing , Internet , Methyltransferases/metabolism , Sequence Analysis, DNA , User-Computer Interface
5.
medRxiv ; 2024 Mar 18.
Article in English | MEDLINE | ID: mdl-38562723

ABSTRACT

Comprehending the mechanism behind human diseases with an established heritable component represents the forefront of personalized medicine. Nevertheless, numerous medically important genes are inaccurately represented in short-read sequencing data analysis due to their complexity and repetitiveness or the so-called 'dark regions' of the human genome. The advent of PacBio as a long-read platform has provided new insights, yet HiFi whole-genome sequencing (WGS) cost remains frequently prohibitive. We introduce a targeted sequencing and analysis framework, Twist Alliance Dark Genes Panel (TADGP), designed to offer phased variants across 389 medically important yet complex autosomal genes. We highlight TADGP accuracy across eleven control samples and compare it to WGS. This demonstrates that TADGP achieves variant calling accuracy comparable to HiFi-WGS data at a fraction of the cost, thus enabling scalability and broad applicability for studying rare diseases or complementing previously sequenced samples to gain insights into these complex genes. TADGP revealed several candidate variants across all cases and provided insight into LPA diversity when tested on samples from rare disease and cardiovascular disease cohorts. In both cohorts, we identified novel variants affecting individual disease-associated genes (e.g., IKZF1, KCNE1). Nevertheless, the annotation of the variants across these 389 medically important genes remains challenging due to their underrepresentation in ClinVar and gnomAD. Consequently, we also offer an annotation resource to enhance the evaluation and prioritization of these variants. Overall, we demonstrate that TADGP offers a cost-efficient and scalable approach to routinely assess the dark regions of the human genome with clinical relevance.

7.
Genome Med ; 15(1): 34, 2023 05 08.
Article in English | MEDLINE | ID: mdl-37158973

ABSTRACT

BACKGROUND: Long-read sequencing (LRS) techniques have been very successful in identifying structural variants (SVs). However, the high error rate of LRS made the detection of small variants (substitutions and short indels < 20 bp) more challenging. The introduction of PacBio HiFi sequencing makes LRS also suited for detecting small variation. Here we evaluate the ability of HiFi reads to detect de novo mutations (DNMs) of all types, which are technically challenging variant types and a major cause of sporadic, severe, early-onset disease. METHODS: We sequenced the genomes of eight parent-child trios using high coverage PacBio HiFi LRS (~30-fold coverage) and Illumina short-read sequencing (SRS) (~50-fold coverage). De novo substitutions, small indels, short tandem repeats (STRs) and SVs were called in both datasets and compared to each other to assess the accuracy of HiFi LRS. In addition, we determined the parent-of-origin of the small DNMs using phasing. RESULTS: We identified a total of 672 and 859 de novo substitutions/indels, 28 and 126 de novo STRs, and 24 and 1 de novo SVs in LRS and SRS respectively. For the small variants, there was a 92 and 85% concordance between the platforms. For the STRs and SVs, the concordance was 3.6 and 0.8%, and 4 and 100% respectively. We successfully validated 27/54 LRS-unique small variants, of which 11 (41%) were confirmed as true de novo events. For the SRS-unique small variants, we validated 42/133 DNMs and 8 (19%) were confirmed as true de novo events. Validation of 18 LRS-unique de novo STR calls confirmed none of the repeat expansions as true DNM. Confirmation of the 23 LRS-unique SVs was possible for 19 candidate SVs of which 10 (52.6%) were true de novo events. Furthermore, we were able to assign 96% of DNMs to their parental allele with LRS data, as opposed to just 20% with SRS data. CONCLUSIONS: HiFi LRS can now produce the most comprehensive variant dataset obtainable by a single technology in a single laboratory, allowing accurate calling of substitutions, indels, STRs and SVs. The accuracy even allows sensitive calling of DNMs on all variant levels, and also allows for phasing, which helps to distinguish true positive from false positive DNMs.
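
The confirmation percentages quoted in the abstract above can be reproduced from the counts it reports. The following is only an illustrative sketch: the dictionary keys and the function name `confirmation_rate` are hypothetical labels chosen here, and the counts are taken directly from the abstract text.

```python
# Counts as quoted in the abstract: (confirmed true de novo, successfully validated).
# The labels are hypothetical names used only for this illustration.
validation_counts = {
    "LRS-unique small variants": (11, 27),
    "SRS-unique small variants": (8, 42),
    "LRS-unique SVs": (10, 19),
}


def confirmation_rate(confirmed: int, validated: int) -> float:
    """Percentage of successfully validated calls confirmed as true de novo events."""
    return 100.0 * confirmed / validated


rates = {name: confirmation_rate(c, v) for name, (c, v) in validation_counts.items()}

for name, pct in rates.items():
    print(f"{name}: {pct:.1f}%")
```

Rounded, these values agree with the 41%, 19% and 52.6% stated in the abstract.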


Subject(s)
High-Throughput Nucleotide Sequencing , INDEL Mutation , Humans , Alleles , Microsatellite Repeats
8.
Sci Rep ; 12(1): 16945, 2022 10 09.
Article in English | MEDLINE | ID: mdl-36210382

ABSTRACT

Over the past decade, advances in genetic testing, particularly the advent of next-generation sequencing, have led to a paradigm shift in the diagnosis of molecular diseases and disorders. Despite our present collective ability to interrogate more than 90% of the human genome, portions of the genome have eluded us, resulting in stagnation of diagnostic yield with existing methodologies. Here we show how application of a new technology, long-read sequencing, has the potential to improve molecular diagnostic rates. Whole genome sequencing by long reads was able to cover 98% of next-generation sequencing dead zones, which are areas of the genome that are not interpretable by conventional industry-standard short-read sequencing. Through the ability of long-read sequencing to unambiguously call variants in these regions, we discovered an immunodeficiency due to a variant in IKBKG in a subject who had previously received a negative genome sequencing result. Additionally, we demonstrate the ability of long-read sequencing to detect small variants on par with short-read sequencing, its superior performance in identifying structural variants, and thirdly, its capacity to determine genomic methylation defects in native DNA. Though the latter technical abilities have been demonstrated, we demonstrate the clinical application of this technology to successfully identify multiple types of variants using a single test.


Subject(s)
Genome, Human , High-Throughput Nucleotide Sequencing , Base Sequence , Genomics , High-Throughput Nucleotide Sequencing/methods , Humans , I-kappa B Kinase , Sequence Analysis, DNA/methods
9.
Eur J Hum Genet ; 29(4): 637-648, 2021 04.
Article in English | MEDLINE | ID: mdl-33257779

ABSTRACT

Long-read sequencing (LRS) has the potential to comprehensively identify all medically relevant genome variation, including variation commonly missed by short-read sequencing (SRS) approaches. To determine this potential, we performed LRS at around 15×-40× genome coverage using the Pacific Biosciences Sequel I System for five trios. The respective probands were diagnosed with intellectual disability (ID) whose etiology remained unresolved after SRS exomes and genomes. Systematic assessment of LRS coverage showed that ~35 Mb of the human reference genome was only accessible by LRS and not SRS. Genome-wide structural variant (SV) calling yielded on average 28,292 SV calls per individual, totaling 12.9 Mb of sequence. Trio-based analyses, which allowed us to study segregation, showed concordance for up to 95% of these SV calls across the genome, and 80% of the LRS SV calls were not identified by SRS. De novo mutation analysis did not identify any de novo SVs, confirming that these are rare events. Because of high sequence coverage, we were also able to call single nucleotide substitutions. On average, we identified 3 million substitutions per genome, with a Mendelian inheritance concordance of up to 97%. Of these, ~100,000 were located in the ~35 Mb of the genome that was only captured by LRS. Moreover, these variants affected the coding sequence of 64 genes, including 32 known Mendelian disease genes. Our data show the potential added value of LRS compared to SRS for identifying medically relevant genome variation.


Subject(s)
Genetic Testing/methods , Intellectual Disability/genetics , Sequence Analysis, DNA/methods , Humans , Intellectual Disability/diagnosis , Mutation , Pedigree , Polymorphism, Genetic
10.
Commun Biol ; 3(1): 78, 2020 02 18.
Article in English | MEDLINE | ID: mdl-32071408

ABSTRACT

Haplotype phasing of maize genetic variants is important for genome interpretation, population genetic analysis and functional analysis of allelic activity. We performed an isoform-level phasing study using two maize inbred lines and their reciprocal crosses, based on single-molecule, full-length cDNA sequencing. To phase and analyze transcripts between hybrids and parents, we developed IsoPhase. Using this tool, we validated the majority of SNPs called against matching short-read data from embryo, endosperm and root tissues, and identified allele-specific, gene-level and isoform-level differential expression between the inbred parental lines and hybrid offspring. After phasing 6907 genes in the reciprocal hybrids, we annotated the SNPs and identified large-effect genes. In addition, we identified parent-of-origin isoforms, distinct novel isoforms in maize parent and hybrid lines, and imprinted genes from different tissues. Finally, we characterized variation in cis- and trans-regulatory effects. Our study provides measures of haplotypic expression that could increase accuracy in studies of allelic expression.


Subject(s)
Sequence Analysis, RNA/methods , Zea mays/genetics , Alleles , Endosperm/genetics , Gene Expression Profiling/methods , Gene Expression Regulation, Plant , Genes, Plant , Genome, Plant , Haplotypes , Mutation , Plant Proteins/genetics , Plants, Genetically Modified , RNA, Messenger/analysis , RNA, Messenger/genetics , Zea mays/physiology
11.
PLoS One ; 15(1): e0226340, 2020.
Article in English | MEDLINE | ID: mdl-31940362

ABSTRACT

Structural variation (SV) is typically defined as variation within the human genome that exceeds 50 base pairs (bp). SV may be copy number neutral or it may involve duplications, deletions, and complex rearrangements. Recent studies have shown SV to be associated with many human diseases. However, studies of SV have been challenging due to technological constraints. With the advent of third generation (long-read) sequencing technology, exploration of longer stretches of DNA not easily examined previously has been made possible. In the present study, we utilized third generation (long-read) sequencing techniques to examine SV in the EGFR landscape of four haplotypes derived from two human samples. We analyzed the EGFR gene and its landscape (+/- 500,000 base pairs) using this approach and were able to identify a region of non-coding DNA with over 90% similarity to the most common activating EGFR mutation in non-small cell lung cancer. Based on previously published Alu-element genome instability algorithms, we propose a molecular mechanism to explain how this non-coding region of DNA may be interacting with and impacting the stability of the EGFR gene and potentially generating this cancer-driver gene. By these techniques, we were also able to identify previously hidden structural variation in the four haplotypes and in the human reference genome (hg38). We applied previously published algorithms to compare the relative stabilities of these five different EGFR gene landscape haplotypes to estimate their relative potentials to generate the EGFR exon 19, 15 bp canonical deletion. To our knowledge, the present study is the first to use the differences in genomic architecture between targeted cancer-linked phased haplotypes to estimate their relative potentials to form a common cancer-linked driver mutation.


Subject(s)
Genes, erbB-1/genetics , Genetic Variation , Genome, Human/genetics , Genomic Instability , High-Throughput Nucleotide Sequencing , Carcinoma, Non-Small-Cell Lung/genetics , Computer Simulation , Haplotypes , Humans , Lung Neoplasms/genetics , Sequence Analysis, DNA
12.
Genes (Basel) ; 10(1)2019 01 18.
Article in English | MEDLINE | ID: mdl-30669388

ABSTRACT

A high-quality reference genome is a fundamental resource for functional genetics, comparative genomics, and population genomics, and is increasingly important for conservation biology. PacBio Single Molecule, Real-Time (SMRT) sequencing generates long reads with uniform coverage and high consensus accuracy, making it a powerful technology for de novo genome assembly. Improvements in throughput and concomitant reductions in cost have made PacBio an attractive core technology for many large genome initiatives; however, relatively high DNA input requirements (~5 µg for standard library protocol) have placed PacBio out of reach for many projects on small organisms that have lower DNA content, or on projects with limited input DNA for other reasons. Here we present a high-quality de novo genome assembly from a single Anopheles coluzzii mosquito. A modified SMRTbell library construction protocol without DNA shearing and size selection was used to generate a SMRTbell library from just 100 ng of starting genomic DNA. The sample was run on the Sequel System with chemistry 3.0 and software v6.0, generating, on average, 25 Gb of sequence per SMRT Cell with 20 h movies, followed by diploid de novo genome assembly with FALCON-Unzip. The resulting curated assembly had high contiguity (contig N50 3.5 Mb) and completeness (more than 98% of conserved genes were present and full-length). In addition, this single-insect assembly now places 667 (>90%) of formerly unplaced genes into their appropriate chromosomal contexts in the AgamP4 PEST reference. We were also able to resolve maternal and paternal haplotypes for over 1/3 of the genome. By sequencing and assembling material from a single diploid individual, only two haplotypes were present, simplifying the assembly process compared to samples from multiple pooled individuals. The method presented here can be applied to samples with starting DNA amounts as low as 100 ng per 1 Gb genome size. This new low-input approach puts PacBio-based assemblies in reach for small highly heterozygous organisms that comprise much of the diversity of life.
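
The abstract above reports contiguity as a contig N50. As a reminder of what that statistic measures, here is a minimal sketch using the common definition (the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length). The function name `contig_n50` and the toy contig lengths are hypothetical and are not data from the study.

```python
from typing import Sequence


def contig_n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a set of contig lengths (common definition)."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    # Not reachable for non-empty, positive lengths; kept for completeness.
    raise ValueError("contig lengths must be positive")


# Toy example with made-up lengths: total = 300, half = 150.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
example_n50 = contig_n50(example_lengths)
print(example_n50)
```

In the toy example the two longest contigs (80 + 70) reach half of the total, so the N50 is the length of the second one.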


Subject(s)
Anopheles/genetics , Genome, Insect , Sequence Analysis, DNA/methods , Animals , Contig Mapping/methods , Contig Mapping/standards , Ploidies , Polymorphism, Genetic , Sequence Analysis, DNA/standards
13.
mSystems ; 4(1)2019.
Article in English | MEDLINE | ID: mdl-30834329

ABSTRACT
RESUMEN

Extensive drug resistance (XDR) is an escalating global problem. Escherichia coli strain Sanji was isolated from an outbreak of pheasant colibacillosis in Fujian province, China, in 2011. This strain has XDR properties, exhibiting sensitivity to carbapenems but to no other classes of known antibiotics. Whole-genome sequencing revealed a total of 32 known antibiotic resistance genes, many associated with insertion sequence 26 (IS26) elements. These were found on the Sanji chromosome and 2 of its 6 plasmids, pSJ_255 and pSJ_82. The Sanji chromosome also harbors a type 2 secretion system (T2SS), a type 3 secretion system (T3SS), a type 6 secretion system (T6SS), and several putative prophages. Sanji and other ST167 strains have a previously uncharacterized O-antigen (O89b) that is most closely related to serotype O89 as determined by analysis of the wzm-wzt genes and in silico serotyping. This O89b-antigen gene cluster was also found in the genomes of a few other pathogenic sequence type 617 (ST617) and ST10 complex strains. A time-scaled phylogeny inferred from comparative single nucleotide variant analysis indicated that these O89b-containing lineages emerged about 30 years ago. Comparative sequence analysis revealed that the core genome of Sanji is nearly identical to that of several recently sequenced strains of pathogenic XDR E. coli belonging to the ST167 group. Comparison of the mobile elements among the different ST167 genomes revealed that each genome carries a distinct set of multidrug resistance genes on different types of plasmids, indicating that there are multiple paths toward the emergence of XDR in E. coli. IMPORTANCE: E. coli strain Sanji is the first sequenced and analyzed genome of the recently emerged pathogenic XDR strains with sequence type ST167 and novel in silico serotype O89b:H9. Comparison of the genomes of Sanji with other ST167 strains revealed distinct sets of different plasmids, mobile IS elements, and antibiotic resistance genes in each genome, indicating that there exist multiple paths toward achieving XDR. The emergence of these pathogenic ST167 E. coli strains with diverse XDR capabilities highlights the difficulty of preventing or mitigating the development of XDR properties in bacteria and points to the importance of a better understanding of the shared underlying virulence mechanisms and physiology of pathogenic bacteria.

14.
Genes (Basel) ; 10(4)2019 03 27.
Article in English | MEDLINE | ID: mdl-30934798

ABSTRACT

Hematopoietic cells are continuously replenished from progenitor cells that reside in the bone marrow. To evaluate molecular changes during this process, we analyzed the transcriptomes of freshly harvested human bone marrow progenitor (lineage-negative) and differentiated (lineage-positive) cells by single-molecule real-time (SMRT) full-length RNA-sequencing. This analysis revealed a ~5-fold higher number of transcript isoforms than previously detected and showed a distinct composition of individual transcript isoforms characteristic for bone marrow subpopulations. A detailed analysis of messenger RNA (mRNA) isoforms transcribed from the ANXA1 and EEF1A1 loci confirmed their distinct composition. The expression of proteins predicted from the transcriptome analysis was evaluated by mass spectrometry, which validated previously unknown protein isoforms predicted, e.g., for EEF1A1. These protein isoforms distinguished the lineage-negative cell population from the lineage-positive cell population. Finally, transcript isoforms expressed from paralogous gene loci (e.g., CFD, GATA2, HLA-A, B, and C) also distinguished cell subpopulations but were only detectable by full-length RNA sequencing. Thus, qualitatively distinct transcript isoforms from individual genomic loci separate bone marrow cell subpopulations, indicating complex transcriptional regulation and protein isoform generation during hematopoiesis.


Subject(s)
Cell Lineage/genetics , High-Throughput Nucleotide Sequencing , RNA, Messenger/genetics , Transcriptome/genetics , Alternative Splicing/genetics , Bone Marrow Cells/metabolism , Genomics/methods , Humans , Single Molecule Imaging/methods , Exome Sequencing/methods
15.
Gigascience ; 8(10)2019 10 01.
Article in English | MEDLINE | ID: mdl-31609423

ABSTRACT

BACKGROUND: A high-quality reference genome is an essential tool for applied and basic research on arthropods. Long-read sequencing technologies may be used to generate more complete and contiguous genome assemblies than alternate technologies; however, long-read methods have historically had greater input DNA requirements and higher costs than next-generation sequencing, which are barriers to their use on many samples. Here, we present a 2.3 Gb de novo genome assembly of a field-collected adult female spotted lanternfly (Lycorma delicatula) using a single Pacific Biosciences SMRT Cell. The spotted lanternfly is an invasive species recently discovered in the northeastern United States that threatens to damage economically important crop plants in the region. RESULTS: The DNA from 1 individual was used to make 1 standard, size-selected library with an average DNA fragment size of ∼20 kb. The library was run on 1 Sequel II SMRT Cell 8M, generating a total of 132 Gb of long-read sequences, of which 82 Gb were from unique library molecules, representing ∼36× coverage of the genome. The assembly had high contiguity (contig N50 length = 1.5 Mb), completeness, and sequence level accuracy as estimated by conserved gene set analysis (96.8% of conserved genes both complete and without frame shift errors). Furthermore, it was possible to segregate more than half of the diploid genome into the 2 separate haplotypes. The assembly also recovered 2 microbial symbiont genomes known to be associated with L. delicatula, each microbial genome being assembled into a single contig. CONCLUSIONS: We demonstrate that field-collected arthropods can be used for the rapid generation of high-quality genome assemblies, an attractive approach for projects on emerging invasive species, disease vectors, or conservation efforts of endangered species.
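
The "∼36× coverage" figure in the abstract above is consistent with dividing the unique sequenced bases by the genome size. The sketch below assumes, for illustration only, that the 2.3 Gb assembly size stands in for the genome size; the function name `fold_coverage` is hypothetical.

```python
def fold_coverage(sequenced_bases: float, genome_size_bases: float) -> float:
    """Average fold coverage = total sequenced bases / genome size."""
    return sequenced_bases / genome_size_bases


# Figures quoted in the abstract; using the assembly size as the genome size
# is an assumption made here for the purpose of the illustration.
unique_bases = 82e9      # 82 Gb from unique library molecules
assembly_size = 2.3e9    # 2.3 Gb de novo assembly

coverage = fold_coverage(unique_bases, assembly_size)
print(f"~{coverage:.0f}x coverage")
```

The result rounds to 36, matching the coverage stated in the abstract.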


Subject(s)
Diptera/genetics , Genome, Insect , Genomics/methods , Animals , Female , Gene Library , Introduced Species , Sequence Analysis, DNA
16.
Genes (Basel) ; 9(8)2018 Aug 01.
Article in English | MEDLINE | ID: mdl-30071683

ABSTRACT

Genome-level data can provide researchers with unprecedented precision to examine the causes and genetic consequences of population declines, which can inform conservation management. Here, we present a high-quality, long-read, de novo genome assembly for one of the world's most endangered bird species, the 'Alala (Corvus hawaiiensis; Hawaiian crow). As the only remaining native crow species in Hawai'i, the 'Alala survived solely in a captive-breeding program from 2002 until 2016, at which point a long-term reintroduction program was initiated. The high-quality genome assembly was generated to lay the foundation for both comparative genomics studies and the development of population-level genomic tools that will aid conservation and recovery efforts. We illustrate how the quality of this assembly places it amongst the very best avian genomes assembled to date, comparable to intensively studied model systems. We describe the genome architecture in terms of repetitive elements and runs of homozygosity, and we show that compared with more outbred species, the 'Alala genome is substantially more homozygous. We also provide annotations for a subset of immunity genes that are likely to be important in conservation management, and we discuss how this genome is currently being used as a roadmap for downstream conservation applications.

17.
Front Immunol ; 9: 2294, 2018.
Article in English | MEDLINE | ID: mdl-30337930

ABSTRACT

Although NGS technologies fuel advances in high-throughput HLA genotyping methods for identification and classification of HLA genes to assist with precision medicine efforts in disease and transplantation, the efficiency of these methods is impeded by the absence of adequately characterized high-frequency HLA allele reference sequence databases for the highly polymorphic HLA gene system. Here, we report on producing a comprehensive collection of full-length HLA allele sequences for eight classical HLA loci found in the Japanese population. We augmented the second-generation short read data generated by the Ion Torrent technology with long amplicon spanning consensus reads delivered by the third-generation SMRT sequencing method to create reference grade high-quality sequences of HLA class I and II gene alleles resolved at the genomic coding and non-coding level. Forty-six DNAs were obtained from a reference set used previously to establish the HLA allele frequency data in Japanese subjects. The samples included alleles with a collective allele frequency in the Japanese population of more than 99.2%. The HLA loci were independently amplified by long-range PCR using previously designed HLA-locus specific primers and subsequently sequenced using SMRT and Ion PGM sequencers. The mapped long and short-reads were used to produce a reference library of consensus HLA allelic sequences with the help of the reference-aware software tool LAA for SMRT Sequencing. A total of 253 distinct alleles were determined for 46 healthy subjects. Of them, 137 were novel alleles: 101 SNVs and/or indels and 36 extended alleles at a partial or full-length level. Comparing the HLA sequences from the perspective of nucleotide diversity revealed that HLA-DRB1 was the most divergent among the eight HLA genes, and that the HLA-DPB1 gene sequences diverged into two distinct groups, DP2 and DP5, with evidence of independent polymorphisms generated in exon 2. We also identified two specific intronic variations in HLA-DRB1 that might be involved in rheumatoid arthritis. In conclusion, full-length HLA allele sequencing by third-generation and second-generation technologies has provided polymorphic gene reference sequences at a genomic allelic resolution including allelic variations assigned up to the field-4 level for a stronger foundation in precision medicine and HLA-related disease and transplantation studies.


Subject(s)
Computational Biology/methods , Genes, MHC Class II , Genes, MHC Class I , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Software , Adult , Aged , Aged, 80 and over , Alleles , Arthritis, Rheumatoid/genetics , Female , Gene Frequency , Genetic Association Studies , Genetic Predisposition to Disease , Genomics/methods , Genotype , Genotyping Techniques , Humans , Male , Middle Aged , Phylogeny , Polymorphism, Genetic
18.
DNA Res ; 23(4): 339-51, 2016 Aug.
Article in English | MEDLINE | ID: mdl-27345719

ABSTRACT

The application of next-generation sequencing to estimate genetic diversity of Plasmodium falciparum, the most lethal malaria parasite, has proved challenging due to the skewed AT-richness [∼80.6% (A + T)] of its genome and the lack of technology to assemble highly polymorphic subtelomeric regions that contain clonally variant, multigene virulence families (e.g., var and rifin). To address this, we performed amplification-free, single molecule, real-time sequencing of P. falciparum genomic DNA and generated reads of average length 12 kb, with 50% of the reads between 15.5 and 50 kb in length. Next, using the Hierarchical Genome Assembly Process, we assembled the P. falciparum genome de novo and successfully compiled all 14 nuclear chromosomes telomere-to-telomere. We also accurately resolved centromeres [∼90-99% (A + T)] and subtelomeric regions and identified large insertions and duplications that add extra var and rifin genes to the genome, along with smaller structural variants such as homopolymer tract expansions. Overall, we show that amplification-free, long-read sequencing combined with de novo assembly overcomes major challenges inherent to studying the P. falciparum genome. Indeed, this technology may not only identify the polymorphic and repetitive subtelomeric sequences of parasite populations from endemic areas but may also evaluate structural variation linked to virulence, drug resistance and disease transmission.


Subject(s)
Genome, Protozoan , Plasmodium falciparum/genetics , Telomere/genetics , Contig Mapping , Polymorphism, Genetic , Sequence Analysis, DNA
20.
Genome Announc ; 3(2)2015 Mar 26.
Article in English | MEDLINE | ID: mdl-25814591

ABSTRACT

Clostridium difficile is one of the leading causes of antibiotic-associated diarrhea in health care facilities worldwide. Here, we report the genome sequence of C. difficile strain G46, ribotype 027, isolated from an outbreak in Glamorgan, Wales, in 2006.
