Search | BVS Research Portal

1.

Major Impacts of Widespread Structural Variation on Gene Expression and Crop Improvement in Tomato.

Alonge, Michael; Wang, Xingang; Benoit, Matthias; Soyk, Sebastian; Pereira, Lara; Zhang, Lei; Suresh, Hamsini; Ramakrishnan, Srividya; Maumus, Florian; Ciren, Danielle; Levy, Yuval; Harel, Tom Hai; Shalev-Schlosser, Gili; Amsellem, Ziva; Razifard, Hamid; Caicedo, Ana L; Tieman, Denise M; Klee, Harry; Kirsche, Melanie; Aganezov, Sergey; Ranallo-Benavidez, T Rhyker; Lemmon, Zachary H; Kim, Jennifer; Robitaille, Gina; Kramer, Melissa; Goodwin, Sara; McCombie, W Richard; Hutton, Samuel; Van Eck, Joyce; Gillis, Jesse; Eshed, Yuval; Sedlazeck, Fritz J; van der Knaap, Esther; Schatz, Michael C; Lippman, Zachary B.

Cell ; 182(1): 145-161.e23, 2020 07 09.

Article in English | MEDLINE | ID: mdl-32553272

ABSTRACT

Structural variants (SVs) underlie important crop improvement and domestication traits. However, resolving the extent, diversity, and quantitative impact of SVs has been challenging. We used long-read nanopore sequencing to capture 238,490 SVs in 100 diverse tomato lines. This panSV genome, along with 14 new reference assemblies, revealed large-scale intermixing of diverse genotypes, as well as thousands of SVs intersecting genes and cis-regulatory regions. Hundreds of SV-gene pairs exhibit subtle and significant expression changes, which could broadly influence quantitative trait variation. By combining quantitative genetics with genome editing, we show how multiple SVs that changed gene dosage and expression levels modified fruit flavor, size, and production. In the last example, higher order epistasis among four SVs affecting three related transcription factors allowed introduction of an important harvesting trait in modern tomato. Our findings highlight the underexplored role of SVs in genotype-to-phenotype relationships and their widespread importance and utility in crop improvement.

Subjects

Crops, Agricultural/genetics , Gene Expression Regulation, Plant , Genomic Structural Variation , Solanum lycopersicum/genetics , Alleles , Cytochrome P-450 Enzyme System/genetics , Ecotype , Epistasis, Genetic , Fruit/genetics , Gene Duplication , Genome, Plant , Genotype , Inbreeding , Molecular Sequence Annotation , Phenotype , Plant Breeding , Quantitative Trait Loci/genetics

2.

Beyond the Human Genome Project: The Age of Complete Human Genome Sequences and Pangenome References.

Taylor, Dylan J; Eizenga, Jordan M; Li, Qiuhui; Das, Arun; Jenike, Katharine M; Kenny, Eimear E; Miga, Karen H; Monlong, Jean; McCoy, Rajiv C; Paten, Benedict; Schatz, Michael C.

Annu Rev Genomics Hum Genet ; 2024 Apr 25.

Article in English | MEDLINE | ID: mdl-38663087

ABSTRACT

The Human Genome Project was an enormous accomplishment, providing a foundation for countless explorations into the genetics and genomics of the human species. Yet for many years, the human genome reference sequence remained incomplete and lacked representation of human genetic diversity. Recently, two major advances have emerged to address these shortcomings: complete gap-free human genome sequences, such as the one developed by the Telomere-to-Telomere Consortium, and high-quality pangenomes, such as the one developed by the Human Pangenome Reference Consortium. Facilitated by advances in long-read DNA sequencing and genome assembly algorithms, complete human genome sequences resolve regions that have been historically difficult to sequence, including centromeres, telomeres, and segmental duplications. In parallel, pangenomes capture the extensive genetic diversity across populations worldwide. Together, these advances usher in a new era of genomics research, enhancing the accuracy of genomic analysis, paving the path for precision medicine, and contributing to deeper insights into human biology.

3.

Interspecies regulatory landscapes and elements revealed by novel joint systematic integration of human and mouse blood cell epigenomes.

Xiang, Guanjue; He, Xi; Giardine, Belinda M; Isaac, Kathryn J; Taylor, Dylan J; McCoy, Rajiv C; Jansen, Camden; Keller, Cheryl A; Wixom, Alexander Q; Cockburn, April; Miller, Amber; Qi, Qian; He, Yanghua; Li, Yichao; Lichtenberg, Jens; Heuston, Elisabeth F; Anderson, Stacie M; Luan, Jing; Vermunt, Marit W; Yue, Feng; Sauria, Michael E G; Schatz, Michael C; Taylor, James; Göttgens, Berthold; Hughes, Jim R; Higgs, Douglas R; Weiss, Mitchell J; Cheng, Yong; Blobel, Gerd A; Bodine, David M; Zhang, Yu; Li, Qunhua; Mahony, Shaun; Hardison, Ross C.

Genome Res ; 2024 Jul 01.

Article in English | MEDLINE | ID: mdl-38951027

ABSTRACT

Knowledge of locations and activities of cis-regulatory elements (CREs) is needed to decipher basic mechanisms of gene regulation and to understand the impact of genetic variants on complex traits. Previous studies identified candidate CREs (cCREs) using epigenetic features in one species, making comparisons difficult between species. In contrast, we conducted an interspecies study defining epigenetic states and identifying cCREs in blood cell types to generate regulatory maps that are comparable between species, using integrative modeling of eight epigenetic features jointly in human and mouse in our Validated Systematic Integration (VISION) Project. The resulting catalogs of cCREs are useful resources for further studies of gene regulation in blood cells, as indicated by high overlap with known functional elements and strong enrichment for human genetic variants associated with blood cell phenotypes. The contribution of each epigenetic state in cCREs to gene regulation, inferred from a multivariate regression, was used to estimate epigenetic state Regulatory Potential (esRP) scores for each cCRE in each cell type, which were used to categorize dynamic changes in cCREs. Groups of cCREs displaying similar patterns of regulatory activity in human and mouse cell types, obtained by joint clustering on esRP scores, harbored distinctive transcription factor binding motifs that were similar between species. An interspecies comparison of cCREs revealed both conserved and species-specific patterns of epigenetic evolution. Finally, we showed that comparisons of the epigenetic landscape between species can reveal elements with similar roles in regulation, even in the absence of genomic sequence alignment.

4.

Jasmine and Iris: population-scale structural variant comparison and analysis.

Kirsche, Melanie; Prabhu, Gautam; Sherman, Rachel; Ni, Bohan; Battle, Alexis; Aganezov, Sergey; Schatz, Michael C.

Nat Methods ; 20(3): 408-417, 2023 03.

Article in English | MEDLINE | ID: mdl-36658279

ABSTRACT

The availability of long reads is revolutionizing studies of structural variants (SVs). However, because SVs vary across individuals and are discovered through imprecise read technologies and methods, they can be difficult to compare. Addressing this, we present Jasmine and Iris (https://github.com/mkirsche/Jasmine/) for fast and accurate SV refinement, comparison and population analysis. Using an SV proximity graph, Jasmine outperforms six widely used comparison methods, including reducing the rate of Mendelian discordance in trio datasets by more than fivefold, and reveals a set of high-confidence de novo SVs confirmed by multiple technologies. We also present a unified callset of 122,813 SVs and 82,379 indels from 31 samples of diverse ancestry sequenced with long reads. We genotype these variants in 1,317 samples from the 1000 Genomes Project and the Genotype-Tissue Expression project with DNA and RNA-sequencing data and assess their widespread impact on gene expression, including within medically relevant genes.

Subjects

Jasminum , Humans , Genome , Sequence Analysis , Genotype , Iris , Sequence Analysis, DNA/methods , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Software
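The Jasmine abstract describes comparing SV calls through an SV proximity graph. As a rough illustration only, the Python sketch below groups calls of the same chromosome and type whose breakpoints lie within a distance threshold, processing the closest pairs first (union-find, Kruskal-style) and refusing any merge that would place two calls from the same sample in one group. The `SV` record, the `max_dist` default, and the start-plus-length distance are assumptions made for this example; this is not Jasmine's code or API.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV call record (not Jasmine's data model)."""
    sample: str
    chrom: str
    svtype: str
    start: int
    length: int


def merge_svs(svs, max_dist=500):
    """Group SV calls using a proximity graph and closest-pair-first merging."""
    parent = list(range(len(svs)))
    samples = [{sv.sample} for sv in svs]  # samples present in each group

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Proximity graph: edges only between calls of the same chromosome and
    # type whose (start, length) distance is within max_dist (assumed metric).
    edges = []
    for i, j in combinations(range(len(svs)), 2):
        a, b = svs[i], svs[j]
        if a.chrom != b.chrom or a.svtype != b.svtype:
            continue
        dist = abs(a.start - b.start) + abs(a.length - b.length)
        if dist <= max_dist:
            edges.append((dist, i, j))

    # Closest pairs first; skip merges that would put two calls from the
    # same sample into one merged variant.
    for _, i, j in sorted(edges):
        ri, rj = find(i), find(j)
        if ri == rj or samples[ri] & samples[rj]:
            continue
        parent[rj] = ri
        samples[ri] |= samples[rj]

    groups = {}
    for i, sv in enumerate(svs):
        groups.setdefault(find(i), []).append(sv)
    return list(groups.values())
```

The all-pairs edge construction is quadratic and is only meant for small examples; a practical tool would index calls by position so that only nearby calls are compared.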

5.

Establishing Physalis as a Solanaceae model system enables genetic reevaluation of the inflated calyx syndrome.

He, Jia; Alonge, Michael; Ramakrishnan, Srividya; Benoit, Matthias; Soyk, Sebastian; Reem, Nathan T; Hendelman, Anat; Van Eck, Joyce; Schatz, Michael C; Lippman, Zachary B.

Plant Cell ; 35(1): 351-368, 2023 01 02.

Article in English | MEDLINE | ID: mdl-36268892

ABSTRACT

The highly diverse Solanaceae family contains several widely studied models and crop species. Fully exploring, appreciating, and exploiting this diversity requires additional model systems. Particularly promising are orphan fruit crops in the genus Physalis, which occupy a key evolutionary position in the Solanaceae and capture understudied variation in traits such as inflorescence complexity, fruit ripening and metabolites, disease and insect resistance, self-compatibility, and, most notably, the striking inflated calyx syndrome (ICS), an evolutionary novelty found across angiosperms where sepals grow exceptionally large to encapsulate fruits in a protective husk. We recently developed transformation and genome editing in Physalis grisea (groundcherry). However, to systematically explore and unlock the potential of this and related Physalis species as genetic systems, high-quality genome assemblies are needed. Here, we present chromosome-scale references for P. grisea and its close relative Physalis pruinosa and use these resources to study natural and engineered variations in floral traits. We first rapidly identified a natural structural variant in a bHLH gene that causes petal color variation. Further, and against expectations, we found that CRISPR-Cas9-targeted mutagenesis of 11 MADS-box genes, including purported essential regulators of ICS, had no effect on inflation. In a forward genetics screen, we identified huskless, which lacks ICS due to mutation of an AP2-like gene that causes sepals and petals to merge into a single whorl of mixed identity. These resources and findings elevate Physalis to a new Solanaceae model system and establish a paradigm in the search for factors driving ICS.

Subjects

Physalis , Solanaceae , Solanaceae/genetics , Physalis/genetics , Physalis/metabolism , Biological Evolution , Mutation , Gene Editing

6.

Hypo-osmotic-like stress underlies general cellular defects of aneuploidy.

Tsai, Hung-Ji; Nelliat, Anjali R; Choudhury, Mohammad Ikbal; Kucharavy, Andrei; Bradford, William D; Cook, Malcolm E; Kim, Jisoo; Mair, Devin B; Sun, Sean X; Schatz, Michael C; Li, Rong.

Nature ; 570(7759): 117-121, 2019 06.

Article in English | MEDLINE | ID: mdl-31068692

ABSTRACT

Aneuploidy, which refers to unbalanced chromosome numbers, represents a class of genetic variation that is associated with cancer, birth defects and eukaryotic micro-organisms [1-4]. Whereas it is known that each aneuploid chromosome stoichiometry can give rise to a distinct pattern of gene expression and phenotypic profile [4,5], it remains a fundamental question whether there are common cellular defects that are associated with aneuploidy. Here we show the existence in budding yeast of a common aneuploidy gene-expression signature that is suggestive of hypo-osmotic stress, using a strategy that enables the observation of common transcriptome changes of aneuploidy by averaging out karyotype-specific dosage effects in aneuploid yeast-cell populations with random and diverse chromosome stoichiometry. Consistently, aneuploid yeast exhibited increased plasma-membrane stress that led to impaired endocytosis, and this defect was also observed in aneuploid human cells. Thermodynamic modelling showed that hypo-osmotic-like stress is a general outcome of the proteome imbalance that is caused by aneuploidy, and also predicted a relationship between ploidy and cell size that was observed in yeast and aneuploid cancer cells. A genome-wide screen uncovered a general dependency of aneuploid cells on a pathway of ubiquitin-mediated endocytic recycling of nutrient transporters. Loss of this pathway, coupled with the endocytic defect inherent to aneuploidy, leads to a marked alteration of intracellular nutrient homeostasis.

Subjects

Aneuploidy , Osmotic Pressure , Proteome/genetics , Proteome/metabolism , Saccharomyces cerevisiae/cytology , Saccharomyces cerevisiae/genetics , Stress, Physiological , Cell Membrane/metabolism , Cell Membrane/pathology , DNA-Binding Proteins/metabolism , Endocytosis , Endosomal Sorting Complexes Required for Transport/metabolism , Homeostasis , Humans , Karyotype , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Thermodynamics , Transcription Factors/metabolism , Transcriptome/genetics , Ubiquitin/metabolism , Ubiquitin-Protein Ligase Complexes/metabolism

7.

Optimized sample selection for cost-efficient long-read population sequencing.

Ranallo-Benavidez, T Rhyker; Lemmon, Zachary; Soyk, Sebastian; Aganezov, Sergey; Salerno, William J; McCoy, Rajiv C; Lippman, Zachary B; Schatz, Michael C; Sedlazeck, Fritz J.

Genome Res ; 31(5): 910-918, 2021 05.

Article in English | MEDLINE | ID: mdl-33811084

ABSTRACT

An increasingly important scenario in population genetics is when a large cohort has been genotyped using a low-resolution approach (e.g., microarrays, exome capture, short-read WGS), from which a few individuals are resequenced using a more comprehensive approach, especially long-read sequencing. The subset of individuals selected should ensure that the captured genetic diversity is fully representative and includes variants across all subpopulations. For example, studies of human variation have historically focused on individuals of European ancestry, but these represent a small fraction of the overall diversity. Addressing this, SVCollector identifies the optimal subset of individuals for resequencing by analyzing population-level VCF files from low-resolution genotyping studies. It then computes a ranked list of samples that maximizes the total number of variants present within a subset of a given size. To solve this optimization problem, SVCollector implements a fast, greedy heuristic and an exact algorithm using integer linear programming. We apply SVCollector to simulated data, 2504 human genomes from the 1000 Genomes Project, and 3024 genomes from the 3000 Rice Genomes Project and show that the rankings it computes are more representative than those of alternative naive strategies. When selecting an optimal subset of 100 samples in these cohorts, SVCollector identifies individuals from every subpopulation, whereas naive methods yield an unbalanced selection. Finally, we show that the number of variants present in cohorts selected using this approach follows a power-law distribution that is naturally related to the population genetic concept of the allele frequency spectrum, allowing us to estimate the diversity present with increasing numbers of samples.

Subjects

Genome, Human , Polymorphism, Single Nucleotide , Exome/genetics , Gene Frequency , Genetics, Population , Humans , Sequence Analysis, DNA/methods
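The SVCollector abstract describes a fast greedy heuristic that ranks samples so that a subset of a given size captures as many variants as possible. A generic greedy maximum-coverage ranking conveys the idea; the sketch below assumes each sample's variants have already been extracted from the population VCF into a set of variant IDs. The function name, input format, and tie-breaking rule are hypothetical, this is not SVCollector's implementation, and the exact integer linear programming variant is not shown.

```python
def greedy_rank(sample_variants):
    """Greedy maximum-coverage ranking (hypothetical sketch).

    sample_variants maps a sample ID to the set of variant IDs it carries.
    Returns (sample, newly_covered, cumulative_covered) tuples in pick order.
    """
    remaining = {s: set(v) for s, v in sample_variants.items()}
    covered = set()
    ranking = []
    while remaining:
        # Pick the sample adding the most not-yet-covered variants;
        # iterating in sorted order breaks ties by sample ID.
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - covered))
        gain = remaining.pop(best) - covered
        covered |= gain
        ranking.append((best, len(gain), len(covered)))
    return ranking
```

The first n entries of the ranking form the greedy choice for a subset of size n, and the cumulative column traces how captured diversity grows as samples are added.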

8.

The genomic basis of evolutionary differentiation among honey bees.

Fouks, Bertrand; Brand, Philipp; Nguyen, Hung N; Herman, Jacob; Camara, Francisco; Ence, Daniel; Hagen, Darren E; Hoff, Katharina J; Nachweide, Stefanie; Romoth, Lars; Walden, Kimberly K O; Guigo, Roderic; Stanke, Mario; Narzisi, Giuseppe; Yandell, Mark; Robertson, Hugh M; Koeniger, Nikolaus; Chantawannakul, Panuwan; Schatz, Michael C; Worley, Kim C; Robinson, Gene E; Elsik, Christine G; Rueppell, Olav.

Genome Res ; 31(7): 1203-1215, 2021 Jul.

Article in English | MEDLINE | ID: mdl-33947700

ABSTRACT

In contrast to the western honey bee, Apis mellifera, other honey bee species have been largely neglected despite their importance and diversity. The genetic basis of the evolutionary diversification of honey bees remains largely unknown. Here, we provide a genome-wide comparison of three honey bee species, each representing one of the three subgenera of honey bees, namely the dwarf (Apis florea), giant (A. dorsata), and cavity-nesting (A. mellifera) honey bees with bumblebees as an outgroup. Our analyses resolve the phylogeny of honey bees with the dwarf honey bees diverging first. We find that evolution of increased eusocial complexity in Apis proceeds via increases in the complexity of gene regulation, which is in agreement with previous studies. However, this process seems to be related to pathways other than transcriptional control. Positive selection patterns across Apis reveal a trade-off between maintaining genome stability and generating genetic diversity, with a rapidly evolving piRNA pathway leading to genomes depleted of transposable elements, and a rapidly evolving DNA repair pathway associated with high recombination rates in all Apis species. Diversification within Apis is accompanied by positive selection in several genes whose putative functions present candidate mechanisms for lineage-specific adaptations, such as migration, immunity, and nesting behavior.

9.

Piercing the dark matter: bioinformatics of long-range sequencing and mapping.

Sedlazeck, Fritz J; Lee, Hayan; Darby, Charlotte A; Schatz, Michael C.

Nat Rev Genet ; 19(6): 329-346, 2018 06.

Article in English | MEDLINE | ID: mdl-29599501

ABSTRACT

Several new genomics technologies have become available that offer long-read sequencing or long-range mapping with higher throughput and higher resolution analysis than ever before. These long-range technologies are rapidly advancing the field with improved reference genomes, more comprehensive variant identification and more complete views of transcriptomes and epigenomes. However, they also require new bioinformatics approaches to take full advantage of their unique characteristics while overcoming their complex errors and modalities. Here, we discuss several of the most important applications of the new technologies, focusing on both the currently available bioinformatics tools and opportunities for future research.

Subjects

Chromosome Mapping/methods , Computational Biology/methods , Gene Expression Profiling/methods , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Transcriptome , Animals , Humans

10.

Fast and accurate genome-wide predictions and structural modeling of protein-protein interactions using Galaxy.

Guerler, Aysam; Baker, Dannon; van den Beek, Marius; Gruening, Bjoern; Bouvier, Dave; Coraor, Nate; Shank, Stephen D; Zehr, Jordan D; Schatz, Michael C; Nekrutenko, Anton.

BMC Bioinformatics ; 24(1): 263, 2023 Jun 23.

Article in English | MEDLINE | ID: mdl-37353753

ABSTRACT

BACKGROUND: Protein-protein interactions play a crucial role in almost all cellular processes. Identifying interacting proteins provides insight into living organisms and yields novel drug targets for disease treatment. Here, we present a publicly available, automated pipeline to predict genome-wide protein-protein interactions and produce high-quality multimeric structural models. RESULTS: Application of our method to the human and yeast genomes yields protein-protein interaction networks similar in quality to those from common experimental methods. We identified and modeled human proteins likely to interact with the papain-like protease of SARS-CoV-2 non-structural protein 3. We also produced models of the SARS-CoV-2 spike protein (S) interacting with myelin-oligodendrocyte glycoprotein receptor and dipeptidyl peptidase-4. CONCLUSIONS: The presented method is capable of confidently identifying interactions while providing high-quality multimeric structural models for experimental validation. The interactome modeling pipeline is available at usegalaxy.org and usegalaxy.eu.

Subjects

COVID-19 , Protein Interaction Mapping , Humans , RNA, Viral/metabolism , SARS-CoV-2 , Saccharomyces cerevisiae/metabolism

11.

Comprehensive analysis of structural variants in breast cancer genomes using single-molecule sequencing.

Aganezov, Sergey; Goodwin, Sara; Sherman, Rachel M; Sedlazeck, Fritz J; Arun, Gayatri; Bhatia, Sonam; Lee, Isac; Kirsche, Melanie; Wappel, Robert; Kramer, Melissa; Kostroff, Karen; Spector, David L; Timp, Winston; McCombie, W Richard; Schatz, Michael C.

Genome Res ; 30(9): 1258-1273, 2020 09.

Article in English | MEDLINE | ID: mdl-32887686

ABSTRACT

Improved identification of structural variants (SVs) in cancer can lead to more targeted and effective treatment options as well as advance our basic understanding of the disease and its progression. We performed whole-genome sequencing of the SKBR3 breast cancer cell line and patient-derived tumor and normal organoids from two breast cancer patients using Illumina/10x Genomics, Pacific Biosciences (PacBio), and Oxford Nanopore Technologies (ONT) sequencing. We then inferred SVs and large-scale allele-specific copy number variants (CNVs) using an ensemble of methods. Our findings show that long-read sequencing allows for substantially more accurate and sensitive SV detection, with between 90% and 95% of variants supported by each long-read technology also supported by the other. We also report high accuracy for long reads even at relatively low coverage (25×-30×). Furthermore, we integrated SV and CNV data into a unifying karyotype-graph structure to present a more accurate representation of the mutated cancer genomes. We find hundreds of variants within known cancer-related genes detectable only through long-read sequencing. These findings highlight the need for long-read sequencing of cancer genomes for the precise analysis of their genetic instability.

Subjects

Breast Neoplasms/genetics , Genomic Structural Variation , Whole Genome Sequencing/methods , Cell Line, Tumor , DNA Copy Number Variations , DNA Methylation , DNA, Neoplasm , Female , Humans , Nanopores , Organoids , RNA-Seq

12.

A plasmid locus associated with Klebsiella clinical infections encodes a microbiome-dependent gut fitness factor.

Vornhagen, Jay; Bassis, Christine M; Ramakrishnan, Srividya; Hein, Robert; Mason, Sophia; Bergman, Yehudit; Sunshine, Nicole; Fan, Yunfan; Holmes, Caitlyn L; Timp, Winston; Schatz, Michael C; Young, Vincent B; Simner, Patricia J; Bachman, Michael A.

PLoS Pathog ; 17(4): e1009537, 2021 04.

Article in English | MEDLINE | ID: mdl-33930099

ABSTRACT

Klebsiella pneumoniae (Kp) is an important cause of healthcare-associated infections, which increase patient morbidity, mortality, and hospitalization costs. Gut colonization by Kp is consistently associated with subsequent Kp disease, and patients are predominantly infected with their colonizing strain. Our previous comparative genomics study, between disease-causing and asymptomatically colonizing Kp isolates, identified a plasmid-encoded tellurite (TeO₃²⁻)-resistance (ter) operon as strongly associated with infection. However, TeO₃²⁻ is extremely rare and toxic to humans. Thus, we used a multidisciplinary approach to determine the biological link between ter and Kp infection. First, we used a genomic and bioinformatic approach to extensively characterize Kp plasmids encoding the ter locus. These plasmids displayed substantial variation in plasmid incompatibility type and gene content. Moreover, the ter operon was genetically independent of other plasmid-encoded virulence and antibiotic resistance loci, both in our original patient cohort and in a large set (n = 88) of publicly available ter operon-encoding Kp plasmids, indicating that the ter operon likely plays a direct, but as yet undescribed, role in Kp disease. Next, we employed multiple mouse models of infection and colonization to show that 1) the ter operon is dispensable during bacteremia, 2) the ter operon enhances fitness in the gut, 3) this phenotype is dependent on the colony of origin of the mice, and 4) antibiotic disruption of the gut microbiota eliminates the requirement for ter. Furthermore, using 16S rRNA gene sequencing, we show that the ter operon enhances Kp fitness in the gut in the presence of specific indigenous microbiota, including those predicted to produce short-chain fatty acids. Finally, administration of exogenous short-chain fatty acids in our mouse model of colonization was sufficient to reduce fitness of a ter mutant.
These findings indicate that the ter operon, strongly associated with human infection, encodes factors that resist stress induced by the indigenous gut microbiota during colonization. This work represents a substantial advancement in our molecular understanding of Kp pathogenesis and gut colonization, directly relevant to Kp disease in healthcare settings.

Subjects

Gastrointestinal Microbiome/genetics , Intestines/microbiology , Klebsiella/genetics , Plasmids/genetics , Animals , Bacteremia/genetics , Bacterial Proteins/genetics , Female , Genetic Fitness/physiology , Genetic Loci/physiology , Genome, Bacterial , Host-Pathogen Interactions/genetics , Kanamycin Resistance/genetics , Klebsiella Infections/microbiology , Male , Mice , Mice, Inbred C57BL , Operon/genetics , Organ Specificity/genetics , Virulence/genetics , beta-Lactamases/genetics

13.

High resolution copy number inference in cancer using short-molecule nanopore sequencing.

Baslan, Timour; Kovaka, Sam; Sedlazeck, Fritz J; Zhang, Yanming; Wappel, Robert; Tian, Sha; Lowe, Scott W; Goodwin, Sara; Schatz, Michael C.

Nucleic Acids Res ; 49(21): e124, 2021 12 02.

Article in English | MEDLINE | ID: mdl-34551429

ABSTRACT

Genome copy number is an important source of genetic variation in health and disease. In cancer, Copy Number Alterations (CNAs) can be inferred from short-read sequencing data, enabling genomics-based precision oncology. Emerging Nanopore sequencing technologies offer the potential for broader clinical utility, for example in smaller hospitals, due to lower instrument cost, higher portability, and ease of use. Nonetheless, Nanopore sequencing devices are limited in the number of retrievable sequencing reads/molecules compared to short-read sequencing platforms, limiting CNA inference accuracy. To address this limitation, we targeted the sequencing of short-length DNA molecules loaded at optimized concentration in an effort to increase sequence read/molecule yield from a single nanopore run. We show that short-molecule nanopore sequencing reproducibly returns high read counts and allows high-quality CNA inference. We demonstrate the clinical relevance of this approach by accurately inferring CNAs in acute myeloid leukemia samples. The data show that, compared with traditional approaches such as chromosome analysis/cytogenetics, short-molecule nanopore sequencing returns more sensitive and accurate copy number information in a cost-effective and expeditious manner, including for multiplexed samples. Our results provide a framework for short-molecule nanopore sequencing with applications in research and medicine, including, but not limited to, CNA inference.

Subjects

DNA Copy Number Variations , DNA/analysis , Medical Oncology/methods , Nanopore Sequencing/methods , Neoplasms/genetics , Cell Line, Tumor , Humans
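Read-depth copy number inference of the kind the abstract above relies on can be illustrated with a minimal sketch: count read start positions in fixed-size bins, treat the median bin as the baseline ploidy, and scale every bin accordingly. The function name, the diploid default, and the omission of GC-content correction, mappability filtering, and segmentation are simplifying assumptions for illustration; the abstract does not specify the authors' pipeline.

```python
from statistics import median


def copy_number_profile(read_starts, chrom_length, bin_size, ploidy=2):
    """Integer copy-number estimate per fixed-size bin (hypothetical sketch).

    Assumes most bins sit at the baseline ploidy, so the median bin count
    is taken to represent `ploidy` copies. No GC correction or segmentation.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    baseline = median(counts)
    if baseline == 0:
        raise ValueError("median bin count is zero; use larger bins or more reads")
    return [round(ploidy * c / baseline) for c in counts]
```

Because the per-bin estimate is only as stable as the bin count, more reads per run (the motivation for short-molecule loading in the abstract) allow smaller bins and therefore higher-resolution profiles.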

14.

Sketching and sampling approaches for fast and accurate long read classification.

Das, Arun; Schatz, Michael C.

BMC Bioinformatics ; 23(1): 452, 2022 Oct 31.

Article in English | MEDLINE | ID: mdl-36316646

ABSTRACT

BACKGROUND: In modern sequencing experiments, quickly and accurately identifying the sources of the reads is a crucial need. In metagenomics, where each read comes from one of potentially many members of a community, it can be important to identify the exact species the read is from. In other settings, it is important to distinguish which reads are from the targeted sample and which are from potential contaminants. In both cases, identification of the correct source of a read enables further investigation of relevant reads, while minimizing wasted work. This task is particularly challenging for long reads, which can have a substantial error rate that obscures the origins of each read. RESULTS: Existing tools for the read classification problem are often alignment or index-based, but such methods can have large time and/or space overheads. In this work, we investigate the effectiveness of several sampling and sketching-based approaches for read classification. In these approaches, a chosen sampling or sketching algorithm is used to generate a reduced representation (a "screen") of potential source genomes for a query readset before reads are streamed in and compared against this screen. Using a query read's similarity to the elements of the screen, the methods predict the source of the read. Such an approach requires limited pre-processing, stores and works with only a subset of the input data, and is able to perform classification with a high degree of accuracy. CONCLUSIONS: The sampling and sketching approaches investigated include uniform sampling, methods based on MinHash and its weighted and order variants, a minimizer-based technique, and a novel clustering-based sketching approach. We demonstrate the effectiveness of these techniques both in identifying the source microbial genomes for reads from a metagenomic long read sequencing experiment, and in distinguishing between long reads from organisms of interest and potential contaminant reads. 
We then compare these approaches to existing alignment, index and sketching-based tools for read classification, and demonstrate how such a method is a viable alternative for determining the source of query reads. Finally, we present a reference implementation of these approaches at https://github.com/arun96/sketching.

Subjects

High-Throughput Nucleotide Sequencing , Software , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Metagenomics/methods , Metagenome , Algorithms
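The screen-then-stream design described in the abstract above can be illustrated with a FracMinHash-style uniform sampling of k-mer hashes, one simple member of the family of sampling approaches the authors compare. In this hypothetical sketch, each reference genome keeps only the k-mers whose hash falls below a fixed cutoff, and a read is assigned to the genome whose screen shares the most sampled hashes with it. The k-mer size, sampling fraction, and function names are assumptions; this is not the authors' reference implementation.

```python
import hashlib


def kmer_hash(kmer):
    # Stable 64-bit hash (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def sampled_hashes(seq, k, fraction):
    # Keep only k-mers whose hash falls in the lowest `fraction` of hash space,
    # so the same k-mer is sampled consistently in genomes and reads.
    cutoff = int(fraction * 2**64)
    hashes = set()
    for i in range(len(seq) - k + 1):
        h = kmer_hash(seq[i:i + k])
        if h < cutoff:
            hashes.add(h)
    return hashes


def build_screen(genomes, k=15, fraction=0.1):
    """Reduced representation ("screen") of each candidate source genome."""
    return {name: sampled_hashes(seq, k, fraction) for name, seq in genomes.items()}


def classify(read, screen, k=15, fraction=0.1):
    """Return the genome sharing the most sampled hashes, or None if none match."""
    read_hashes = sampled_hashes(read, k, fraction)
    scores = {name: len(read_hashes & hs) for name, hs in screen.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Sequencing errors destroy only the k-mers that overlap them, so a long read with a moderate error rate still retains many exact sampled k-mers shared with its true source.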

15.

Cell wall protein variation, break-induced replication, and subtelomere dynamics in Candida glabrata.

Xu, Zhuwei; Green, Brian; Benoit, Nicole; Sobel, Jack D; Schatz, Michael C; Wheelan, Sarah; Cormack, Brendan P.

Mol Microbiol ; 116(1): 260-276, 2021 07.

Article in English | MEDLINE | ID: mdl-33713372

ABSTRACT

Candida glabrata is an opportunistic pathogen of humans, responsible for up to 30% of disseminated candidiasis. Adherence of C. glabrata to host cells is mediated by adhesin-like proteins (ALPs), about half of which are encoded in the subtelomeres. We performed a de novo assembly of two C. glabrata strains, BG2 and BG3993, using long single-molecule real-time (SMRT) reads, and constructed high-quality telomere-to-telomere assemblies of all 13 chromosomes to assess differences between C. glabrata strains. We documented variation between strains, and in agreement with earlier studies, found high (~0.5%-1%) frequencies of SNVs across the genome, including within subtelomeric regions. We documented changes in ALP gene structure and complement: there are large length differences in ALP genes in different strains, resulting from copy number variation in tandem repeats. We compared strains to characterize chromosome rearrangement events including within the poorly characterized subtelomeric regions. We show that rearrangements within the subtelomere regions all affect ALP-encoding genes, and 14/16 involve just the most terminal ALP gene. We present evidence that these rearrangements are mediated by break-induced replication. This study highlights the constrained nature of subtelomeric changes impacting ALP gene complement and subtelomere structure.

Subjects

Candida glabrata/genetics , Cell Adhesion Molecules/genetics , Telomere/genetics , Candidiasis/microbiology , Cell Adhesion/physiology , Gene Expression Regulation, Fungal/genetics , Genome, Fungal/genetics , Humans , Polymorphism, Single Nucleotide/genetics , Recombination, Genetic/genetics

16.

Sapling: accelerating suffix array queries with learned data models.

Kirsche, Melanie; Das, Arun; Schatz, Michael C.

Bioinformatics ; 37(6): 744-749, 2021 05 05.

Article in English | MEDLINE | ID: mdl-33107913

ABSTRACT

MOTIVATION: As genomic data becomes more abundant, efficient algorithms and data structures for sequence alignment become increasingly important. The suffix array is a widely used data structure to accelerate alignment, but the binary search algorithm used to query it requires widespread memory accesses, causing a large number of cache misses on large datasets. RESULTS: Here, we present Sapling, an algorithm for sequence alignment, which uses a learned data model to augment the suffix array and enable faster queries. We investigate different types of data models, analyzing several neural network models and providing an open-source aligner with a compact, practical piecewise linear model. We show that Sapling outperforms both an optimized binary search approach and multiple widely used read aligners on a diverse collection of genomes, including human, bacteria and plants, speeding up the algorithm by more than a factor of two while adding <1% to the suffix array's memory footprint. AVAILABILITY AND IMPLEMENTATION: The source code and tutorial are available open-source at https://github.com/mkirsche/sapling. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genomics , Software , Algorithms , Humans , Sequence Alignment , Sequence Analysis, DNA
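The Sapling abstract describes augmenting a suffix array with a learned model so that a query jumps close to its answer instead of running a full binary search. The Python sketch below illustrates that general idea under stated assumptions; it is not Sapling's implementation (the real code is at https://github.com/mkirsche/sapling). Assumptions: the text uses only A/C/G/T, the first `k` bases of each suffix are encoded as a base-4 integer, the key space is split into equal-width buckets with one line per bucket fit through its endpoints, and the search window is the model's worst-case error on the indexed suffixes. The class `LearnedSuffixIndex`, its parameters `k` and `buckets`, and the fallback to a full binary search are hypothetical.

```python
BASE = {"A": 0, "C": 1, "G": 2, "T": 3}


def build_suffix_array(text):
    """Naive suffix array: suffix start positions in lexicographic order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def kmer_key(s, k):
    """Base-4 integer for the first k bases; short strings are padded with 'A'.

    Padding with the smallest symbol keeps keys non-decreasing in suffix order.
    """
    key = 0
    for ch in s[:k].ljust(k, "A"):
        key = key * 4 + BASE[ch]
    return key


class LearnedSuffixIndex:
    """Toy learned index over a suffix array (hypothetical; not Sapling's code)."""

    def __init__(self, text, k=4, buckets=16):
        self.text, self.k = text, k
        self.sa = build_suffix_array(text)
        keys = [kmer_key(text[i:], k) for i in self.sa]  # non-decreasing
        self.width = max(1, 4 ** k // buckets)
        # One linear piece per key bucket: (first key, first rank, last key, last rank).
        self.pieces = {}
        for rank, key in enumerate(keys):
            b = key // self.width
            first_key, first_rank = self.pieces.get(b, (key, rank, key, rank))[:2]
            self.pieces[b] = (first_key, first_rank, key, rank)
        # Worst-case prediction error on the indexed suffixes sizes the search window.
        self.err = max((abs(self._predict(key) - rank) for rank, key in enumerate(keys)), default=0)

    def _predict(self, key):
        """Interpolate a suffix-array rank for key, or None if its bucket is empty."""
        piece = self.pieces.get(key // self.width)
        if piece is None:
            return None
        k0, r0, k1, r1 = piece
        if k1 == k0:
            return r0
        frac = min(max((key - k0) / (k1 - k0), 0.0), 1.0)
        return round(r0 + frac * (r1 - r0))

    def _suffix(self, rank):
        return self.text[self.sa[rank]:]

    def _bisect(self, pattern, lo, hi):
        """Standard lower-bound binary search over ranks [lo, hi)."""
        while lo < hi:
            mid = (lo + hi) // 2
            if self._suffix(mid) < pattern:
                lo = mid + 1
            else:
                hi = mid
        return lo

    def lower_bound(self, pattern):
        """First rank whose suffix is >= pattern."""
        n = len(self.sa)
        pred = self._predict(kmer_key(pattern, self.k))
        if pred is not None:
            lo, hi = max(0, pred - self.err), min(n, pred + self.err + 1)
            r = self._bisect(pattern, lo, hi)
            # Accept the windowed answer only if the window provably brackets it.
            left_ok = lo == 0 or self._suffix(lo - 1) < pattern
            right_ok = r < hi or hi == n or self._suffix(hi) >= pattern
            if left_ok and right_ok:
                return r
        return self._bisect(pattern, 0, n)  # fall back to a full binary search

    def occurrences(self, pattern):
        """Sorted start positions of pattern in the text."""
        hits, r = [], self.lower_bound(pattern)
        while r < len(self.sa) and self._suffix(r).startswith(pattern):
            hits.append(self.sa[r])
            r += 1
        return sorted(hits)
```

For example, `LearnedSuffixIndex("ACGTACGT").occurrences("ACG")` returns `[0, 4]`. The check after the windowed search keeps results exact even when the model mispredicts; the model only changes how much of the array a query touches, which is the memory-access cost the abstract targets.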

17.

Ribbon: intuitive visualization for complex genomic variation.

Nattestad, Maria; Aboukhalil, Robert; Chin, Chen-Shan; Schatz, Michael C.

Bioinformatics ; 37(3): 413-415, 2021 04 20.

Article in English | MEDLINE | ID: mdl-32766814

ABSTRACT

SUMMARY: Ribbon is an alignment visualization tool that shows how alignments are positioned within both the reference and read contexts, giving an intuitive view that enables a better understanding of structural variants and the read evidence supporting them. Ribbon was born out of a need to curate complex structural variant calls and determine whether each was well supported by long-read evidence, and it uses the same intuitive visualization method to shed light on contig alignments from genome-to-genome comparisons. AVAILABILITY AND IMPLEMENTATION: Ribbon is freely available online at http://genomeribbon.com/ and is open-source at https://github.com/marianattestad/ribbon. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genomics , Software , Genome

18.

Complex rearrangements and oncogene amplifications revealed by long-read DNA and RNA sequencing of a breast cancer cell line.

Nattestad, Maria; Goodwin, Sara; Ng, Karen; Baslan, Timour; Sedlazeck, Fritz J; Rescheneder, Philipp; Garvin, Tyler; Fang, Han; Gurtowski, James; Hutton, Elizabeth; Tseng, Elizabeth; Chin, Chen-Shan; Beck, Timothy; Sundaravadanam, Yogi; Kramer, Melissa; Antoniou, Eric; McPherson, John D; Hicks, James; McCombie, W Richard; Schatz, Michael C.

Genome Res ; 28(8): 1126-1135, 2018 08.

Article in English | MEDLINE | ID: mdl-29954844

ABSTRACT

The SK-BR-3 cell line is one of the most important models for HER2+ breast cancers, which affect one in five breast cancer patients. SK-BR-3 is known to be highly rearranged, although much of the variation is in complex and repetitive regions that may be underreported. Addressing this, we sequenced SK-BR-3 using long-read single molecule sequencing from Pacific Biosciences and developed one of the most detailed maps of structural variations (SVs) available for a cancer genome, with nearly 20,000 variants, most of which were missed by short-read sequencing. Surrounding the important ERBB2 oncogene (also known as HER2), we discovered a complex sequence of nested duplications and translocations, suggesting a punctuated progression. Full-length transcriptome sequencing further revealed several novel gene fusions within the nested genomic variants. Combining long-read genome and transcriptome sequencing enables an in-depth analysis of how SVs disrupt the genome and sheds new light on the complex mechanisms involved in cancer genome evolution.

Subjects

Breast Neoplasms/genetics , Gene Amplification/genetics , Gene Rearrangement/genetics , Oncogenes/genetics , Breast Neoplasms/pathology , Female , Genome, Human , Genomic Structural Variation , High-Throughput Nucleotide Sequencing , Humans , MCF-7 Cells , Receptor, ErbB-2/genetics , Repetitive Sequences, Nucleic Acid/genetics , Transcriptome/genetics

19.

Approaching complete genomes, transcriptomes and epi-omes with accurate long-read sequencing.

Kovaka, Sam; Ou, Shujun; Jenike, Katharine M; Schatz, Michael C.

Nat Methods ; 20(1): 12-16, 2023 01.

Article in English | MEDLINE | ID: mdl-36635537

Subjects

High-Throughput Nucleotide Sequencing , Transcriptome , Sequence Analysis, DNA

20.

Accurate detection of complex structural variations using single-molecule sequencing.

Sedlazeck, Fritz J; Rescheneder, Philipp; Smolka, Moritz; Fang, Han; Nattestad, Maria; von Haeseler, Arndt; Schatz, Michael C.

Nat Methods ; 15(6): 461-468, 2018 06.

Article in English | MEDLINE | ID: mdl-29713083

ABSTRACT

Structural variations are the greatest source of genetic variation, but they remain poorly understood because of technological limitations. Single-molecule long-read sequencing has the potential to dramatically advance the field, although high error rates are a challenge with existing methods. Addressing this need, we introduce open-source methods for long-read alignment (NGMLR; https://github.com/philres/ngmlr) and structural variant identification (Sniffles; https://github.com/fritzsedlazeck/Sniffles) that provide unprecedented sensitivity and precision for variant detection, even in repeat-rich regions and for complex nested events that can have substantial effects on human health. In several long-read datasets, including healthy and cancerous human genomes, we discovered thousands of novel variants and categorized systematic errors in short-read approaches. NGMLR and Sniffles can automatically filter false events and operate on low-coverage data, thereby reducing the high costs that have hindered the application of long reads in clinical and research settings.

Subjects

DNA Mutational Analysis/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Genome, Human , Genomics/methods , Humans
