Results 1 - 20 of 26
1.
Nature ; 608(7922): 353-359, 2022 08.
Article in English | MEDLINE | ID: mdl-35922509

ABSTRACT

Regulation of transcript structure generates transcript diversity and plays an important role in human disease [1-7]. The advent of long-read sequencing technologies offers the opportunity to study the role of genetic variation in transcript structure [8-16]. In this Article, we present a large human long-read RNA-seq dataset using the Oxford Nanopore Technologies platform from 88 samples from Genotype-Tissue Expression (GTEx) tissues and cell lines, complementing the GTEx resource. We identified just over 70,000 novel transcripts for annotated genes, and validated the protein expression of 10% of novel transcripts. We developed a new computational package, LORALS, to analyse the genetic effects of rare and common variants on the transcriptome by allele-specific analysis of long reads. We characterized allele-specific expression and transcript structure events, providing new insights into the specific transcript alterations caused by common and rare genetic variants and highlighting the resolution gained from long-read data. We were able to perturb the transcript structure upon knockdown of PTBP1, an RNA binding protein that mediates splicing, thereby finding genetic regulatory effects that are modified by the cellular environment. Finally, we used this dataset to enhance variant interpretation and study rare variants leading to aberrant splicing patterns.


Subjects
Alleles , Gene Expression Profiling , Organ Specificity , RNA-Seq , Transcriptome , Alternative Splicing/genetics , Cell Line , Datasets as Topic , Genotype , Heterogeneous-Nuclear Ribonucleoproteins/deficiency , Heterogeneous-Nuclear Ribonucleoproteins/genetics , Humans , Organ Specificity/genetics , Polypyrimidine Tract-Binding Protein/deficiency , Polypyrimidine Tract-Binding Protein/genetics , Reproducibility of Results , Transcriptome/genetics
2.
Nature ; 571(7765): 355-360, 2019 07.
Article in English | MEDLINE | ID: mdl-31270458

ABSTRACT

Defining the transcriptomic identity of malignant cells is challenging in the absence of surface markers that distinguish cancer clones from one another, or from admixed non-neoplastic cells. To address this challenge, here we developed Genotyping of Transcriptomes (GoT), a method to integrate genotyping with high-throughput droplet-based single-cell RNA sequencing. We apply GoT to profile 38,290 CD34+ cells from patients with CALR-mutated myeloproliferative neoplasms to study how somatic mutations corrupt the complex process of human haematopoiesis. High-resolution mapping of malignant versus normal haematopoietic progenitors revealed an increasing fitness advantage with myeloid differentiation of cells with mutated CALR. We identified the unfolded protein response as a predominant outcome of CALR mutations, with a considerable dependency on cell identity, as well as upregulation of the NF-κB pathway specifically in uncommitted stem cells. We further extended the GoT toolkit to genotype multiple targets and loci that are distant from transcript ends. Together, these findings reveal that the transcriptional output of somatic mutations in myeloproliferative neoplasms is dependent on the native cell identity.


Subjects
Genotype , Mutation , Myeloproliferative Disorders/genetics , Myeloproliferative Disorders/pathology , Neoplasms/genetics , Neoplasms/pathology , Transcriptome/genetics , Animals , CD34 Antigens/metabolism , Calreticulin/genetics , Cell Line , Cell Proliferation , Clone Cells/classification , Clone Cells/metabolism , Clone Cells/pathology , Endoribonucleases/metabolism , Hematopoiesis/genetics , Hematopoietic Stem Cells/classification , Hematopoietic Stem Cells/metabolism , Hematopoietic Stem Cells/pathology , High-Throughput Nucleotide Sequencing/methods , Humans , Mice , Molecular Models , Myeloproliferative Disorders/classification , NF-kappa B/metabolism , Neoplasms/classification , Neoplastic Stem Cells/cytology , Neoplastic Stem Cells/metabolism , Neoplastic Stem Cells/pathology , Primary Myelofibrosis/genetics , Primary Myelofibrosis/pathology , Protein Serine-Threonine Kinases/metabolism , RNA Sequence Analysis/methods , Single-Cell Analysis/methods , Unfolded Protein Response/genetics
3.
Proc Natl Acad Sci U S A ; 118(37)2021 09 14.
Article in English | MEDLINE | ID: mdl-34497122

ABSTRACT

Some of the most spectacular adaptive radiations begin with founder populations on remote islands. How genetically limited founder populations give rise to the striking phenotypic and ecological diversity characteristic of adaptive radiations is a paradox of evolutionary biology. We conducted an evolutionary genomics analysis of genus Metrosideros, a landscape-dominant, incipient adaptive radiation of woody plants that spans a striking range of phenotypes and environments across the Hawaiian Islands. Using nanopore-sequencing, we created a chromosome-level genome assembly for Metrosideros polymorpha var. incana and analyzed whole-genome sequences of 131 individuals from 11 taxa sampled across the islands. Demographic modeling and population genomics analyses suggested that Hawaiian Metrosideros originated from a single colonization event and subsequently spread across the archipelago following the formation of new islands. The evolutionary history of Hawaiian Metrosideros shows evidence of extensive reticulation associated with significant sharing of ancestral variation between taxa and secondarily with admixture. Taking advantage of the highly contiguous genome assembly, we investigated the genomic architecture underlying the adaptive radiation and discovered that divergent selection drove the formation of differentiation outliers in paired taxa representing early stages of speciation/divergence. Analysis of the evolutionary origins of the outlier single nucleotide polymorphisms (SNPs) showed enrichment for ancestral variations under divergent selection. Our findings suggest that Hawaiian Metrosideros possesses an unexpectedly rich pool of ancestral genetic variation, and the reassortment of these variations has fueled the island adaptive radiation.


Subjects
Physiological Adaptation , Molecular Evolution , Genetic Speciation , Myrtaceae/physiology , Genetic Polymorphism , Radiation Tolerance , Ionizing Radiation , Population Genetics , Myrtaceae/radiation effects , Phenotype
4.
Genome Res ; 30(3): 437-446, 2020 03.
Article in English | MEDLINE | ID: mdl-32075851

ABSTRACT

Viruses are the most abundant biological entities on Earth and play key roles in host ecology, evolution, and horizontal gene transfer. Despite recent progress in viral metagenomics, the inherent genetic complexity of virus populations still poses technical difficulties for recovering complete virus genomes from natural assemblages. To address these challenges, we developed an assembly-free, single-molecule nanopore sequencing approach, enabling direct recovery of complete virus genome sequences from environmental samples. Our method yielded thousands of full-length, high-quality draft virus genome sequences that were not recovered using standard short-read assembly approaches. Additionally, our analyses discriminated between populations whose genomes had identical direct terminal repeats versus those with circularly permuted repeats at their termini, thus providing new insight into native virus reproduction and genome packaging. Novel DNA sequences were discovered, whose repeat structures, gene contents, and concatemer lengths suggest they are phage-inducible chromosomal islands, which are packaged as concatemers in phage particles, with lengths that match the size ranges of co-occurring phage genomes. Our new virus sequencing strategy can provide previously unavailable information about the genome structures, population biology, and ecology of naturally occurring viruses and viral parasites.


Subjects
Viral Genome , Nanopore Sequencing/methods , Bacteriophages/genetics , DNA Packaging , Metagenomics , Seawater/virology
5.
Genome Res ; 22(6): 1107-19, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22434425

ABSTRACT

Segmented filamentous bacteria (SFB) are host-specific intestinal symbionts that comprise a distinct clade within the Clostridiaceae, designated Candidatus Arthromitus. SFB display a unique life cycle within the host, involving differentiation into multiple cell types. The latter include filaments that attach intimately to intestinal epithelial cells, and from which "holdfasts" and spores develop. SFB induce a multifaceted immune response, leading to host protection from intestinal pathogens. Cultivation resistance has hindered characterization of these enigmatic bacteria. In the present study, we isolated five SFB filaments from a mouse using a microfluidic device equipped with laser tweezers, generated genome sequences from each, and compared these sequences with each other, as well as to recently published SFB genome sequences. Based on the resulting analyses, SFB appear to be dependent on the host for a variety of essential nutrients. SFB have a relatively high abundance of predicted proteins devoted to cell cycle control and to envelope biogenesis, and have a group of SFB-specific autolysins and a dynamin-like protein. Among the five filament genomes, an average of 8.6% of predicted proteins were novel, including a family of secreted SFB-specific proteins. Four ADP-ribosyltransferase (ADPRT) sequence types, and a myosin-cross-reactive antigen (MCRA) protein were discovered; we hypothesize that they are involved in modulation of host responses. The presence of polymorphisms among mouse SFB genomes suggests the evolution of distinct SFB lineages. Overall, our results reveal several aspects of SFB adaptation to the mammalian intestinal tract.


Subjects
Bacterial Proteins/genetics , Bacterial Genome , Gram-Positive Endospore-Forming Bacteria/physiology , Intestines/microbiology , Single-Cell Analysis/methods , ADP Ribose Transferases/genetics , ADP Ribose Transferases/metabolism , Physiological Adaptation , Amino Acid Sequence , Animals , Bacterial Proteins/metabolism , Cell Differentiation/genetics , Ribosomal DNA , Epithelial Cells/microbiology , Gram-Positive Endospore-Forming Bacteria/genetics , Mice , Microfluidic Analytical Techniques , Molecular Sequence Data , Phylogeny , Genetic Polymorphism , DNA Sequence Analysis
6.
Cell Genom ; : 100590, 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38908378

ABSTRACT

The duplication-triplication/inverted-duplication (DUP-TRP/INV-DUP) structure is a complex genomic rearrangement (CGR). Although it has been identified as an important pathogenic DNA mutation signature in genomic disorders and cancer genomes, its architecture remains unresolved. Here, we studied the genomic architecture of DUP-TRP/INV-DUP by investigating the DNA of 24 patients identified by array comparative genomic hybridization (aCGH) on whom we found evidence for the existence of 4 out of 4 predicted structural variant (SV) haplotypes. Using a combination of short-read genome sequencing (GS), long-read GS, optical genome mapping, and single-cell DNA template strand sequencing (strand-seq), the haplotype structure was resolved in 18 samples. The point of template switching in 4 samples was shown to be a segment of ∼2.2-5.5 kb of 100% nucleotide similarity within inverted repeat pairs. These data provide experimental evidence that inverted low-copy repeats act as recombinant substrates. This type of CGR can result in multiple conformers generating diverse SV haplotypes in susceptible dosage-sensitive loci.

7.
bioRxiv ; 2023 Oct 03.
Article in English | MEDLINE | ID: mdl-37873367

ABSTRACT

Background: The duplication-triplication/inverted-duplication (DUP-TRP/INV-DUP) structure is a type of complex genomic rearrangement (CGR) hypothesized to result from replicative repair of DNA due to replication fork collapse. It is often mediated by a pair of inverted low-copy repeats (LCR) followed by iterative template switches resulting in at least two breakpoint junctions in cis. Although it has been identified as an important mutation signature of pathogenicity for genomic disorders and cancer genomes, its architecture remains unresolved and is predicted to display at least four structural variation (SV) haplotypes. Results: Here we studied the genomic architecture of DUP-TRP/INV-DUP by investigating the genomic DNA of 24 patients with neurodevelopmental disorders identified by array comparative genomic hybridization (aCGH) on whom we found evidence for the existence of 4 out of 4 predicted SV haplotypes. Using a combination of short-read genome sequencing (GS), long-read GS, optical genome mapping and StrandSeq, the haplotype structure was resolved in 18 samples. This approach refined the point of template switching between inverted LCRs in 4 samples, revealing a DNA segment of ∼2.2-5.5 kb of 100% nucleotide similarity. A prediction model was developed to infer the LCR used to mediate the non-allelic homology repair. Conclusions: These data provide experimental evidence supporting the hypothesis that inverted LCRs act as a recombinant substrate in replication-based repair mechanisms. Such inverted repeats are particularly relevant for formation of copy-number associated inversions, including the DUP-TRP/INV-DUP structures. Moreover, this type of CGR can result in multiple conformers, which contribute to generating diverse SV haplotypes in susceptible loci.

8.
Cell Stem Cell ; 30(9): 1262-1281.e8, 2023 09 07.
Article in English | MEDLINE | ID: mdl-37582363

ABSTRACT

RNA splicing factors are recurrently mutated in clonal blood disorders, but the impact of dysregulated splicing in hematopoiesis remains unclear. To overcome technical limitations, we integrated genotyping of transcriptomes (GoT) with long-read single-cell transcriptomics and proteogenomics for single-cell profiling of transcriptomes, surface proteins, somatic mutations, and RNA splicing (GoT-Splice). We applied GoT-Splice to hematopoietic progenitors from myelodysplastic syndrome (MDS) patients with mutations in the core splicing factor SF3B1. SF3B1mut cells were enriched in the megakaryocytic-erythroid lineage, with expansion of SF3B1mut erythroid progenitor cells. We uncovered distinct cryptic 3' splice site usage in different progenitor populations and stage-specific aberrant splicing during erythroid differentiation. Profiling SF3B1-mutated clonal hematopoiesis samples revealed that erythroid bias and cell-type-specific cryptic 3' splice site usage in SF3B1mut cells precede overt MDS. Collectively, GoT-Splice defines the cell-type-specific impact of somatic mutations on RNA splicing, from early clonal outgrowths to overt neoplasia, directly in human samples.


Subjects
Myelodysplastic Syndromes , RNA Splice Sites , Humans , Multiomics , RNA Splicing/genetics , Myelodysplastic Syndromes/genetics , Myelodysplastic Syndromes/metabolism , RNA Splicing Factors/genetics , RNA Splicing Factors/metabolism , Mutation/genetics , Phosphoproteins/genetics , Phosphoproteins/metabolism
9.
Nucleic Acids Res ; 38(22): 7916-26, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20702423

ABSTRACT

Cis-acting short sequence motifs play important roles in alternative splicing. It is now possible to identify such sequence motifs as conserved sequence patterns in genome sequence alignments. Here, we report the systematic search for motifs in the neighboring introns of alternatively spliced exons by using comparative analysis of mammalian genome alignments. We identified 11 conserved sequence motifs that might be involved in the regulation of alternative splicing. These motifs are not only significantly overrepresented near alternatively spliced exons, but they also co-occur with each other, thus, forming a network of cis-elements, likely to be the basis for context-dependent regulation. Based on this finding, we applied the motif co-occurrence to predict alternatively skipped exons. We verified exon skipping in 29 cases out of 118 predictions (25%) by EST and mRNA sequences in the databases. For the predictions not verified by the database sequences, we confirmed exon skipping in 10 additional cases by using both RT-PCR experiments and the publicly available RNA-Seq data. These results indicate that even more alternative splicing events will be found with the progress of large-scale and high-throughput analyses for various tissue samples and developmental stages.


Subjects
Alternative Splicing , Introns , Ribonucleic Acid Regulatory Sequences , Animals , Base Sequence , Conserved Sequence , Exons , Genomics , Humans , Molecular Sequence Data , Sequence Alignment
10.
Nat Biotechnol ; 40(10): 1488-1499, 2022 10.
Article in English | MEDLINE | ID: mdl-35637420

ABSTRACT

High-order three-dimensional (3D) interactions between more than two genomic loci are common in human chromatin, but their role in gene regulation is unclear. Previous high-order 3D chromatin assays either measure distant interactions across the genome or proximal interactions at selected targets. To address this gap, we developed Pore-C, which combines chromatin conformation capture with nanopore sequencing of concatemers to profile proximal high-order chromatin contacts at the genome scale. We also developed the statistical method Chromunity to identify sets of genomic loci with frequencies of high-order contacts significantly higher than background ('synergies'). Applying these methods to human cell lines, we found that synergies were enriched in enhancers and promoters in active chromatin and in highly transcribed and lineage-defining genes. In prostate cancer cells, these included binding sites of androgen-driven transcription factors and the promoters of androgen-regulated genes. Concatemers of high-order contacts in highly expressed genes were demethylated relative to pairwise contacts at the same loci. Synergies in breast cancer cells were associated with tyfonas, a class of complex DNA amplicons. These results rigorously link genome-wide high-order 3D interactions to lineage-defining transcriptional programs and establish Pore-C and Chromunity as scalable approaches to assess high-order genome structure.


Subjects
Nanopore Sequencing , Nanopores , Androgens , Chromatin/genetics , Humans , Transcription Factors/genetics
11.
Genome Med ; 14(1): 122, 2022 10 27.
Article in English | MEDLINE | ID: mdl-36303224

ABSTRACT

BACKGROUND: The multiple de novo copy number variant (MdnCNV) phenotype is described by having four or more constitutional de novo CNVs (dnCNVs) arising independently throughout the human genome within one generation. It is a rare peri-zygotic mutational event, previously reported to be seen once in every 12,000 individuals referred for genome-wide chromosomal microarray analysis due to congenital abnormalities. These rare families provide a unique opportunity to understand the genetic factors of peri-zygotic genome instability and the impact of dnCNV on human diseases. METHODS: Chromosomal microarray analysis (CMA), array-based comparative genomic hybridization, short- and long-read genome sequencing (GS) were performed on the newly identified MdnCNV family to identify de novo mutations including dnCNVs, de novo single-nucleotide variants (dnSNVs), and indels. Short-read GS was performed on four previously published MdnCNV families for dnSNV analysis. Trio-based rare variant analysis was performed on the newly identified individual and four previously published MdnCNV families to identify potential genetic etiologies contributing to the peri-zygotic genomic instability. Lin semantic similarity scores informed quantitative human phenotype ontology analysis on three MdnCNV families to identify gene(s) driving or contributing to the clinical phenotype. RESULTS: In the newly identified MdnCNV case, we revealed eight de novo tandem duplications, each ~1 Mb, with microhomology at 6/8 breakpoint junctions. Enrichment of de novo single-nucleotide variants (SNV; 6/79) and de novo indels (1/12) was found within 4 Mb of the dnCNV genomic regions. An elevated post-zygotic SNV mutation rate was observed in MdnCNV families. Maternal rare variant analyses identified three genes in distinct families that may contribute to the MdnCNV phenomenon. Phenotype analysis suggests that gene(s) within dnCNV regions contribute to the observed proband phenotype in 3/3 cases. CNVs in two cases, a contiguous gene duplication encompassing PMP22 and RAI1 and another duplication affecting NSD1 and SMARCC2, contribute to the clinically observed phenotypic manifestations. CONCLUSIONS: Characteristic features of dnCNVs reported here are consistent with a microhomology-mediated break-induced replication (MMBIR)-driven mechanism during the peri-zygotic period. Maternal genetic variants in DNA repair genes potentially contribute to peri-zygotic genomic instability. Variable phenotypic features were observed across a cohort of three MdnCNV probands, and computational quantitative phenotyping revealed that two out of three had evidence for the contribution of more than one genetic locus to the proband's phenotype supporting the hypothesis of de novo multilocus pathogenic variation (MPV) in those families.


Subjects
DNA Copy Number Variations , Genomic Instability , Humans , Comparative Genomic Hybridization , Mutation , DNA , Nucleotides , DNA-Binding Proteins/genetics , Transcription Factors/genetics
12.
Bioinformatics ; 26(23): 2977-8, 2010 Dec 01.
Article in English | MEDLINE | ID: mdl-20959381

ABSTRACT

SUMMARY: SmashCommunity is a stand-alone metagenomic annotation and analysis pipeline suitable for data from Sanger and 454 sequencing technologies. It supports state-of-the-art software for essential metagenomic tasks such as assembly and gene prediction. It provides tools to estimate the quantitative phylogenetic and functional compositions of metagenomes, to compare compositions of multiple metagenomes and to produce intuitive visual representations of such analyses. AVAILABILITY: SmashCommunity source code and documentation are available at http://www.bork.embl.de/software/smash CONTACT: bork@embl.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Metagenomics/methods , Software , Genes , Molecular Sequence Annotation , Phylogeny , DNA Sequence Analysis
13.
Bioinformatics ; 26(23): 2979-80, 2010 Dec 01.
Article in English | MEDLINE | ID: mdl-20966005

ABSTRACT

SUMMARY: Recent advances in single-cell manipulation technology, whole genome amplification and high-throughput sequencing have now made it possible to sequence the genome of an individual cell. The bioinformatic analysis of these genomes, however, is far more complicated than the analysis of those generated using traditional, culture-based methods. In order to simplify this analysis, we have developed SmashCell (Simple Metagenomics Analysis SHell for sequences from single Cells). It is designed to automate the main steps in microbial genome analysis (assembly, gene prediction, functional annotation) in a way that allows parameter and algorithm exploration at each step in the process. It also manages the data created by these analyses and provides visualization methods for rapid analysis of the results. AVAILABILITY: The SmashCell source code and a comprehensive manual are available at http://asiago.stanford.edu/SmashCell CONTACT: eoghanh@stanford.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genomics/methods , Software , Algorithms , Chromosome Mapping/methods , Genome , Nucleic Acid Amplification Techniques , DNA Sequence Analysis/methods , Single-Cell Analysis
14.
Curr Opin Struct Biol ; 17(3): 362-9, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17574832

ABSTRACT

Given that the number of protein functions on earth is finite, the rapid expansion of biological knowledge and the concomitant exponential increase in the number of protein sequences should, at some point, enable the estimation of the limits of protein function space. The functional coverage of protein sequences can be investigated using computational methods, especially given the massive amount of data being generated by large-scale environmental sequencing (metagenomics). In completely sequenced genomes, the fraction of proteins to which at least some functional features can be assigned has recently risen to as much as approximately 85%. Although this fraction is more uncertain in metagenomics surveys, because of environmental complexities and differences in analysis protocols, our global knowledge of protein functions still appears to be considerable. However, when we consider protein families, continued sequencing seems to yield an ever-increasing number of novel families. Until we reconcile these two views, the limits of protein space will remain obscured.


Subjects
Biochemistry/trends , Proteins/physiology , Animals , Escherichia coli/physiology , Humans , Proteins/genetics , Protein Sequence Analysis
15.
Genomics ; 93(3): 213-20, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19059335

ABSTRACT

The Alternative Splicing and Transcript Diversity database (ASTD) gives access to a vast collection of alternative transcripts that integrate transcription initiation, polyadenylation and splicing variant data. Alternative transcripts are derived from the mapping of transcribed sequences to the complete human, mouse and rat genomes using an extension of the computational pipeline developed for the ASD (Alternative Splicing Database) and ATD (Alternative Transcript Diversity) databases, which are now superseded by ASTD. For the human genome, ASTD identifies splicing variants, transcription initiation variants and polyadenylation variants in 68%, 68% and 62% of the gene set, respectively, consistent with current estimates for transcription variation. Users can access ASTD through a variety of browsing and query tools, including expression state-based queries for the identification of tissue-specific isoforms. Participating laboratories have experimentally validated a subset of ASTD-predicted alternative splice forms and alternative polyadenylation forms that were not previously reported. The ASTD database can be accessed at http://www.ebi.ac.uk/astd.


Subjects
Alternative Splicing/genetics , Genetic Databases , Animals , Database Management Systems , Humans , Information Storage and Retrieval/methods , Mice , Rats , Reproducibility of Results , Software , User-Computer Interface
16.
Genes (Basel) ; 11(1)2020 01 09.
Article in English | MEDLINE | ID: mdl-31936690

ABSTRACT

The MinION sequencer has made in situ sequencing feasible in remote locations. Following our initial demonstration of its high performance off planet with Earth-prepared samples, we developed and tested an end-to-end, sample-to-sequencer process that could be conducted entirely aboard the International Space Station (ISS). Initial experiments demonstrated the process with a microbial mock community standard. The DNA was successfully amplified, primers were degraded, and libraries prepared and sequenced. The median percent identities for both datasets were 84%, as assessed from alignment of the mock community. The ability to correctly identify the organisms in the mock community standard was comparable for the sequencing data obtained in flight and on the ground. To validate the process on microbes collected from and cultured aboard the ISS, bacterial cells were selected from a NASA Environmental Health Systems Surface Sample Kit contact slide. The locations of bacterial colonies chosen for identification were labeled, and a small number of cells were directly added as input into the sequencing workflow. Prepared DNA was sequenced, and the data were downlinked to Earth. Return of the contact slide to the ground allowed for standard laboratory processing for bacterial identification. The identifications obtained aboard the ISS, Staphylococcus hominis and Staphylococcus capitis, matched those determined on the ground down to the species level. This marks the first ever identification of microbes entirely off Earth, and this validated process could be used for in-flight microbial identification, diagnosis of infectious disease in a crewmember, and as a research platform for investigators around the world.
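As an illustration of the "median percent identity" figure quoted in the abstract above, the sketch below assumes that percent identity is the number of matching bases divided by the alignment length. The abstract does not state which definition was used, and the function names and example numbers are hypothetical.

```python
from statistics import median


def percent_identity(matches: int, alignment_length: int) -> float:
    """Percent of aligned positions that match (assumed definition)."""
    if alignment_length <= 0:
        raise ValueError("alignment_length must be positive")
    return 100.0 * matches / alignment_length


def median_percent_identity(alignments):
    """Median percent identity over (matches, alignment_length) pairs."""
    return median(percent_identity(m, n) for m, n in alignments)


# Hypothetical per-read alignment summaries, not data from the study.
example_reads = [(84, 100), (90, 100), (80, 100)]
print(median_percent_identity(example_reads))
```

The median is used rather than the mean because a small number of very poor alignments would otherwise dominate the summary.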


Subjects
Nanopore Sequencing/methods , 16S Ribosomal RNA/genetics , Specimen Handling/methods , Bacteria/genetics , Bacterial DNA/genetics , Ribosomal DNA/genetics , Exobiology/methods , Extraterrestrial Environment , Bacterial Genome/genetics , Microbiota/genetics , Nanopores , DNA Sequence Analysis/methods , Spacecraft/instrumentation
17.
Genome Biol ; 21(1): 21, 2020 02 05.
Article in English | MEDLINE | ID: mdl-32019604

ABSTRACT

BACKGROUND: The circum-basmati group of cultivated Asian rice (Oryza sativa) contains many iconic varieties and is widespread in the Indian subcontinent. Despite its economic and cultural importance, a high-quality reference genome is currently lacking, and the group's evolutionary history is not fully resolved. To address these gaps, we use long-read nanopore sequencing and assemble the genomes of two circum-basmati rice varieties. RESULTS: We generate two high-quality, chromosome-level reference genomes that represent the 12 chromosomes of Oryza. The assemblies show a contig N50 of 6.32 Mb and 10.53 Mb for Basmati 334 and Dom Sufid, respectively. Using our highly contiguous assemblies, we characterize structural variations segregating across circum-basmati genomes. We discover repeat expansions not observed in japonica (the rice group most closely related to circum-basmati), as well as the presence and absence variants of over 20 Mb, one of which is a circum-basmati-specific deletion of a gene regulating awn length. We further detect strong evidence of admixture between the circum-basmati and circum-aus groups. This gene flow has its greatest effect on chromosome 10, causing both structural variation and single-nucleotide polymorphism to deviate from genome-wide history. Lastly, population genomic analysis of 78 circum-basmati varieties shows three major geographically structured genetic groups: Bhutan/Nepal, India/Bangladesh/Myanmar, and Iran/Pakistan. CONCLUSION: The availability of high-quality reference genomes allows functional and evolutionary genomic analyses providing genome-wide evidence for gene flow between circum-aus and circum-basmati, describes the nature of circum-basmati structural variation, and reveals the presence/absence variation in this important and iconic rice variety group.
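The contig N50 values reported in the abstract above follow a standard assembly statistic: the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the total assembly size. A minimal sketch of that calculation is given below; the function name and the toy contig lengths are illustrative assumptions and are not taken from the paper or its software.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("contig_lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy example (hypothetical lengths, arbitrary units).
print(n50([2, 3, 4, 5, 6]))
```

A higher N50 indicates that more of the assembly sits in long contigs, which is why it is commonly quoted as a contiguity measure.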


Subjects
Nanopore Sequencing/methods , Oryza/genetics , Whole Genome Sequencing/methods , Plant Chromosomes/genetics , Contig Mapping/methods , Molecular Evolution , Plant Genome , Oryza/classification , Phylogeny
18.
Bioinformatics ; 24(17): 1959-60, 2008 Sep 01.
Article in English | MEDLINE | ID: mdl-18635569

ABSTRACT

UNLABELLED: Sircah is a flexible tool for the detection, analysis and visualization of alternative transcripts. It takes as input gene models or spliced alignments and creates a database of alternative transcription events: alternative transcription initiation and polyadenylation, alternative 3' and 5' splice-site usage, skipped exons and retained introns. The results can be visualized in a variety of ways, allowing the creation of publication-quality images. AVAILABILITY: Sircah is available for download under a Creative Commons license along with additional documentation and a tutorial from http://www.bork.embl.de/Sircah.


Subjects
Algorithms , Computer Graphics , RNA Splice Sites/genetics , Sequence Analysis, DNA/methods , Software , Transcription Factors/genetics , User-Computer Interface , Base Sequence , Molecular Sequence Data
19.
Genome Med ; 11(1): 25, 2019 04 23.
Article in English | MEDLINE | ID: mdl-31014393

ABSTRACT

BACKGROUND: Intrachromosomal triplications (TRP) can contribute to disease etiology via gene dosage effects, gene disruption, position effects, or fusion gene formation. Recently, post-zygotic de novo triplications adjacent to copy-number neutral genomic intervals with runs of homozygosity (ROH) have been shown to result in uniparental isodisomy (UPD). The genomic structure of these complex genomic rearrangements (CGRs) shows a consistent pattern of an inverted triplication flanked by duplications (DUP-TRP/INV-DUP) formed by an iterative DNA replisome template-switching mechanism during replicative repair of a single-ended, double-stranded DNA (seDNA); the ROH results from an interhomolog or nonsister chromatid template switch. It has been postulated that these CGRs may lead to genetic abnormalities in carriers due to dosage-sensitive genes mapping within the copy-number variant regions, homozygosity for alleles at a locus causing an autosomal recessive (AR) disease trait within the ROH region, or imprinting-associated diseases. METHODS: Here, we report a family wherein the affected subject carries a de novo 2.2-Mb TRP followed by 42.2 Mb of ROH and manifests clinical features overlapping with those observed in association with chromosome 14 maternal UPD (UPD(14)mat). UPD(14)mat can cause clinical phenotypic features enabling a diagnosis of Temple syndrome. This CGR was then molecularly characterized by high-density custom aCGH, genome-wide single-nucleotide polymorphism (SNP) and methylation arrays, exome sequencing (ES), and the Oxford Nanopore long-read sequencing technology. RESULTS: We confirmed the postulated DUP-TRP/INV-DUP structure by multiple orthogonal genomic technologies in the proband. The methylation status of known differentially methylated regions (DMRs) on chromosome 14 revealed that the subject shows the typical methylation pattern of UPD(14)mat. Consistent with these molecular findings, the clinical features overlap with those observed in Temple syndrome, including speech delay. CONCLUSIONS: These data provide experimental evidence that, in humans, triplication can lead to segmental UPD and imprinting disease. Importantly, genotype/phenotype analyses further reveal how a post-zygotically generated complex structural variant, resulting from a replication-based mutational mechanism, contributes to expanding the clinical phenotype of known genetic syndromes. Mechanistically, such events can distort transmission genetics, resulting in homozygosity at a locus for which only one parent is a carrier, and can also cause imprinting diseases.


Subjects
Chromosome Aberrations , Chromosome Disorders/genetics , Chromosomes, Human, Pair 14/genetics , Genomic Imprinting , Chromosome Disorders/pathology , DNA Methylation , DNA Replication , Humans , Male , Pedigree , Phenotype , Polymorphism, Single Nucleotide , Young Adult
20.
BMC Genomics ; 9: 335, 2008 Jul 15.
Article in English | MEDLINE | ID: mdl-18627618

ABSTRACT

BACKGROUND: Across the fully sequenced microbial genomes there are thousands of examples of overlapping genes. Many of these are only a few nucleotides long and are thought to function by permitting the coordinated regulation of gene expression. However, there should also be selective pressure against long overlaps, as the existence of overlapping reading frames increases the risk of deleterious mutations. Here we examine the longest overlaps and assess whether they are the product of special functional constraints or of erroneous annotation. RESULTS: We analysed the genes that overlap by 60 bp or more among 338 fully sequenced prokaryotic genomes. The likely functional significance of an overlap was determined by comparing each of the genes to its respective orthologs. If a gene showed a significantly different length from its orthologs, it was considered unlikely to be functional and therefore the result of an error either in sequencing or in gene prediction. Focusing on 715 co-directional overlaps longer than 60 bp, we classified the erroneous ones into five categories: i) 5'-end extension of the downstream gene due to either a mispredicted start codon or a frameshift at the 5'-end of the gene (409 overlaps), ii) fragmentation of a gene caused by a frameshift (163), iii) 3'-end extension of the upstream gene due to either a frameshift at the 3'-end of a gene or a point mutation at the stop codon (68), iv) redundant gene predictions (4), v) 5'- and 3'-end extension, which is a combination of i) and iii) (71). We also studied 75 divergent overlaps that could be classified as misannotations of group i). Nevertheless, we found some convergent long overlaps (54) that might be true overlaps, although a substantial share of convergent overlaps could be classified as group iii) (124). CONCLUSION: Among the 968 overlaps larger than 60 bp which we analysed, we did not find a single real one among the co-directional and divergent orientations and concluded that there had been an excessive number of misannotations. Only convergent orientation seems to permit some long overlaps, although convergent overlaps are also hampered by misannotations. We propose a simple rule to flag these erroneous gene length predictions to facilitate automatic annotation.
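
The abstract does not spell out the proposed flagging rule, so the following is only a rough sketch of how long overlaps might be screened, assuming 1-based inclusive coordinates and a rule that flags any co-directional or divergent overlap of 60 bp or more. The `Gene` class, the function names and the example coordinates are hypothetical; only the 60 bp threshold and the category counts come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based, inclusive, start <= end
    end: int     # inclusive
    strand: str  # "+" or "-"


def overlap_bp(a, b):
    """Number of shared base pairs between two genes (0 if disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def orientation(a, b):
    """Classify a gene pair as co-directional, convergent or divergent."""
    first, second = sorted((a, b), key=lambda g: g.start)
    if first.strand == second.strand:
        return "co-directional"
    # "+" then "-" means the 3' ends face each other (convergent).
    return "convergent" if first.strand == "+" else "divergent"


def is_suspect(a, b, min_overlap=60):
    """Assumed rule: flag long overlaps unless the pair is convergent."""
    return overlap_bp(a, b) >= min_overlap and orientation(a, b) != "convergent"


# Consistency check on the counts reported in the abstract.
co_directional = {"i": 409, "ii": 163, "iii": 68, "iv": 4, "v": 71}
total_co_directional = sum(co_directional.values())
total_overlaps = total_co_directional + 75 + 54 + 124

# Hypothetical gene pair overlapping by 71 bp on the same strand.
gene_a = Gene("geneA", 100, 400, "+")
gene_b = Gene("geneB", 330, 700, "+")
flagged = is_suspect(gene_a, gene_b)
```

The reported counts are internally consistent: the five co-directional categories sum to 715, and adding the 75 divergent and 178 convergent overlaps gives the 968 overlaps cited in the conclusion. A real pipeline would additionally compare each gene's length with its orthologs, as the abstract describes, before calling an overlap erroneous.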


Subjects
Genes, Homologous/genetics , Genome , Prokaryotic Cells/metabolism , Amino Acid Sequence , Base Sequence , Codon, Initiator , Codon, Terminator , Databases, Factual , Evolution, Molecular , Frameshift Mutation , Molecular Sequence Data , Open Reading Frames , Sequence Homology, Amino Acid