Results 1 - 20 of 26
1.
Nature ; 608(7922): 353-359, 2022 08.
Article in English | MEDLINE | ID: mdl-35922509

ABSTRACT

Regulation of transcript structure generates transcript diversity and plays an important role in human disease [1-7]. The advent of long-read sequencing technologies offers the opportunity to study the role of genetic variation in transcript structure [8-16]. In this Article, we present a large human long-read RNA-seq dataset using the Oxford Nanopore Technologies platform from 88 samples from Genotype-Tissue Expression (GTEx) tissues and cell lines, complementing the GTEx resource. We identified just over 70,000 novel transcripts for annotated genes, and validated the protein expression of 10% of novel transcripts. We developed a new computational package, LORALS, to analyse the genetic effects of rare and common variants on the transcriptome by allele-specific analysis of long reads. We characterized allele-specific expression and transcript structure events, providing new insights into the specific transcript alterations caused by common and rare genetic variants and highlighting the resolution gained from long-read data. We were able to perturb the transcript structure upon knockdown of PTBP1, an RNA binding protein that mediates splicing, thereby finding genetic regulatory effects that are modified by the cellular environment. Finally, we used this dataset to enhance variant interpretation and study rare variants leading to aberrant splicing patterns.


Subject(s)
Alleles , Gene Expression Profiling , Organ Specificity , RNA-Seq , Transcriptome , Alternative Splicing/genetics , Cell Line , Datasets as Topic , Genotype , Heterogeneous-Nuclear Ribonucleoproteins/deficiency , Heterogeneous-Nuclear Ribonucleoproteins/genetics , Humans , Organ Specificity/genetics , Polypyrimidine Tract-Binding Protein/deficiency , Polypyrimidine Tract-Binding Protein/genetics , Reproducibility of Results , Transcriptome/genetics
2.
Nature ; 571(7765): 355-360, 2019 07.
Article in English | MEDLINE | ID: mdl-31270458

ABSTRACT

Defining the transcriptomic identity of malignant cells is challenging in the absence of surface markers that distinguish cancer clones from one another, or from admixed non-neoplastic cells. To address this challenge, here we developed Genotyping of Transcriptomes (GoT), a method to integrate genotyping with high-throughput droplet-based single-cell RNA sequencing. We apply GoT to profile 38,290 CD34+ cells from patients with CALR-mutated myeloproliferative neoplasms to study how somatic mutations corrupt the complex process of human haematopoiesis. High-resolution mapping of malignant versus normal haematopoietic progenitors revealed an increasing fitness advantage with myeloid differentiation of cells with mutated CALR. We identified the unfolded protein response as a predominant outcome of CALR mutations, with a considerable dependency on cell identity, as well as upregulation of the NF-κB pathway specifically in uncommitted stem cells. We further extended the GoT toolkit to genotype multiple targets and loci that are distant from transcript ends. Together, these findings reveal that the transcriptional output of somatic mutations in myeloproliferative neoplasms is dependent on the native cell identity.


Subject(s)
Genotype , Mutation , Myeloproliferative Disorders/genetics , Myeloproliferative Disorders/pathology , Neoplasms/genetics , Neoplasms/pathology , Transcriptome/genetics , Animals , CD34 Antigens/metabolism , Calreticulin/genetics , Cell Line , Cell Proliferation , Clone Cells/classification , Clone Cells/metabolism , Clone Cells/pathology , Endoribonucleases/metabolism , Hematopoiesis/genetics , Hematopoietic Stem Cells/classification , Hematopoietic Stem Cells/metabolism , Hematopoietic Stem Cells/pathology , High-Throughput Nucleotide Sequencing/methods , Humans , Mice , Molecular Models , Myeloproliferative Disorders/classification , NF-kappa B/metabolism , Neoplasms/classification , Neoplastic Stem Cells/cytology , Neoplastic Stem Cells/metabolism , Neoplastic Stem Cells/pathology , Primary Myelofibrosis/genetics , Primary Myelofibrosis/pathology , Protein Serine-Threonine Kinases/metabolism , RNA Sequence Analysis/methods , Single-Cell Analysis/methods , Unfolded Protein Response/genetics
3.
Proc Natl Acad Sci U S A ; 118(37), 2021 09 14.
Article in English | MEDLINE | ID: mdl-34497122

ABSTRACT

Some of the most spectacular adaptive radiations begin with founder populations on remote islands. How genetically limited founder populations give rise to the striking phenotypic and ecological diversity characteristic of adaptive radiations is a paradox of evolutionary biology. We conducted an evolutionary genomics analysis of genus Metrosideros, a landscape-dominant, incipient adaptive radiation of woody plants that spans a striking range of phenotypes and environments across the Hawaiian Islands. Using nanopore-sequencing, we created a chromosome-level genome assembly for Metrosideros polymorpha var. incana and analyzed whole-genome sequences of 131 individuals from 11 taxa sampled across the islands. Demographic modeling and population genomics analyses suggested that Hawaiian Metrosideros originated from a single colonization event and subsequently spread across the archipelago following the formation of new islands. The evolutionary history of Hawaiian Metrosideros shows evidence of extensive reticulation associated with significant sharing of ancestral variation between taxa and secondarily with admixture. Taking advantage of the highly contiguous genome assembly, we investigated the genomic architecture underlying the adaptive radiation and discovered that divergent selection drove the formation of differentiation outliers in paired taxa representing early stages of speciation/divergence. Analysis of the evolutionary origins of the outlier single nucleotide polymorphisms (SNPs) showed enrichment for ancestral variations under divergent selection. Our findings suggest that Hawaiian Metrosideros possesses an unexpectedly rich pool of ancestral genetic variation, and the reassortment of these variations has fueled the island adaptive radiation.


Subject(s)
Physiological Adaptation , Molecular Evolution , Genetic Speciation , Myrtaceae/physiology , Genetic Polymorphism , Radiation Tolerance , Ionizing Radiation , Population Genetics , Myrtaceae/radiation effects , Phenotype
4.
Genome Res ; 30(3): 437-446, 2020 03.
Article in English | MEDLINE | ID: mdl-32075851

ABSTRACT

Viruses are the most abundant biological entities on Earth and play key roles in host ecology, evolution, and horizontal gene transfer. Despite recent progress in viral metagenomics, the inherent genetic complexity of virus populations still poses technical difficulties for recovering complete virus genomes from natural assemblages. To address these challenges, we developed an assembly-free, single-molecule nanopore sequencing approach, enabling direct recovery of complete virus genome sequences from environmental samples. Our method yielded thousands of full-length, high-quality draft virus genome sequences that were not recovered using standard short-read assembly approaches. Additionally, our analyses discriminated between populations whose genomes had identical direct terminal repeats versus those with circularly permuted repeats at their termini, thus providing new insight into native virus reproduction and genome packaging. Novel DNA sequences were discovered, whose repeat structures, gene contents, and concatemer lengths suggest they are phage-inducible chromosomal islands, which are packaged as concatemers in phage particles, with lengths that match the size ranges of co-occurring phage genomes. Our new virus sequencing strategy can provide previously unavailable information about the genome structures, population biology, and ecology of naturally occurring viruses and viral parasites.


Subject(s)
Viral Genome , Nanopore Sequencing/methods , Bacteriophages/genetics , DNA Packaging , Metagenomics , Seawater/virology
5.
Genome Res ; 22(6): 1107-19, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22434425

ABSTRACT

Segmented filamentous bacteria (SFB) are host-specific intestinal symbionts that comprise a distinct clade within the Clostridiaceae, designated Candidatus Arthromitus. SFB display a unique life cycle within the host, involving differentiation into multiple cell types. The latter include filaments that attach intimately to intestinal epithelial cells, and from which "holdfasts" and spores develop. SFB induce a multifaceted immune response, leading to host protection from intestinal pathogens. Cultivation resistance has hindered characterization of these enigmatic bacteria. In the present study, we isolated five SFB filaments from a mouse using a microfluidic device equipped with laser tweezers, generated genome sequences from each, and compared these sequences with each other, as well as to recently published SFB genome sequences. Based on the resulting analyses, SFB appear to be dependent on the host for a variety of essential nutrients. SFB have a relatively high abundance of predicted proteins devoted to cell cycle control and to envelope biogenesis, and have a group of SFB-specific autolysins and a dynamin-like protein. Among the five filament genomes, an average of 8.6% of predicted proteins were novel, including a family of secreted SFB-specific proteins. Four ADP-ribosyltransferase (ADPRT) sequence types, and a myosin-cross-reactive antigen (MCRA) protein were discovered; we hypothesize that they are involved in modulation of host responses. The presence of polymorphisms among mouse SFB genomes suggests the evolution of distinct SFB lineages. Overall, our results reveal several aspects of SFB adaptation to the mammalian intestinal tract.


Subject(s)
Bacterial Proteins/genetics , Bacterial Genome , Gram-Positive Endospore-Forming Bacteria/physiology , Intestines/microbiology , Single-Cell Analysis/methods , ADP Ribose Transferases/genetics , ADP Ribose Transferases/metabolism , Physiological Adaptation , Amino Acid Sequence , Animals , Bacterial Proteins/metabolism , Cell Differentiation/genetics , Ribosomal DNA , Epithelial Cells/microbiology , Gram-Positive Endospore-Forming Bacteria/genetics , Mice , Microfluidic Analytical Techniques , Molecular Sequence Data , Phylogeny , Genetic Polymorphism , DNA Sequence Analysis
6.
Cell Genom ; : 100590, 2024 Jun 19.
Article in English | MEDLINE | ID: mdl-38908378

ABSTRACT

The duplication-triplication/inverted-duplication (DUP-TRP/INV-DUP) structure is a complex genomic rearrangement (CGR). Although it has been identified as an important pathogenic DNA mutation signature in genomic disorders and cancer genomes, its architecture remains unresolved. Here, we studied the genomic architecture of DUP-TRP/INV-DUP by investigating the DNA of 24 patients identified by array comparative genomic hybridization (aCGH) on whom we found evidence for the existence of 4 out of 4 predicted structural variant (SV) haplotypes. Using a combination of short-read genome sequencing (GS), long-read GS, optical genome mapping, and single-cell DNA template strand sequencing (strand-seq), the haplotype structure was resolved in 18 samples. The point of template switching in 4 samples was shown to be a segment of ∼2.2-5.5 kb of 100% nucleotide similarity within inverted repeat pairs. These data provide experimental evidence that inverted low-copy repeats act as recombinant substrates. This type of CGR can result in multiple conformers generating diverse SV haplotypes in susceptible dosage-sensitive loci.

7.
bioRxiv ; 2023 Oct 03.
Article in English | MEDLINE | ID: mdl-37873367

ABSTRACT

Background: The duplication-triplication/inverted-duplication (DUP-TRP/INV-DUP) structure is a type of complex genomic rearrangement (CGR) hypothesized to result from replicative repair of DNA due to replication fork collapse. It is often mediated by a pair of inverted low-copy repeats (LCR) followed by iterative template switches resulting in at least two breakpoint junctions in cis. Although it has been identified as an important mutation signature of pathogenicity for genomic disorders and cancer genomes, its architecture remains unresolved and is predicted to display at least four structural variation (SV) haplotypes. Results: Here we studied the genomic architecture of DUP-TRP/INV-DUP by investigating the genomic DNA of 24 patients with neurodevelopmental disorders identified by array comparative genomic hybridization (aCGH) on whom we found evidence for the existence of 4 out of 4 predicted SV haplotypes. Using a combination of short-read genome sequencing (GS), long-read GS, optical genome mapping and StrandSeq, the haplotype structure was resolved in 18 samples. This approach refined the point of template switching between inverted LCRs in 4 samples, revealing a DNA segment of ∼2.2-5.5 kb of 100% nucleotide similarity. A prediction model was developed to infer the LCR used to mediate the non-allelic homology repair. Conclusions: These data provide experimental evidence supporting the hypothesis that inverted LCRs act as a recombinant substrate in replication-based repair mechanisms. Such inverted repeats are particularly relevant for formation of copy-number associated inversions, including the DUP-TRP/INV-DUP structures. Moreover, this type of CGR can result in multiple conformers, which contributes to the generation of diverse SV haplotypes in susceptible loci.

8.
Cell Stem Cell ; 30(9): 1262-1281.e8, 2023 09 07.
Article in English | MEDLINE | ID: mdl-37582363

ABSTRACT

RNA splicing factors are recurrently mutated in clonal blood disorders, but the impact of dysregulated splicing in hematopoiesis remains unclear. To overcome technical limitations, we integrated genotyping of transcriptomes (GoT) with long-read single-cell transcriptomics and proteogenomics for single-cell profiling of transcriptomes, surface proteins, somatic mutations, and RNA splicing (GoT-Splice). We applied GoT-Splice to hematopoietic progenitors from myelodysplastic syndrome (MDS) patients with mutations in the core splicing factor SF3B1. SF3B1mut cells were enriched in the megakaryocytic-erythroid lineage, with expansion of SF3B1mut erythroid progenitor cells. We uncovered distinct cryptic 3' splice site usage in different progenitor populations and stage-specific aberrant splicing during erythroid differentiation. Profiling SF3B1-mutated clonal hematopoiesis samples revealed that erythroid bias and cell-type-specific cryptic 3' splice site usage in SF3B1mut cells precede overt MDS. Collectively, GoT-Splice defines the cell-type-specific impact of somatic mutations on RNA splicing, from early clonal outgrowths to overt neoplasia, directly in human samples.


Subject(s)
Myelodysplastic Syndromes , RNA Splice Sites , Humans , Multiomics , RNA Splicing/genetics , Myelodysplastic Syndromes/genetics , Myelodysplastic Syndromes/metabolism , RNA Splicing Factors/genetics , RNA Splicing Factors/metabolism , Mutation/genetics , Phosphoproteins/genetics , Phosphoproteins/metabolism
9.
Nucleic Acids Res ; 38(22): 7916-26, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20702423

ABSTRACT

Cis-acting short sequence motifs play important roles in alternative splicing. It is now possible to identify such sequence motifs as conserved sequence patterns in genome sequence alignments. Here, we report the systematic search for motifs in the neighboring introns of alternatively spliced exons by using comparative analysis of mammalian genome alignments. We identified 11 conserved sequence motifs that might be involved in the regulation of alternative splicing. These motifs are not only significantly overrepresented near alternatively spliced exons, but they also co-occur with each other, thus, forming a network of cis-elements, likely to be the basis for context-dependent regulation. Based on this finding, we applied the motif co-occurrence to predict alternatively skipped exons. We verified exon skipping in 29 cases out of 118 predictions (25%) by EST and mRNA sequences in the databases. For the predictions not verified by the database sequences, we confirmed exon skipping in 10 additional cases by using both RT-PCR experiments and the publicly available RNA-Seq data. These results indicate that even more alternative splicing events will be found with the progress of large-scale and high-throughput analyses for various tissue samples and developmental stages.


Subject(s)
Alternative Splicing , Introns , Ribonucleic Acid Regulatory Sequences , Animals , Base Sequence , Conserved Sequence , Exons , Genomics , Humans , Molecular Sequence Data , Sequence Alignment
10.
Nat Biotechnol ; 40(10): 1488-1499, 2022 10.
Article in English | MEDLINE | ID: mdl-35637420

ABSTRACT

High-order three-dimensional (3D) interactions between more than two genomic loci are common in human chromatin, but their role in gene regulation is unclear. Previous high-order 3D chromatin assays either measure distant interactions across the genome or proximal interactions at selected targets. To address this gap, we developed Pore-C, which combines chromatin conformation capture with nanopore sequencing of concatemers to profile proximal high-order chromatin contacts at the genome scale. We also developed the statistical method Chromunity to identify sets of genomic loci with frequencies of high-order contacts significantly higher than background ('synergies'). Applying these methods to human cell lines, we found that synergies were enriched in enhancers and promoters in active chromatin and in highly transcribed and lineage-defining genes. In prostate cancer cells, these included binding sites of androgen-driven transcription factors and the promoters of androgen-regulated genes. Concatemers of high-order contacts in highly expressed genes were demethylated relative to pairwise contacts at the same loci. Synergies in breast cancer cells were associated with tyfonas, a class of complex DNA amplicons. These results rigorously link genome-wide high-order 3D interactions to lineage-defining transcriptional programs and establish Pore-C and Chromunity as scalable approaches to assess high-order genome structure.


Subject(s)
Nanopore Sequencing , Nanopores , Androgens , Chromatin/genetics , Humans , Transcription Factors/genetics
11.
Genome Med ; 14(1): 122, 2022 10 27.
Article in English | MEDLINE | ID: mdl-36303224

ABSTRACT

BACKGROUND: The multiple de novo copy number variant (MdnCNV) phenotype is described by having four or more constitutional de novo CNVs (dnCNVs) arising independently throughout the human genome within one generation. It is a rare peri-zygotic mutational event, previously reported to be seen once in every 12,000 individuals referred for genome-wide chromosomal microarray analysis due to congenital abnormalities. These rare families provide a unique opportunity to understand the genetic factors of peri-zygotic genome instability and the impact of dnCNV on human diseases. METHODS: Chromosomal microarray analysis (CMA), array-based comparative genomic hybridization, short- and long-read genome sequencing (GS) were performed on the newly identified MdnCNV family to identify de novo mutations including dnCNVs, de novo single-nucleotide variants (dnSNVs), and indels. Short-read GS was performed on four previously published MdnCNV families for dnSNV analysis. Trio-based rare variant analysis was performed on the newly identified individual and four previously published MdnCNV families to identify potential genetic etiologies contributing to the peri-zygotic genomic instability. Lin semantic similarity scores informed quantitative human phenotype ontology analysis on three MdnCNV families to identify gene(s) driving or contributing to the clinical phenotype. RESULTS: In the newly identified MdnCNV case, we revealed eight de novo tandem duplications, each ~1 Mb, with microhomology at 6/8 breakpoint junctions. Enrichment of de novo single-nucleotide variants (SNV; 6/79) and de novo indels (1/12) was found within 4 Mb of the dnCNV genomic regions. An elevated post-zygotic SNV mutation rate was observed in MdnCNV families. Maternal rare variant analyses identified three genes in distinct families that may contribute to the MdnCNV phenomenon. Phenotype analysis suggests that gene(s) within dnCNV regions contribute to the observed proband phenotype in 3/3 cases. CNVs in two cases, a contiguous gene duplication encompassing PMP22 and RAI1 and another duplication affecting NSD1 and SMARCC2, contribute to the clinically observed phenotypic manifestations. CONCLUSIONS: Characteristic features of dnCNVs reported here are consistent with a microhomology-mediated break-induced replication (MMBIR)-driven mechanism during the peri-zygotic period. Maternal genetic variants in DNA repair genes potentially contribute to peri-zygotic genomic instability. Variable phenotypic features were observed across a cohort of three MdnCNV probands, and computational quantitative phenotyping revealed that two out of three had evidence for the contribution of more than one genetic locus to the proband's phenotype supporting the hypothesis of de novo multilocus pathogenic variation (MPV) in those families.


Subject(s)
DNA Copy Number Variations , Genomic Instability , Humans , Comparative Genomic Hybridization , Mutation , DNA , Nucleotides , DNA-Binding Proteins/genetics , Transcription Factors/genetics
12.
Bioinformatics ; 26(23): 2977-8, 2010 Dec 01.
Article in English | MEDLINE | ID: mdl-20959381

ABSTRACT

SUMMARY: SmashCommunity is a stand-alone metagenomic annotation and analysis pipeline suitable for data from Sanger and 454 sequencing technologies. It supports state-of-the-art software for essential metagenomic tasks such as assembly and gene prediction. It provides tools to estimate the quantitative phylogenetic and functional compositions of metagenomes, to compare compositions of multiple metagenomes and to produce intuitive visual representations of such analyses. AVAILABILITY: SmashCommunity source code and documentation are available at http://www.bork.embl.de/software/smash CONTACT: bork@embl.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Metagenomics/methods , Software , Genes , Molecular Sequence Annotation , Phylogeny , DNA Sequence Analysis
13.
Bioinformatics ; 26(23): 2979-80, 2010 Dec 01.
Article in English | MEDLINE | ID: mdl-20966005

ABSTRACT

SUMMARY: Recent advances in single-cell manipulation technology, whole genome amplification and high-throughput sequencing have now made it possible to sequence the genome of an individual cell. The bioinformatic analysis of these genomes, however, is far more complicated than the analysis of those generated using traditional, culture-based methods. In order to simplify this analysis, we have developed SmashCell (Simple Metagenomics Analysis SHell, for sequences from single Cells). It is designed to automate the main steps in microbial genome analysis (assembly, gene prediction, functional annotation) in a way that allows parameter and algorithm exploration at each step in the process. It also manages the data created by these analyses and provides visualization methods for rapid analysis of the results. AVAILABILITY: The SmashCell source code and a comprehensive manual are available at http://asiago.stanford.edu/SmashCell CONTACT: eoghanh@stanford.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics/methods , Software , Algorithms , Chromosome Mapping/methods , Genome , Nucleic Acid Amplification Techniques , DNA Sequence Analysis/methods , Single-Cell Analysis
14.
Curr Opin Struct Biol ; 17(3): 362-9, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17574832

ABSTRACT

Given that the number of protein functions on earth is finite, the rapid expansion of biological knowledge and the concomitant exponential increase in the number of protein sequences should, at some point, enable the estimation of the limits of protein function space. The functional coverage of protein sequences can be investigated using computational methods, especially given the massive amount of data being generated by large-scale environmental sequencing (metagenomics). In completely sequenced genomes, the fraction of proteins to which at least some functional features can be assigned has recently risen to as much as approximately 85%. Although this fraction is more uncertain in metagenomics surveys, because of environmental complexities and differences in analysis protocols, our global knowledge of protein functions still appears to be considerable. However, when we consider protein families, continued sequencing seems to yield an ever-increasing number of novel families. Until we reconcile these two views, the limits of protein space will remain obscured.


Subject(s)
Biochemistry/trends , Proteins/physiology , Animals , Escherichia coli/physiology , Humans , Proteins/genetics , Protein Sequence Analysis
15.
Genomics ; 93(3): 213-20, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19059335

ABSTRACT

The Alternative Splicing and Transcript Diversity database (ASTD) gives access to a vast collection of alternative transcripts that integrate transcription initiation, polyadenylation and splicing variant data. Alternative transcripts are derived from the mapping of transcribed sequences to the complete human, mouse and rat genomes using an extension of the computational pipeline developed for the ASD (Alternative Splicing Database) and ATD (Alternative Transcript Diversity) databases, which are now superseded by ASTD. For the human genome, ASTD identifies splicing variants, transcription initiation variants and polyadenylation variants in 68%, 68% and 62% of the gene set, respectively, consistent with current estimates for transcription variation. Users can access ASTD through a variety of browsing and query tools, including expression state-based queries for the identification of tissue-specific isoforms. Participating laboratories have experimentally validated a subset of ASTD-predicted alternative splice forms and alternative polyadenylation forms that were not previously reported. The ASTD database can be accessed at http://www.ebi.ac.uk/astd.


Subject(s)
Alternative Splicing/genetics , Genetic Databases , Animals , Database Management Systems , Humans , Information Storage and Retrieval/methods , Mice , Rats , Reproducibility of Results , Software , User-Computer Interface
16.
Genes (Basel) ; 11(1), 2020 01 09.
Article in English | MEDLINE | ID: mdl-31936690

ABSTRACT

The MinION sequencer has made in situ sequencing feasible in remote locations. Following our initial demonstration of its high performance off planet with Earth-prepared samples, we developed and tested an end-to-end, sample-to-sequencer process that could be conducted entirely aboard the International Space Station (ISS). Initial experiments demonstrated the process with a microbial mock community standard. The DNA was successfully amplified, primers were degraded, and libraries prepared and sequenced. The median percent identities for both datasets were 84%, as assessed from alignment of the mock community. The ability to correctly identify the organisms in the mock community standard was comparable for the sequencing data obtained in flight and on the ground. To validate the process on microbes collected from and cultured aboard the ISS, bacterial cells were selected from a NASA Environmental Health Systems Surface Sample Kit contact slide. The locations of bacterial colonies chosen for identification were labeled, and a small number of cells were directly added as input into the sequencing workflow. Prepared DNA was sequenced, and the data were downlinked to Earth. Return of the contact slide to the ground allowed for standard laboratory processing for bacterial identification. The identifications obtained aboard the ISS, Staphylococcus hominis and Staphylococcus capitis, matched those determined on the ground down to the species level. This marks the first ever identification of microbes entirely off Earth, and this validated process could be used for in-flight microbial identification, diagnosis of infectious disease in a crewmember, and as a research platform for investigators around the world.


Subject(s)
Nanopore Sequencing/methods , 16S Ribosomal RNA/genetics , Specimen Handling/methods , Bacteria/genetics , Bacterial DNA/genetics , Ribosomal DNA/genetics , Exobiology/methods , Extraterrestrial Environment , Bacterial Genome/genetics , Microbiota/genetics , Nanopores , DNA Sequence Analysis/methods , Spacecraft/instrumentation
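
The abstract of record 16 reports a median percent identity of 84% from alignment to the mock community. The sketch below illustrates one way such a figure could be computed. It assumes identity is defined as matching columns divided by all alignment columns, with gap columns counted as mismatches; other definitions exist, and the abstract does not state which one was used. The function names and toy alignments are hypothetical.

```python
from statistics import median


def percent_identity(aligned_read: str, aligned_ref: str) -> float:
    """Percent of alignment columns where read and reference carry the same base.

    Assumption: both strings are rows of one pairwise alignment, padded with
    '-' for gaps, and gap columns count as mismatches.
    """
    if len(aligned_read) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_read:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_read, aligned_ref) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_read)


def median_percent_identity(alignments) -> float:
    """Median of per-read percent identities over (read, reference) pairs."""
    return median(percent_identity(read, ref) for read, ref in alignments)


# Hypothetical toy alignments, not data from the study.
toy_alignments = [
    ("ACGT-A", "ACCTTA"),  # 4 matching columns out of 6
    ("ACGTAC", "ACGTAC"),  # 6 matching columns out of 6
    ("AC-TAC", "ACGTAC"),  # 5 matching columns out of 6
]

if __name__ == "__main__":
    print(f"{median_percent_identity(toy_alignments):.2f}")
```

Taking the median rather than the mean keeps a handful of very poor reads from dominating the summary, which matters for error-prone long reads.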
17.
Genome Biol ; 21(1): 21, 2020 02 05.
Article in English | MEDLINE | ID: mdl-32019604

ABSTRACT

BACKGROUND: The circum-basmati group of cultivated Asian rice (Oryza sativa) contains many iconic varieties and is widespread in the Indian subcontinent. Despite its economic and cultural importance, a high-quality reference genome is currently lacking, and the group's evolutionary history is not fully resolved. To address these gaps, we use long-read nanopore sequencing and assemble the genomes of two circum-basmati rice varieties. RESULTS: We generate two high-quality, chromosome-level reference genomes that represent the 12 chromosomes of Oryza. The assemblies show a contig N50 of 6.32 Mb and 10.53 Mb for Basmati 334 and Dom Sufid, respectively. Using our highly contiguous assemblies, we characterize structural variations segregating across circum-basmati genomes. We discover repeat expansions not observed in japonica (the rice group most closely related to circum-basmati), as well as the presence and absence variants of over 20 Mb, one of which is a circum-basmati-specific deletion of a gene regulating awn length. We further detect strong evidence of admixture between the circum-basmati and circum-aus groups. This gene flow has its greatest effect on chromosome 10, causing both structural variation and single-nucleotide polymorphism to deviate from genome-wide history. Lastly, population genomic analysis of 78 circum-basmati varieties shows three major geographically structured genetic groups: Bhutan/Nepal, India/Bangladesh/Myanmar, and Iran/Pakistan. CONCLUSION: The availability of high-quality reference genomes allows functional and evolutionary genomic analyses providing genome-wide evidence for gene flow between circum-aus and circum-basmati, describes the nature of circum-basmati structural variation, and reveals the presence/absence variation in this important and iconic rice variety group.


Subject(s)
Nanopore Sequencing/methods , Oryza/genetics , Whole Genome Sequencing/methods , Plant Chromosomes/genetics , Contig Mapping/methods , Molecular Evolution , Plant Genome , Oryza/classification , Phylogeny
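
The abstract of record 17 summarizes assembly contiguity as contig N50 (6.32 Mb and 10.53 Mb). As a reminder of what that statistic means, the sketch below computes N50 under its usual definition: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name and the toy contig lengths are hypothetical and are not taken from the study.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the cumulative length first covers at least half of
    the summed length of all contigs.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    if lengths[-1] <= 0:
        raise ValueError("contig lengths must be positive")
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Integer comparison avoids rounding issues with total / 2.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always reaches the total")


# Hypothetical toy assembly, lengths in arbitrary units.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]

if __name__ == "__main__":
    print(n50(toy_contigs))
```

For the toy list the total is 54, and the three longest contigs (10, 9, 8) sum to 27, so the N50 is 8. A higher N50 indicates that more of the assembly sits in long contigs, which is why it is quoted as a contiguity measure.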
18.
Bioinformatics ; 24(17): 1959-60, 2008 Sep 01.
Article in English | MEDLINE | ID: mdl-18635569

ABSTRACT

Sircah is a flexible tool for the detection, analysis and visualization of alternative transcripts. It takes as input gene models or spliced alignments and creates a database of alternative transcription events: alternative transcription initiation and polyadenylation, alternative 3' and 5' splice-site usage, skipped exons and retained introns. The results can be visualized in a variety of ways, allowing the creation of publication-quality images. AVAILABILITY: Sircah is available for download under a Creative Commons license, along with additional documentation and a tutorial, from http://www.bork.embl.de/Sircah.


Subject(s)
Algorithms , Computer Graphics , RNA Splice Sites/genetics , DNA Sequence Analysis/methods , Software , Transcription Factors/genetics , User-Computer Interface , Base Sequence , Molecular Sequence Data
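Two of the event types named in the abstract above, skipped exons and retained introns, can be illustrated by comparing the exon coordinates of two transcripts of the same gene. The sketch below is not Sircah's algorithm or interface; it assumes each transcript is a list of `(start, end)` exons in 1-based inclusive coordinates on one strand, and all function names are hypothetical.

```python
def introns(exons):
    """Return the introns implied by a transcript's exons, as (start, end) pairs."""
    ordered = sorted(exons)
    return [(left_end + 1, right_start - 1)
            for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])]


def skipped_exons(transcript_a, transcript_b):
    """Exons of transcript_a that lie wholly inside an intron of transcript_b."""
    b_introns = introns(transcript_b)
    return [exon for exon in sorted(transcript_a)
            if any(i_start <= exon[0] and exon[1] <= i_end
                   for i_start, i_end in b_introns)]


def retained_introns(transcript_a, transcript_b):
    """Introns of transcript_a that lie wholly inside a single exon of transcript_b."""
    return [intron for intron in introns(transcript_a)
            if any(x_start <= intron[0] and intron[1] <= x_end
                   for x_start, x_end in transcript_b)]


# Hypothetical three-exon gene model and two alternative transcripts.
full_length = [(100, 200), (300, 400), (500, 600)]
exon_skipping = [(100, 200), (500, 600)]      # middle exon absent
intron_retention = [(100, 200), (300, 600)]   # second intron kept in the transcript

skipped = skipped_exons(full_length, exon_skipping)
retained = retained_introns(full_length, intron_retention)
```

Here `skipped` contains the middle exon `(300, 400)` and `retained` contains the second intron `(401, 499)`. A real tool would also need to handle strand, alternative splice-site usage, and alternative transcript ends, which this sketch leaves out.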
19.
Genome Med ; 11(1): 25, 2019 04 23.
Article in English | MEDLINE | ID: mdl-31014393

ABSTRACT

BACKGROUND: Intrachromosomal triplications (TRP) can contribute to disease etiology via gene dosage effects, gene disruption, position effects, or fusion gene formation. Recently, post-zygotic de novo triplications adjacent to copy-number-neutral genomic intervals with runs of homozygosity (ROH) have been shown to result in uniparental isodisomy (UPD). The genomic structure of these complex genomic rearrangements (CGRs) shows a consistent pattern of an inverted triplication flanked by duplications (DUP-TRP/INV-DUP) formed by an iterative DNA replisome template-switching mechanism during replicative repair of a single-ended, double-stranded DNA (seDNA); the ROH results from an interhomolog or nonsister chromatid template switch. It has been postulated that these CGRs may lead to genetic abnormalities in carriers due to dosage-sensitive genes mapping within the copy-number variant regions, homozygosity for alleles at a locus causing an autosomal recessive (AR) disease trait within the ROH region, or imprinting-associated diseases. METHODS: Here, we report a family wherein the affected subject carries a de novo 2.2-Mb TRP followed by 42.2 Mb of ROH and manifests clinical features overlapping with those observed in association with chromosome 14 maternal UPD (UPD(14)mat). UPD(14)mat can cause clinical phenotypic features enabling a diagnosis of Temple syndrome. This CGR was then molecularly characterized by high-density custom aCGH, genome-wide single-nucleotide polymorphism (SNP) and methylation arrays, exome sequencing (ES), and the Oxford Nanopore long-read sequencing technology. RESULTS: We confirmed the postulated DUP-TRP/INV-DUP structure by multiple orthogonal genomic technologies in the proband. The methylation status of known differentially methylated regions (DMRs) on chromosome 14 revealed that the subject shows the typical methylation pattern of UPD(14)mat. Consistent with these molecular findings, the clinical features overlap with those observed in Temple syndrome, including speech delay. CONCLUSIONS: These data provide experimental evidence that, in humans, triplication can lead to segmental UPD and imprinting disease. Importantly, genotype/phenotype analyses further reveal how a post-zygotically generated complex structural variant, resulting from a replication-based mutational mechanism, contributes to expanding the clinical phenotype of known genetic syndromes. Mechanistically, such events can distort transmission genetics, resulting in homozygosity at a locus for which only one parent is a carrier, as well as cause imprinting diseases.


Subject(s)
Chromosome Aberrations , Chromosome Disorders/genetics , Human Chromosome Pair 14/genetics , Genomic Imprinting , Chromosome Disorders/pathology , DNA Methylation , DNA Replication , Humans , Male , Pedigree , Phenotype , Single Nucleotide Polymorphism , Young Adult
20.
BMC Genomics ; 9: 335, 2008 Jul 15.
Article in English | MEDLINE | ID: mdl-18627618

ABSTRACT

BACKGROUND: Across the fully sequenced microbial genomes there are thousands of examples of overlapping genes. Many of these are only a few nucleotides long and are thought to function by permitting the coordinated regulation of gene expression. However, there should also be selective pressure against long overlaps, as the existence of overlapping reading frames increases the risk of deleterious mutations. Here we examine the longest overlaps and assess whether they are the product of special functional constraints or of erroneous annotation. RESULTS: We analysed the genes that overlap by 60 bp or more among 338 fully sequenced prokaryotic genomes. The likely functional significance of an overlap was determined by comparing each of the genes to its respective orthologs. If a gene showed a significantly different length from its orthologs, it was considered unlikely to be functional and therefore the result of an error in either sequencing or gene prediction. Focusing on 715 co-directional overlaps longer than 60 bp, we classified the erroneous ones into five categories: i) 5'-end extension of the downstream gene due to either a mispredicted start codon or a frameshift at the 5'-end of the gene (409 overlaps), ii) fragmentation of a gene caused by a frameshift (163), iii) 3'-end extension of the upstream gene due to either a frameshift at the 3'-end of the gene or a point mutation at the stop codon (68), iv) redundant gene predictions (4), v) 5'- and 3'-end extension, which is a combination of i) and iii) (71). We also studied 75 divergent overlaps that could be classified as misannotations of group i). Nevertheless, we found some convergent long overlaps (54) that might be true overlaps, although a substantial share of convergent overlaps could be classified as group iii) (124). CONCLUSION: Among the 968 overlaps larger than 60 bp which we analysed, we did not find a single real one among the co-directional and divergent orientations and concluded that there had been an excessive number of misannotations. Only convergent orientation seems to permit some long overlaps, although convergent overlaps are also hampered by misannotations. We propose a simple rule to flag these erroneous gene length predictions to facilitate automatic annotation.


Subject(s)
Overlapping Genes/genetics , Genome , Prokaryotic Cells/metabolism , Amino Acid Sequence , Base Sequence , Initiator Codon , Terminator Codon , Factual Databases , Molecular Evolution , Frameshift Mutation , Molecular Sequence Data , Open Reading Frames , Amino Acid Sequence Homology
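The overlap categories used in the abstract above (co-directional, divergent and convergent overlaps of 60 bp or more) can be illustrated with a small sketch. It is not the authors' pipeline: the `Gene` record, the 1-based inclusive coordinates, and the function names are assumptions made for illustration, and a gene nested entirely inside another is classified only by the order of the start coordinates.

```python
from dataclasses import dataclass

LONG_OVERLAP_BP = 60  # threshold quoted in the abstract


@dataclass(frozen=True)
class Gene:
    start: int   # leftmost coordinate, 1-based inclusive
    end: int     # rightmost coordinate, inclusive
    strand: str  # "+" or "-"


def overlap_length(a, b):
    """Number of bases shared by two genes (0 if they do not overlap)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def orientation(a, b):
    """Classify a gene pair as co-directional, convergent or divergent."""
    left, right = sorted((a, b), key=lambda g: (g.start, g.end))
    if left.strand == right.strand:
        return "co-directional"
    # "+" followed by "-": the genes point towards each other (3' ends overlap).
    # "-" followed by "+": the genes point away from each other (5' ends overlap).
    return "convergent" if left.strand == "+" else "divergent"


def is_long_overlap(a, b, threshold=LONG_OVERLAP_BP):
    """True if the pair overlaps by at least `threshold` bases."""
    return overlap_length(a, b) >= threshold


# Hypothetical gene pair: forward gene followed by a reverse-strand gene.
upstream = Gene(start=100, end=300, strand="+")
downstream = Gene(start=230, end=500, strand="-")
pair_summary = (overlap_length(upstream, downstream),
                orientation(upstream, downstream),
                is_long_overlap(upstream, downstream))
```

For this pair the shared region is positions 230 to 300 (71 bases), so `pair_summary` is `(71, "convergent", True)`. The ortholog-length comparison that the abstract uses to separate real overlaps from misannotations is not reproduced here, because the abstract does not state its criterion.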