Results 1 - 12 of 12
2.
bioRxiv ; 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38562894

ABSTRACT

Several recent studies have presented evidence that the human gene catalogue should be expanded to include thousands of short open reading frames (ORFs) appearing upstream or downstream of existing protein-coding genes, each of which would comprise an additional bicistronic transcript in humans. Here we explore an alternative hypothesis that would explain the translational and evolutionary evidence for these upstream ORFs without the need to create novel genes or bicistronic transcripts. We examined 2,199 upstream ORFs that have been proposed as high-quality candidates for novel genes, to determine if they could instead represent protein-coding exons that can be added to existing genes. We checked for the conservation of these ORFs in four recently sequenced, high-quality human genomes, and found that, as expected, a large majority (87.8%) were conserved in all four. We then looked for splicing evidence that would connect each upstream ORF to the downstream protein-coding gene at the same locus, thus creating a novel splicing variant using the upstream ORF as its first exon. These protein-coding exon candidates were further evaluated using structure predictions of the protein sequences that included the proposed new exons. We determined that 582 of the 2,199 upstream ORFs have strong evidence that they can form protein-coding exons that are part of an existing gene, and that the resulting protein is predicted to have similar or better structural quality than the currently annotated isoform.

3.
bioRxiv ; 2024 May 04.
Article in English | MEDLINE | ID: mdl-38746259

ABSTRACT

The rapid growth in the number of sequenced genomes makes it possible to search for the appearance of entirely new introns in the human lineage. In this study, we compared the genomic sequences for 19,120 human protein-coding genes to a collection of 3,493 vertebrate genomes, mapping the patterns of intron alignments onto a phylogenetic tree. This mapping allowed us to trace many intron gain events to precise locations in the tree, corresponding to distinct points in evolutionary history. We discovered 584 intron gain events, all of them relatively recent, in 514 distinct human genes. Among these events, we explored the hypothesis that intronization was the mechanism responsible for intron gain. Intronization events were identified by locating instances where human introns correspond to exonic sequences in homologous vertebrate genes. Although apparently rare, we found three compelling cases of intronization, and for each of those we compared the human protein sequence and structure to homologous genes that lack the introns.
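The idea of tracing an intron gain to a location in a tree can be illustrated with a minimal sketch. It assumes a single gain and no losses, so the gain is placed at the smallest clade that contains every species carrying the intron. The nested-tuple tree, the species names, and the function names are hypothetical, and this is not the method used in the study.

```python
def leaves(tree):
    """Return the set of species names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def smallest_clade(tree, species):
    """Return the smallest subtree whose leaves include all of `species`."""
    if not species <= leaves(tree):
        return None
    if not isinstance(tree, str):
        for child in tree:
            found = smallest_clade(child, species)
            if found is not None:
                return found
    return tree


def gain_clade(tree, with_intron):
    """Place a single intron gain on the branch leading to the returned clade."""
    if not with_intron:
        return None  # no species carries the intron, so there is nothing to place
    return smallest_clade(tree, set(with_intron))


# Hypothetical toy phylogeny (topology only, no branch lengths).
toy_tree = ((("human", "chimp"), "gorilla"), ("mouse", "rat"))
recent_gain = gain_clade(toy_tree, {"human", "chimp"})
older_gain = gain_clade(toy_tree, {"human", "gorilla"})
```

Under these assumptions, an intron seen only in human and chimp maps to their shared branch, while one seen in human and gorilla but not chimp maps one node deeper, which would imply a loss or missing data in chimp.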

4.
bioRxiv ; 2024 May 18.
Article in English | MEDLINE | ID: mdl-38798674

ABSTRACT

Evaluating the accuracy of protein-coding sequences in genome annotations is a challenging problem for which there is no broadly applicable solution. In this manuscript we introduce PSAURON (Protein Sequence Assessment Using a Reference ORF Network), a novel software tool developed to assess the quality of protein-coding gene annotations. Utilizing a machine learning model trained on a diverse dataset from over 1000 plant and animal genomes, PSAURON assigns a score to coding DNA or protein sequence that reflects the likelihood that the sequence is a genuine protein-coding region. PSAURON scores can be used for genome-wide protein annotation assessment as well as the rapid identification of potentially spurious annotated proteins. Validation against established benchmarks demonstrates PSAURON's effectiveness and correlation with recognized measures of protein quality, highlighting its potential use as a general-purpose method to evaluate gene annotation. PSAURON is open source and freely available at https://github.com/salzberg-lab/PSAURON. One-Sentence Summary: PSAURON is a machine learning-based tool for rapid assessment of protein-coding gene annotation.

5.
Cell Rep Methods ; 4(3): 100736, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38508189

ABSTRACT

Differential transcript usage (DTU) plays a crucial role in determining how gene expression differs among cells, tissues, and developmental stages, contributing to the complexity and diversity of biological systems. In abnormal cells, it can also lead to deficiencies in protein function and underpin disease pathogenesis. Analyzing DTU via RNA sequencing (RNA-seq) data is vital, but the genetic heterogeneity in populations with complex diseases presents an intricate challenge due to diverse causal events and undetermined subtypes. Although the majority of common diseases in humans are categorized as complex, state-of-the-art DTU analysis methods often overlook this heterogeneity in their models. We therefore developed SPIT, a statistical tool that identifies predominant subgroups in transcript usage within a population along with their distinctive sets of DTU events. This study provides comprehensive assessments of SPIT's methodology and applies it to analyze brain samples from individuals with schizophrenia, revealing previously unreported DTU events in six candidate genes.
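The quantity underlying DTU analysis, each transcript's share of its gene's total expression, can be sketched as follows. This is a minimal illustration and not SPIT's statistical model. The transcript names, abundance values, and the `transcript_usage` helper are all hypothetical.

```python
from collections import defaultdict


def transcript_usage(abundances, tx_to_gene):
    """Return each transcript's fraction of its gene's total abundance."""
    gene_totals = defaultdict(float)
    for tx, value in abundances.items():
        gene_totals[tx_to_gene[tx]] += value
    usage = {}
    for tx, value in abundances.items():
        total = gene_totals[tx_to_gene[tx]]
        # Genes with zero total abundance get a usage of 0.0 for every transcript.
        usage[tx] = value / total if total > 0 else 0.0
    return usage


# Hypothetical abundances (e.g. TPM) for two samples.
tx_to_gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
sample_a = {"tx1": 30.0, "tx2": 10.0, "tx3": 5.0}
sample_b = {"tx1": 10.0, "tx2": 30.0, "tx3": 5.0}

usage_a = transcript_usage(sample_a, tx_to_gene)
usage_b = transcript_usage(sample_b, tx_to_gene)
```

In this toy case geneA has the same total expression in both samples, but its dominant transcript switches from tx1 to tx2. A switch of this kind is invisible to gene-level expression analysis, and it is the sort of event DTU methods look for.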


Subject(s)
Gene Expression Profiling , RNA , Humans , Gene Expression Profiling/methods , RNA Sequence Analysis
6.
bioRxiv ; 2024 Jul 20.
Article in English | MEDLINE | ID: mdl-39071384

ABSTRACT

In recent years, a growing number of publications have reported the presence of microbial species in human tumors and of mixtures of microbes that appear to be highly specific to different cancer types. Our recent re-analysis of data from three cancer types revealed that technical errors led to erroneous reports of numerous microbial species in sequencing data from The Cancer Genome Atlas (TCGA) project. Here we have expanded our analysis to cover all 5,734 whole-genome sequencing (WGS) data sets currently available from TCGA, spanning 25 distinct types of cancer. We analyzed the microbial content using updated computational methods and databases, and compared our results to those from two major recent studies that focused on bacteria, viruses, and fungi in cancer. Our results expand upon and reinforce our recent findings, which showed that the presence of microbes is far lower than had been previously reported, and that most species identified in TCGA data are either not present at all, or are known contaminants rather than microbes residing within tumors. As part of this expanded analysis, and to help others avoid being misled by flawed data, we have released a dataset that contains detailed read counts for bacteria, viruses, archaea, and fungi detected in all 5,734 TCGA samples, which can serve as a public reference for future investigations.

7.
J Clin Med ; 13(12), 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38929928

ABSTRACT

Objectives: This study aims to assess the presence of pathogenic microorganisms in the corneal epithelial layer of keratoconus patients. Methods: DNA was extracted from corneal epithelial samples procured from ten individual keratoconus eyes and three healthy controls. Metagenomic next-generation sequencing (mNGS) was performed to detect ocular microbiota using an agnostic approach. Results: Metagenomic sequencing revealed a low microbial read count in corneal epithelial samples derived from both keratoconus eyes (average: 530) and controls (average: 622) without a statistically significant difference (p = 0.29). Proteobacteria were the predominant phylum in both keratoconus and control samples (relative abundance: 72% versus 79%, respectively). Conclusions: The overall low microbial read count and the lack of difference in the relative abundance of different microbial species between keratoconus and control samples do not support the hypothesis that a chronic corneal infection is implicated in the pathogenesis of keratoconus. These findings do not rule out the possibility that an acute infection may be involved in the disease process as an initiating event.

8.
bioRxiv ; 2024 May 17.
Article in English | MEDLINE | ID: mdl-38798552

ABSTRACT

As the number and variety of assembled genomes continue to grow, the number of annotated genomes is falling behind, particularly for eukaryotes. DNA-based mapping tools help to address this challenge, but they are only able to transfer annotation between closely related species. Here we introduce LiftOn, a homology-based software tool that integrates DNA and protein alignments to enhance the accuracy of genome-scale annotation and to allow mapping between relatively distant species. LiftOn's protein-centric algorithm considers both types of alignments, chooses optimal open reading frames, resolves overlapping gene loci, and finds additional gene copies where they exist. LiftOn can reliably transfer annotation between genomes representing members of the same species, as we demonstrate on human, mouse, honey bee, rice, and Arabidopsis thaliana. It can further map annotation effectively across species pairs as far apart as mouse and rat or Drosophila melanogaster and D. erecta.

9.
G3 (Bethesda) ; 14(8), 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-38900914

ABSTRACT

Stony coral tissue loss disease (SCTLD) has devastated coral reefs off the coast of Florida and continues to spread throughout the Caribbean. Although a number of bacterial taxa have consistently been associated with SCTLD, no pathogen has been definitively implicated in the etiology of SCTLD. Previous studies have predominantly focused on the prokaryotic community through 16S rRNA sequencing of healthy and affected tissues. Here, we provide a different analytical approach by applying a bioinformatics pipeline to publicly available metagenomic sequencing samples of SCTLD lesions and healthy tissues from 4 stony coral species. To compensate for the lack of coral reference genomes, we used data from apparently healthy coral samples to approximate a host genome and healthy microbiome reference. These reads were then used as a reference to which we matched and removed reads from diseased lesion tissue samples, and the remaining reads associated only with disease lesions were taxonomically classified at the DNA and protein levels. For DNA classifications, we used a pathogen identification protocol originally designed to identify pathogens in human tissue samples, and for protein classifications, we used a fast protein sequence aligner. To assess the utility of our pipeline, a species-level analysis of a candidate genus, Vibrio, was used to demonstrate the pipeline's effectiveness. Our approach revealed both complementary and unique coral microbiome members compared with a prior metagenome analysis of the same dataset.
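The subtraction step described above, removing lesion reads that match a reference built from healthy samples, can be sketched as below. The abstract does not say how reads were matched, so this sketch swaps in exact k-mer sharing as a stand-in. The function names, `k`, and the `min_shared` threshold are assumptions chosen only for illustration.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def subtract_reads(disease_reads, healthy_reads, k=8, min_shared=0.5):
    """Keep disease reads that share fewer than min_shared of their k-mers
    with the pooled healthy reads (forward strand only)."""
    reference = set()
    for read in healthy_reads:
        reference |= kmers(read, k)

    kept = []
    for read in disease_reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k cannot be assessed, so it is dropped
        shared = len(read_kmers & reference) / len(read_kmers)
        if shared < min_shared:
            kept.append(read)
    return kept


# Hypothetical toy reads.
healthy = ["ACGTACGTACGTACGT"]
lesion = ["ACGTACGTACGT", "TTTTGGGGCCCCAAAA"]
lesion_only = subtract_reads(lesion, healthy)
```

A realistic pipeline would also need to handle reverse complements and sequencing errors, which exact k-mer matching ignores. Only the reads left after subtraction would go on to taxonomic classification.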


Subject(s)
Anthozoa , Metagenomics , Anthozoa/microbiology , Animals , Metagenomics/methods , Metagenome , Computational Biology/methods , Microbiota/genetics , Coral Reefs
10.
bioRxiv ; 2024 Jun 05.
Article in English | MEDLINE | ID: mdl-38260425

ABSTRACT

Stony coral tissue loss disease (SCTLD) has devastated coral reefs off the coast of Florida and continues to spread throughout the Caribbean. Although a number of bacterial taxa have consistently been associated with SCTLD, no pathogen has been definitively implicated in the etiology of SCTLD. Previous studies have predominantly focused on the prokaryotic community through 16S rRNA sequencing of healthy and affected tissues. Here, we provide a different analytical approach by applying a bioinformatics pipeline to publicly available metagenomic sequencing samples of SCTLD lesions and healthy tissues from four stony coral species. To compensate for the lack of coral reference genomes, we used data from apparently healthy coral samples to approximate a host genome and healthy microbiome reference. These reads were then used as a reference to which we matched and removed reads from diseased lesion tissue samples, and the remaining reads associated only with disease lesions were taxonomically classified at the DNA and protein levels. For DNA classifications, we used a pathogen identification protocol originally designed to identify pathogens in human tissue samples, and for protein classifications, we used a fast protein sequence aligner. To assess the utility of our pipeline, a species-level analysis of a candidate genus, Vibrio, was used to demonstrate the pipeline's effectiveness. Our approach revealed both complementary and unique coral microbiome members compared to a prior metagenome analysis of the same dataset.

11.
G3 (Bethesda) ; 14(5), 2024 May 07.
Article in English | MEDLINE | ID: mdl-38526344

ABSTRACT

Whitebark pine (WBP, Pinus albicaulis) is a white pine of subalpine regions in the Western contiguous United States and Canada. WBP has become critically threatened throughout a significant part of its natural range due to mortality from the introduced fungal pathogen white pine blister rust (WPBR, Cronartium ribicola) and additional threats from mountain pine beetle (Dendroctonus ponderosae), wildfire, and maladaptation due to changing climate. Vast acreages of WBP have suffered nearly complete mortality. Genomic technologies can contribute to a faster, more cost-effective approach to the traditional practices of identifying disease-resistant, climate-adapted seed sources for restoration. With deep-coverage Illumina short reads of haploid megagametophyte tissue and Oxford Nanopore long reads of diploid needle tissue, followed by a hybrid, multistep assembly approach, we produced a final assembly containing 27.6 Gb of sequence in 92,740 contigs (N50 537,007 bp) and 34,716 scaffolds (N50 2.0 Gb). Approximately 87.2% (24.0 Gb) of total sequence was placed on the 12 WBP chromosomes. Annotation yielded 25,362 protein-coding genes, and over 77% of the genome was characterized as repeats. WBP has demonstrated the greatest variation in resistance to WPBR among the North American white pines. Candidate genes for quantitative resistance include disease resistance genes known as nucleotide-binding leucine-rich repeat receptors (NLRs). A combination of protein domain alignments and direct genome scanning was employed to fully describe the 3 subclasses of NLRs. Our high-quality reference sequence and annotation provide a marked improvement in NLR identification compared to previous assessments that leveraged de novo-assembled transcriptomes.
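The N50 values quoted for contigs and scaffolds follow the standard definition: the length L such that sequences of length at least L together contain at least half of the total assembled bases. A minimal sketch of that calculation, with made-up lengths and a hypothetical function name:

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Hypothetical contig lengths in base pairs.
toy_contigs = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_contigs)
```

Here the total is 20, and the two longest contigs (6 and 5) already cover 11 bases, so the N50 is 5. The same calculation applies to the scaffold N50, with scaffold lengths as input.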


Subject(s)
Plant Genome , Molecular Sequence Annotation , Pinus , Pinus/genetics , Pinus/parasitology , Genomics/methods , Endangered Species , High-Throughput Nucleotide Sequencing
12.
Nat Comput Sci ; 3(8): 700-708, 2023 Aug.
Article in English | MEDLINE | ID: mdl-38098813

ABSTRACT

ORFanage is a system designed to assign open reading frames (ORFs) to known and novel gene transcripts while maximizing similarity to annotated proteins. The primary intended use of ORFanage is the identification of ORFs in the assembled results of RNA sequencing experiments, a capability that most transcriptome assembly methods do not have. Our experiments demonstrate how ORFanage can be used to find novel protein variants in RNA-seq datasets, and to improve the annotations of ORFs in tens of thousands of transcript models in the human annotation databases. Through its implementation of a highly accurate and efficient pseudo-alignment algorithm, ORFanage is substantially faster than other ORF annotation methods, enabling its application to very large datasets. When used to analyze transcriptome assemblies, ORFanage can aid in the separation of signal from transcriptional noise and the identification of likely functional transcript variants, ultimately advancing our understanding of biology and medicine.
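For readers unfamiliar with the underlying task, the simplest form of ORF assignment is a scan for the longest ATG-to-stop stretch in a transcript sequence. The sketch below shows only that baseline. It is not ORFanage's algorithm, which, per the abstract, maximizes similarity to annotated proteins. The function name and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG..stop ORF on the forward
    strand, with end exclusive and the stop codon included, or None."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None  # close the ORF and look for the next start
    return best


# Hypothetical toy transcript.
toy_transcript = "CCATGAAATTTTAGGG"
toy_orf = longest_orf(toy_transcript)
```

A length-only scan like this cannot tell a genuine coding frame from a chance ORF. That gap is what reference-guided approaches aim to close.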
