Results 1-20 of 3,688
1.
Cell; 186(9): 2018-2034.e21, 2023 04 27.
Article in English | MEDLINE | ID: mdl-37080200

ABSTRACT

Functional genomic strategies have become fundamental for annotating gene function and regulatory networks. Here, we combined functional genomics with proteomics by quantifying protein abundances in a genome-scale knockout library in Saccharomyces cerevisiae, using data-independent acquisition mass spectrometry. We find that global protein expression is driven by a complex interplay of (1) general biological properties, including translation rate, protein turnover, the formation of protein complexes, growth rate, and genome architecture, followed by (2) functional properties, such as the connectivity of a protein in genetic, metabolic, and physical interaction networks. Moreover, we show that functional proteomics complements current gene annotation strategies through the assessment of proteome profile similarity, protein covariation, and reverse proteome profiling. Thus, our study reveals principles that govern protein expression and provides a genome-spanning resource for functional annotation.


Subject(s)
Proteome, Proteomics, Proteomics/methods, Proteome/metabolism, Genomics/methods, Genome, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism
2.
Cell; 185(21): 4023-4037.e18, 2022 10 13.
Article in English | MEDLINE | ID: mdl-36174579

ABSTRACT

High-throughput RNA sequencing offers broad opportunities to explore the Earth RNA virome. Mining 5,150 diverse metatranscriptomes uncovered >2.5 million RNA virus contigs. Analysis of >330,000 RNA-dependent RNA polymerases (RdRPs) shows that this expansion corresponds to a 5-fold increase of the known RNA virus diversity. Gene content analysis revealed multiple protein domains previously not found in RNA viruses and implicated in virus-host interactions. Extended RdRP phylogeny supports the monophyly of the five established phyla and reveals two putative additional bacteriophage phyla and numerous putative additional classes and orders. The dramatically expanded phylum Lenarviricota, consisting of bacterial and related eukaryotic viruses, now accounts for a third of the RNA virome. Identification of CRISPR spacer matches and bacteriolytic proteins suggests that subsets of picobirnaviruses and partitiviruses, previously associated with eukaryotes, infect prokaryotic hosts.


Subject(s)
Bacteriophages, RNA Viruses, Bacteriophages/genetics, DNA-Directed RNA Polymerases/genetics, Viral Genome, Phylogeny, RNA, RNA Viruses/genetics, RNA-Dependent RNA Polymerase/genetics, Virome
3.
Cell; 178(5): 1245-1259.e14, 2019 08 22.
Article in English | MEDLINE | ID: mdl-31402174

ABSTRACT

Small proteins are traditionally overlooked due to computational and experimental difficulties in detecting them. To systematically identify small proteins, we carried out a comparative genomics study on 1,773 human-associated metagenomes from four different body sites. We describe >4,000 conserved protein families, the majority of which are novel; ∼30% of these protein families are predicted to be secreted or transmembrane. Over 90% of the small protein families have no known domain and almost half are not represented in reference genomes. We identify putative housekeeping, mammalian-specific, defense-related, and protein families that are likely to be horizontally transferred. We provide evidence of transcription and translation for a subset of these families. Our study suggests that small proteins are highly abundant and those of the human microbiome, in particular, may perform diverse functions that have not been previously reported.


Subject(s)
Microbiota, Proteins/metabolism, Amino Acid Sequence, Cell Communication, Host-Pathogen Interactions, Humans, Metagenome, Open Reading Frames/genetics, Proteins/chemistry, Ribosomal Proteins/chemistry, Ribosomal Proteins/metabolism, Sequence Alignment
4.
Cell; 173(4): 1031-1044.e13, 2018 05 03.
Article in English | MEDLINE | ID: mdl-29727662

ABSTRACT

Full understanding of eukaryotic transcriptomes and how they respond to different conditions requires deep knowledge of all sites of intron excision. Although RNA sequencing (RNA-seq) provides much of this information, the low abundance of many spliced transcripts (often due to their rapid cytoplasmic decay) limits the ability of RNA-seq alone to reveal the full repertoire of spliced species. Here, we present "spliceosome profiling," a strategy based on deep sequencing of RNAs co-purifying with late-stage spliceosomes. Spliceosome profiling allows for unambiguous mapping of intron ends to single-nucleotide resolution and branchpoint identification at unprecedented depths. Our data reveal hundreds of new introns in S. pombe and numerous others that were previously misannotated. By providing a means to directly interrogate sites of spliceosome assembly and catalysis genome-wide, spliceosome profiling promises to transform our understanding of RNA processing in the nucleus, much as ribosome profiling has transformed our understanding of mRNA translation in the cytoplasm.


Subject(s)
Schizosaccharomyces/genetics, Spliceosomes/metabolism, Transcriptome, Algorithms, Introns, RNA Splicing, Fungal RNA/metabolism, Ribonucleoproteins/metabolism, Schizosaccharomyces/metabolism, Schizosaccharomyces pombe Proteins/metabolism, RNA Sequence Analysis, Transcription Initiation Site
5.
Cell; 167(2): 553-565.e12, 2016 Oct 06.
Article in English | MEDLINE | ID: mdl-27693354

ABSTRACT

Genome-metabolism interactions enable cell growth. To probe the extent of these interactions and delineate their functional contributions, we quantified the Saccharomyces amino acid metabolome and its response to systematic gene deletion. Over one-third of coding genes, in particular those important for chromatin dynamics, translation, and transport, contribute to biosynthetic metabolism. Specific amino acid signatures characterize genes of similar function. This enabled us to exploit functional metabolomics to connect metabolic regulators to their effectors, as exemplified by TORC1, whose inhibition in exponentially growing cells is shown to match an interruption in endomembrane transport. Providing orthogonal information compared to physical and genetic interaction networks, metabolomic signatures cluster more than half of the so far uncharacterized yeast genes and provide functional annotation for them. A major part of coding genes is therefore participating in gene-metabolism interactions that expose the metabolism regulatory network and enable access to an underexplored space in gene function.


Subject(s)
Amino Acids/biosynthesis, Metabolome, Saccharomyces cerevisiae Proteins/metabolism, Saccharomyces cerevisiae/metabolism, Transcription Factors/metabolism, Amino Acids/genetics, Chromatin/metabolism, Gene Deletion, Fungal Gene Expression Regulation, Gene Regulatory Networks, Metabolome/genetics, Metabolomics/methods, Multigene Family, Phosphatidylinositol 3-Kinases/genetics, Phosphatidylinositol 3-Kinases/metabolism, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae Proteins/genetics, Transcription Factors/genetics, Genetic Transcription
6.
Am J Hum Genet; 111(7): 1405-1419, 2024 07 11.
Article in English | MEDLINE | ID: mdl-38906146

ABSTRACT

Genome-wide association studies (GWASs) have identified numerous lung cancer risk-associated loci. However, decoding molecular mechanisms of these associations is challenging since most of these genetic variants are non-protein-coding with unknown function. Here, we implemented massively parallel reporter assays (MPRAs) to simultaneously measure the allelic transcriptional activity of risk-associated variants. We tested 2,245 variants at 42 loci from 3 recent GWASs in East Asian and European populations in the context of two major lung cancer histological types and exposure to benzo(a)pyrene. This MPRA approach identified one or more variants (median 11 variants) with significant effects on transcriptional activity at 88% of GWAS loci. Multimodal integration of lung-specific epigenomic data demonstrated that 63% of the loci harbored multiple potentially functional variants in linkage disequilibrium. While 22% of the significant variants showed allelic effects in both A549 (adenocarcinoma) and H520 (squamous cell carcinoma) cell lines, a subset of the functional variants displayed a significant cell-type interaction. Transcription factor analyses nominated potential regulators of the functional variants, including those with cell-type-specific expression and those predicted to bind multiple potentially functional variants across the GWAS loci. Linking functional variants to target genes based on four complementary approaches identified candidate susceptibility genes, including those affecting lung cancer cell growth. CRISPR interference of the top functional variant at 20q13.33 validated variant-to-gene connections, including RTEL1, SOX18, and ARFRP1. Our data provide a comprehensive functional analysis of lung cancer GWAS loci and help elucidate the molecular basis of heterogeneity and polygenicity underlying lung cancer susceptibility.


Subject(s)
Genetic Predisposition to Disease, Genome-Wide Association Study, Lung Neoplasms, Single Nucleotide Polymorphism, Humans, Lung Neoplasms/genetics, Lung Neoplasms/pathology, Linkage Disequilibrium, Multifactorial Inheritance/genetics, Tumor Cell Line, Alleles, A549 Cells
7.
Trends Biochem Sci; 47(9): 785-794, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35430135

ABSTRACT

Current tools to annotate protein function have failed to keep pace with the speed of DNA sequencing and exponentially growing number of proteins of unknown function (PUFs). A major contributing factor to this mismatch is the historical lack of high-throughput methods to experimentally determine biochemical activity. Activity-based methods, such as activity-based metabolite and protein profiling, are emerging as new approaches for unbiased, global, biochemical annotation of protein function. In this review, we highlight recent experimental, activity-based approaches that offer new opportunities to determine protein function in a biologically agnostic and systems-level manner.

8.
Trends Genet; 39(9): 686-702, 2023 09.
Article in English | MEDLINE | ID: mdl-37365103

ABSTRACT

Metatranscriptomics refers to the analysis of the collective microbial transcriptome of a sample. Its increased utilization for the characterization of human-associated microbial communities has enabled the discovery of many disease-state related microbial activities. Here, we review the principles of metatranscriptomics-based analysis of human-associated microbial samples. We describe strengths and weaknesses of popular sample preparation, sequencing, and bioinformatics approaches and summarize strategies for their use. We then discuss how human-associated microbial communities have recently been examined and how their characterization may change. We conclude that metatranscriptomics insights into human microbiotas under health and disease have not only expanded our knowledge on human health, but also opened avenues for rational antimicrobial drug use and disease management.


Subject(s)
Metagenomics, Microbiota, Humans, Microbiota/genetics, Transcriptome/genetics, High-Throughput Nucleotide Sequencing
9.
Am J Hum Genet; 110(10): 1718-1734, 2023 10 05.
Article in English | MEDLINE | ID: mdl-37683633

ABSTRACT

Genome-wide association studies of blood pressure (BP) have identified >1,000 loci, but the effector genes and biological pathways at these loci are mostly unknown. Using published association summary statistics, we conducted annotation-informed fine-mapping incorporating tissue-specific chromatin segmentation and colocalization to identify causal variants and candidate effector genes for systolic BP, diastolic BP, and pulse pressure. We observed 532 distinct signals associated with ≥2 BP traits and 84 with all three. For >20% of signals, a single variant accounted for >75% posterior probability; 65 were missense variants in known (SLC39A8, ADRB2, and DBH) and previously unreported BP candidate genes (NRIP1 and MMP14). In disease-relevant tissues, we colocalized >80 and >400 distinct signals for each BP trait with cis-eQTLs and regulatory regions from promoter capture Hi-C, respectively. Integrating mouse, human disorder, gene expression and tissue abundance data, and literature review, we provide consolidated evidence for 436 BP candidate genes for future functional validation and discover several potential drug targets.


Subject(s)
Genome-Wide Association Study, Hypertension, Humans, Animals, Mice, Quantitative Trait Loci/genetics, Multiomics, Genetic Predisposition to Disease, Hypertension/genetics, Single Nucleotide Polymorphism/genetics
10.
Am J Hum Genet; 110(11): 1903-1918, 2023 11 02.
Article in English | MEDLINE | ID: mdl-37816352

ABSTRACT

Despite whole-genome sequencing (WGS), many cases of single-gene disorders remain unsolved, impeding diagnosis and preventative care for people whose disease-causing variants escape detection. Since early WGS data analytic steps prioritize protein-coding sequences, to simultaneously prioritize variants in non-coding regions rich in transcribed and critical regulatory sequences, we developed GROFFFY, an analytic tool that integrates coordinates for regions with experimental evidence of functionality. Applied to WGS data from solved and unsolved hereditary hemorrhagic telangiectasia (HHT) recruits to the 100,000 Genomes Project, GROFFFY-based filtration reduced the mean number of variants/DNA from 4,867,167 to 21,486, without deleting disease-causal variants. In three unsolved cases (two related), GROFFFY identified ultra-rare deletions within the 3' untranslated region (UTR) of the tumor suppressor SMAD4, where germline loss-of-function alleles cause combined HHT and colonic polyposis (MIM: 175050). Sited >5.4 kb distal to coding DNA, the deletions did not modify or generate microRNA binding sites, but instead disrupted the sequence context of the final cleavage and polyadenylation site necessary for protein production: By iFoldRNA, an AAUAAA-adjacent 16-nucleotide deletion brought the cleavage site into inaccessible neighboring secondary structures, while a 4-nucleotide deletion unfolded the downstream RNA polymerase II roadblock. SMAD4 RNA expression differed to control-derived RNA from resting and cycloheximide-stressed peripheral blood mononuclear cells. Patterns predicted the mutational site for an unrelated HHT/polyposis-affected individual, where a complex insertion was subsequently identified. In conclusion, we describe a functional rare variant type that impacts regulatory systems based on RNA polyadenylation. Extension of coding sequence-focused gene panels is required to capture these variants.


Subject(s)
Smad4 Protein, Hereditary Hemorrhagic Telangiectasia, Humans, Base Sequence, DNA, Mononuclear Leukocytes/pathology, Nucleotides, Polyadenylation/genetics, RNA, Smad4 Protein/genetics, Hereditary Hemorrhagic Telangiectasia/genetics, Whole Genome Sequencing
11.
RNA; 30(3): 189-199, 2024 Feb 16.
Article in English | MEDLINE | ID: mdl-38164624

ABSTRACT

Aptamers have emerged as research hotspots of the next generation due to excellent performance benefits and application potentials in pharmacology, medicine, and analytical chemistry. Despite the numerous aptamer investigations, the lack of comprehensive data integration has hindered the development of computational methods for aptamers and the reuse of aptamers. A public access database named AptaDB, derived from experimentally validated data manually collected from the literature, was hence developed, integrating comprehensive aptamer-related data, which include six key components: (i) experimentally validated aptamer-target interaction information, (ii) aptamer property information, (iii) structure information of aptamer, (iv) target information, (v) experimental activity information, and (vi) algorithmically calculated similar aptamers. AptaDB currently contains 1350 experimentally validated aptamer-target interactions, 1230 binding affinity constants, 1293 aptamer sequences, and more. Compared to other aptamer databases, it contains twice the number of entries found in available databases. The collection and integration of the above information categories is unique among available aptamer databases and provides a user-friendly interface. AptaDB will also be continuously updated as aptamer research evolves. We expect that AptaDB will become a powerful source for aptamer rational design and a valuable tool for aptamer screening in the future. For access to AptaDB, please visit http://lmmd.ecust.edu.cn/aptadb/.


Subject(s)
Nucleotide Aptamers, Oligonucleotides, Factual Databases, Nucleotide Aptamers/chemistry, SELEX Aptamer Technique
12.
Brief Bioinform; 25(5)2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39120646

ABSTRACT

Cell-type annotation is a critical step in single-cell data analysis. With the development of numerous cell annotation methods, it is necessary to evaluate these methods to help researchers use them effectively. Reference datasets are essential for evaluation, but currently, the cell labels of reference datasets mainly come from computational methods, which may have computational biases and may not reflect the actual cell-type outcomes. This study first constructed an experimentally labeled immune cell-subtype single-cell dataset of the same batch and systematically evaluated 18 cell annotation methods. We assessed those methods under five scenarios, including intra-dataset validation, immune cell-subtype validation, unsupervised clustering, inter-dataset annotation, and unknown cell-type prediction. Accuracy and ARI were evaluation metrics. The results showed that SVM, scBERT, and scDeepSort were the best-performing supervised methods. Seurat was the best-performing unsupervised clustering method, but it couldn't fully fit the actual cell-type distribution. Our results indicated that experimentally labeled immune cell-subtype datasets revealed the deficiencies of unsupervised clustering methods and provided new dataset support for supervised methods.


Subject(s)
Single-Cell Analysis, Single-Cell Analysis/methods, Humans, Cluster Analysis, Computational Biology/methods, Molecular Sequence Annotation, RNA-Seq/methods, Single-Cell Gene Expression Analysis
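
The abstract of record 12 names accuracy and ARI (adjusted Rand index) as its evaluation metrics. As a minimal sketch of how those two metrics are commonly computed, assuming reference cell labels and predicted labels are supplied as equal-length lists, the following standard-library Python may help. The function names and the toy labels are illustrative assumptions and do not come from the paper.

```python
from collections import Counter
from math import comb


def accuracy(true_labels, predicted_labels):
    """Fraction of cells whose predicted label equals the reference label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def adjusted_rand_index(true_labels, cluster_labels):
    """Adjusted Rand index between a reference labeling and a clustering."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(true_labels), 2)
    if total_pairs == 0:
        return 1.0  # zero or one cell: the two partitions agree trivially

    # Pairs of cells grouped together in both labelings, and in each one alone.
    together_both = sum(
        comb(c, 2) for c in Counter(zip(true_labels, cluster_labels)).values()
    )
    together_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    together_cluster = sum(comb(c, 2) for c in Counter(cluster_labels).values())

    expected = together_true * together_cluster / total_pairs
    maximum = (together_true + together_cluster) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every cell in a single group
    return (together_both - expected) / (maximum - expected)


# Hypothetical toy labels, for illustration only.
reference = ["T", "T", "B", "B"]
print(accuracy(reference, ["T", "B", "B", "B"]))
print(adjusted_rand_index(reference, [0, 0, 1, 1]))
```

Accuracy needs predicted labels that share the reference vocabulary, so it suits supervised annotation; ARI only compares groupings, which is why it also applies to unsupervised clustering output.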
13.
Brief Bioinform; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38842510

ABSTRACT

Accurate and comprehensive annotation of microprotein-coding small open reading frames (smORFs) is critical to our understanding of normal physiology and disease. Empirical identification of translated smORFs is carried out primarily using ribosome profiling (Ribo-seq). While effective, published Ribo-seq datasets can vary drastically in quality and different analysis tools are frequently employed. Here, we examine the impact of these factors on identifying translated smORFs. We compared five commonly used software tools that assess open reading frame translation from Ribo-seq (RibORFv0.1, RibORFv1.0, RiboCode, ORFquant, and Ribo-TISH) and found surprisingly low agreement across all tools. Only ~2% of smORFs were called translated by all five tools, and ~15% by three or more tools when assessing the same high-resolution Ribo-seq dataset. For larger annotated genes, the same analysis showed ~74% agreement across all five tools. We also found that some tools are strongly biased against low-resolution Ribo-seq data, while others are more tolerant. Analyzing Ribo-seq coverage revealed that smORFs detected by more than one tool tend to have higher translation levels and higher fractions of in-frame reads, consistent with what was observed for annotated genes. Together these results support employing multiple tools to identify the most confident microprotein-coding smORFs and choosing the tools based on the quality of the dataset and the planned downstream characterization experiments of the predicted smORFs.


Subject(s)
Open Reading Frames, Software, Ribosomes/metabolism, Ribosomes/genetics, Molecular Sequence Annotation/methods, Humans, Protein Biosynthesis, Computational Biology/methods, Ribosome Profiling
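
The abstract of record 13 reports the share of smORFs called translated by all five tools and by three or more tools. A rough sketch of that kind of agreement tally follows, assuming each tool's calls are available as a set of smORF identifiers. The denominator used here (the union of all calls) is an assumption, since the abstract does not state how its percentages were normalized, and the tool names and identifiers are hypothetical.

```python
from collections import Counter


def agreement_fractions(calls_by_tool, min_tools):
    """Return (fraction called by every tool, fraction called by >= min_tools).

    Both fractions are taken over the union of all calls, which is an
    assumption made for this sketch.
    """
    support = Counter()
    for orf_ids in calls_by_tool.values():
        support.update(set(orf_ids))  # count each smORF once per tool

    total = len(support)
    if total == 0:
        return 0.0, 0.0

    n_tools = len(calls_by_tool)
    by_all = sum(1 for count in support.values() if count == n_tools)
    by_min = sum(1 for count in support.values() if count >= min_tools)
    return by_all / total, by_min / total


# Hypothetical calls from three tools, for illustration only.
calls = {
    "tool_a": {"orf1", "orf2", "orf3"},
    "tool_b": {"orf1", "orf2"},
    "tool_c": {"orf1", "orf4"},
}
print(agreement_fractions(calls, min_tools=2))
```

Counting support per smORF keeps the tally linear in the number of calls and makes it easy to change the threshold, for example to require three of five tools.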
14.
Brief Bioinform; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38426325

ABSTRACT

Accurate metabolite annotation and false discovery rate (FDR) control remain challenging in large-scale metabolomics. Recent progress leveraging proteomics experiences and interdisciplinary inspirations has provided valuable insights. While target-decoy strategies have been introduced, generating reliable decoy libraries is difficult due to metabolite complexity. Moreover, continuous bioinformatics innovation is imperative to improve the utilization of expanding spectral resources while reducing false annotations. Here, we introduce the concept of ion entropy for metabolomics and propose two entropy-based decoy generation approaches. Assessment of public databases validates ion entropy as an effective metric to quantify ion information in massive metabolomics datasets. Our entropy-based decoy strategies outperform current representative methods in metabolomics and achieve superior FDR estimation accuracy. Analysis of 46 public datasets provides instructive recommendations for practical application.


Subject(s)
Algorithms, Tandem Mass Spectrometry, Entropy, Tandem Mass Spectrometry/methods, Metabolomics/methods, Computational Biology/methods, Protein Databases
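
The abstract of record 14 refers to target-decoy strategies for FDR control. The generic target-decoy estimate can be sketched as below, assuming one match score per annotation, higher scores meaning better matches, and a decoy library of the same size as the target library. This is the textbook estimate, not the entropy-based decoy generation the paper proposes, and the function name and scores are hypothetical.

```python
def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR at a score threshold as (#decoy hits) / (#target hits).

    Assumes higher scores are better and that the decoy library matches
    the target library in size (both are assumptions of this sketch).
    """
    n_target = sum(score >= threshold for score in target_scores)
    n_decoy = sum(score >= threshold for score in decoy_scores)
    if n_target == 0:
        return 0.0  # nothing accepted, so no false discoveries to report
    return min(1.0, n_decoy / n_target)


# Hypothetical match scores, for illustration only.
targets = [0.9, 0.8, 0.7, 0.4]
decoys = [0.75, 0.3, 0.2]
print(estimate_fdr(targets, decoys, threshold=0.7))
```

Sweeping the threshold and keeping the lowest value whose estimate stays under a chosen level (for example 0.05) is one common way to pick an acceptance cutoff.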
15.
Brief Bioinform; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38300514

ABSTRACT

Somatic copy number alterations (SCNAs) are a predominant type of oncogenomic alterations that affect a large proportion of the genome in the majority of cancer samples. Current technologies allow high-throughput measurement of such copy number aberrations, generating results consisting of frequently large sets of SCNA segments. However, the automated annotation and integration of such data are particularly challenging because the measured signals reflect biased, relative copy number ratios. In this study, we introduce labelSeg, an algorithm designed for rapid and accurate annotation of CNA segments, with the aim of enhancing the interpretation of tumor SCNA profiles. Leveraging density-based clustering and exploiting the length-amplitude relationships of SCNA, our algorithm proficiently identifies distinct relative copy number states from individual segment profiles. Its compatibility with most CNA measurement platforms makes it suitable for large-scale integrative data analysis. We confirmed its performance on both simulated and sample-derived data from The Cancer Genome Atlas reference dataset, and we demonstrated its utility in integrating heterogeneous segment profiles from different data sources and measurement platforms. Our comparative and integrative analysis revealed common SCNA patterns in cancer and protein-coding genes with a strong correlation between SCNA and messenger RNA expression, promoting the investigation into the role of SCNA in cancer development.


Subject(s)
DNA Copy Number Variations, Neoplasms, Humans, Neoplasms/genetics, Algorithms, Cluster Analysis, Data Analysis
16.
Brief Bioinform; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38752856

ABSTRACT

Enhancing the reproducibility and comprehension of adaptive immune receptor repertoire sequencing (AIRR-seq) data analysis is critical for scientific progress. This study presents guidelines for reproducible AIRR-seq data analysis, and a collection of ready-to-use pipelines with comprehensive documentation. To this end, ten common pipelines were implemented using ViaFoundry, a user-friendly interface for pipeline management and automation. This is accompanied by versioned containers, documentation and archiving capabilities. The automation of pre-processing analysis steps and the ability to modify pipeline parameters according to specific research needs are emphasized. AIRR-seq data analysis is highly sensitive to varying parameters and setups; using the guidelines presented here, the ability to reproduce previously published results is demonstrated. This work promotes transparency, reproducibility, and collaboration in AIRR-seq data analysis, serving as a model for handling and documenting bioinformatics pipelines in other research domains.


Subject(s)
Computational Biology, Software, Humans, Computational Biology/methods, Reproducibility of Results, Immunologic Receptors/genetics, High-Throughput Nucleotide Sequencing/methods, Adaptive Immunity/genetics, Guidelines as Topic
17.
Brief Bioinform; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38941113

ABSTRACT

This study describes the development of a resource module that is part of a learning platform named "NIGMS Sandbox for Cloud-based Learning" (https://github.com/NIGMS/NIGMS-Sandbox). The overall genesis of the Sandbox is described in the editorial NIGMS Sandbox at the beginning of this Supplement. This module delivers learning materials on de novo transcriptome assembly using Nextflow in an interactive format that uses appropriate cloud resources for data access and analysis. Cloud computing is a powerful new means by which biomedical researchers can access resources and capacity that were previously either unattainable or prohibitively expensive. To take advantage of these resources, however, the biomedical research community needs new skills and knowledge. We present here a cloud-based training module, developed in conjunction with Google Cloud, Deloitte Consulting, and the NIH STRIDES Program, that uses the biological problem of de novo transcriptome assembly to demonstrate and teach the concepts of computational workflows (using Nextflow) and cost- and resource-efficient use of Cloud services (using Google Cloud Platform). Our work highlights the reduced necessity of on-site computing resources and the accessibility of cloud-based infrastructure for bioinformatics applications.


Subject(s)
Cloud Computing, Transcriptome, Computational Biology/methods, Computational Biology/education, Software, Humans, Gene Expression Profiling/methods, Internet
18.
Brief Bioinform; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38600664

ABSTRACT

Small open reading frames (smORFs) have been acknowledged to play various roles in essential biological pathways and to affect human health in conditions ranging from diabetes to tumorigenesis. Predicting smORFs in silico is a prerequisite for processing omics data. Here, we proposed the smORF-coding-potential-predicting framework, sOCP, which provides functions to construct a model for predicting novel smORFs in some species. The sOCP model constructed for human was based on in-frame features and the nucleotide bias around the start codon, and this small feature subset proved sufficient while avoiding the overfitting problems of more complicated models. It achieved better prediction metrics than previous methods and correlated closely with experimental evidence in a heterogeneous dataset. The model was applied to Rattus norvegicus and exhibited satisfactory performance. We then scanned smORFs with ATG and non-ATG start codons from the human genome and generated a database containing about a million novel smORFs with coding potential. Around 72,000 smORFs are located in the lncRNA regions of the genome. The smORF-encoded peptides may be involved in biological pathways rare for canonical proteins, including the glucocorticoid catabolic process and the prokaryotic defense system. Our work provides a model and database for human smORF investigation and a convenient tool for further smORF prediction in other species.


Subject(s)
Human Genome, Peptides, Animals, Humans, Rats, Open Reading Frames, Peptides/genetics, Proteins/genetics
19.
Brief Bioinform; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38271481

ABSTRACT

Next-generation sequencing (NGS) has revolutionized the field of rare disease diagnostics. Whole exome and whole genome sequencing are now routinely used for diagnostic purposes; however, the overall diagnosis rate remains lower than expected. In this work, we review current approaches used for calling and interpretation of germline genetic variants in the human genome, and discuss the most important challenges that persist in the bioinformatic analysis of NGS data in medical genetics. We describe and attempt to quantitatively assess the remaining problems, such as the quality of the reference genome sequence, reproducible coverage biases, or variant calling accuracy in complex regions of the genome. We also discuss the prospects of switching to the complete human genome assembly or the human pan-genome and important caveats associated with such a switch. We touch on arguably the hardest problem of NGS data analysis for medical genomics, namely, the annotation of genetic variants and their subsequent interpretation. We highlight the most challenging aspects of annotation and prioritization of both coding and non-coding variants. Finally, we demonstrate the persistent prevalence of pathogenic variants in the coding genome, and outline research directions that may enhance the efficiency of NGS-based disease diagnostics.


Subject(s)
Computational Biology, Rare Diseases, Humans, Rare Diseases/diagnosis, Rare Diseases/genetics, Genomics, Human Genome, Germ Cells, High-Throughput Nucleotide Sequencing
20.
Brief Bioinform; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38279653

ABSTRACT

Cluster analysis is one of the most widely used exploratory methods for visualization and grouping of gene expression patterns across multiple samples or treatment groups. Although several existing online tools can annotate clusters with functional terms, there is no all-in-one webserver to effectively prioritize genes/clusters using gene essentiality as well as congruency of mRNA-protein expression. Hence, we developed CAP-RNAseq that makes possible (1) upload and clustering of bulk RNA-seq data followed by identification, annotation and network visualization of all or selected clusters; and (2) prioritization using DepMap gene essentiality and/or dependency scores as well as the degree of correlation between mRNA and protein levels of genes within an expression cluster. In addition, CAP-RNAseq has an integrated primer design tool for the prioritized genes. Herein, we showed using comparisons with the existing tools and multiple case studies that CAP-RNAseq can uniquely aid in the discovery of co-expression clusters enriched with essential genes and prioritization of novel biomarker genes that exhibit high correlations between their mRNA and protein expression levels. CAP-RNAseq is applicable to RNA-seq data from different contexts including cancer and available at http://konulabapps.bilkent.edu.tr:3838/CAPRNAseq/ and the docker image is downloadable from https://hub.docker.com/r/konulab/caprnaseq.


Subject(s)
Proteomics, RNA Sequence Analysis/methods, RNA-Seq, Messenger RNA/genetics