Results 1 - 20 of 39
1.
Genome Res ; 32(4): 778-790, 2022 04.
Article in English | MEDLINE | ID: mdl-35210353

ABSTRACT

More than 90% of genetic variants are rare in most modern sequencing studies, such as the Alzheimer's Disease Sequencing Project (ADSP) whole-exome sequencing (WES) data. Furthermore, 54% of the rare variants in ADSP WES are singletons. However, both single variant and unit-based tests are limited in their statistical power to detect an association between rare variants and phenotypes. To best use missense rare variants and investigate their biological effect, we examine their association with phenotypes in the context of protein structures. We developed a protein structure-based approach, protein optimized kernel evaluation of missense nucleotides (POKEMON), which evaluates rare missense variants based on their spatial distribution within a protein rather than their allele frequency. The hypothesis behind this test is that the three-dimensional spatial distribution of variants within a protein structure provides functional context to power an association test. POKEMON identified three candidate genes (TREM2, SORL1, and EXOC3L4) and another suggestive gene from the ADSP WES data. For TREM2 and SORL1, two known Alzheimer's disease (AD) genes, the signal from the spatial cluster is stable even if we exclude known AD risk variants, indicating the presence of additional low-frequency risk variants within these genes. EXOC3L4 is a novel AD risk gene that has a cluster of variants primarily shared by case subjects around the Sec6 domain. This cluster is also validated in an independent replication data set and a validation data set with a larger sample size.
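The idea of weighting variants by where they sit in the folded protein, rather than by allele frequency, can be illustrated with a small sketch. The code below builds a Gaussian kernel over pairwise 3D distances between variant residues and uses it to score how similar two carriers are. It is an illustration under stated assumptions, not the POKEMON implementation: the function names, the `bandwidth` value, and the toy coordinates are all hypothetical.

```python
import math

def spatial_kernel(coords, bandwidth=10.0):
    """Gaussian kernel over pairwise 3D distances between variant residues.

    coords: list of (x, y, z) tuples, one per variant position (angstroms).
    bandwidth: hypothetical length scale; not a value from the paper.
    """
    n = len(coords)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(coords[i], coords[j])
            kernel[i][j] = math.exp(-(d * d) / (2.0 * bandwidth ** 2))
    return kernel

def carrier_similarity(genotype_a, genotype_b, kernel):
    """Sum kernel weights over variant pairs (i, j) where the first
    individual carries variant i and the second carries variant j."""
    total = 0.0
    for i, ga in enumerate(genotype_a):
        for j, gb in enumerate(genotype_b):
            total += ga * kernel[i][j] * gb
    return total

# Toy example: variants 0 and 1 sit close together, variant 2 is far away.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (40.0, 0.0, 0.0)]
K = spatial_kernel(coords)
near = carrier_similarity([1, 0, 0], [0, 1, 0], K)  # carriers of nearby variants
far = carrier_similarity([1, 0, 0], [0, 0, 1], K)   # carriers of distant variants
```

In this toy setup, two individuals carrying different but spatially adjacent variants score as more similar than two individuals whose variants lie far apart, which is the kind of structural context the abstract describes.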


Subject(s)
Alzheimer Disease , Alzheimer Disease/genetics , Gene Frequency , Genetic Predisposition to Disease , Humans , LDL-Receptor Related Proteins/genetics , LDL-Receptor Related Proteins/metabolism , Membrane Transport Proteins/genetics , Mutation, Missense , Phenotype , Exome Sequencing
2.
BMC Genomics ; 25(1): 115, 2024 Jan 26.
Article in English | MEDLINE | ID: mdl-38279154

ABSTRACT

BACKGROUND: Short tandem repeats (STRs) are widely distributed across the human genome and are associated with numerous neurological disorders. However, the extent that STRs contribute to disease is likely under-estimated because of the challenges calling these variants in short read next generation sequencing data. Several computational tools have been developed for STR variant calling, but none fully address all of the complexities associated with this variant class. RESULTS: Here we introduce LUSTR which is designed to address some of the challenges associated with STR variant calling by enabling more flexibility in defining STR loci, allowing for customizable modules to tailor analyses, and expanding the capability to call somatic and multiallelic STR variants. LUSTR is a user-friendly and easily customizable tool for targeted or unbiased genome-wide STR variant screening that can use either predefined or novel genome builds. Using both simulated and real data sets, we demonstrated that LUSTR accurately infers germline and somatic STR expansions in individuals with and without diseases. CONCLUSIONS: LUSTR offers a powerful and user-friendly approach that allows for the identification of STR variants and can facilitate more comprehensive studies evaluating the role of pathogenic STR variants across human diseases.


Subject(s)
Genome, Human , Microsatellite Repeats , Humans , Microsatellite Repeats/genetics , Germ Cells , High-Throughput Nucleotide Sequencing
3.
Hum Mol Genet ; 31(R1): R62-R72, 2022 10 20.
Article in English | MEDLINE | ID: mdl-35943817

ABSTRACT

Non-coding genetic variants outside of protein-coding genome regions play an important role in genetic and epigenetic regulation. It has become increasingly important to understand their roles, as non-coding variants often make up the majority of top findings of genome-wide association studies (GWAS). In addition, the growing popularity of disease-specific whole-genome sequencing (WGS) efforts expands the library of and offers unique opportunities for investigating both common and rare non-coding variants, which are typically not detected in more limited GWAS approaches. However, the sheer size and breadth of WGS data introduce additional challenges to predicting functional impacts in terms of data analysis and interpretation. This review focuses on the recent approaches developed for efficient, at-scale annotation and prioritization of non-coding variants uncovered in WGS analyses. In particular, we review the latest scalable annotation tools, databases and functional genomic resources for interpreting the variant findings from WGS based on both experimental data and in silico predictive annotations. We also review machine learning-based predictive models for variant scoring and prioritization. We conclude with a discussion of future research directions which will enhance the data and tools necessary for the effective functional analyses of variants identified by WGS to improve our understanding of disease etiology.


Subject(s)
Epigenesis, Genetic , Genome-Wide Association Study , Whole Genome Sequencing , Genomics
4.
Bioinformatics ; 39(11)2023 11 01.
Article in English | MEDLINE | ID: mdl-37947320

ABSTRACT

SUMMARY: Preparing functional genomic (FG) data with diverse assay types and file formats for integration into analysis workflows that interpret genome-wide association and other studies is a significant and time-consuming challenge. Here we introduce hipFG (Harmonization and Integration Pipeline for Functional Genomics), an automatically customized pipeline for efficient and scalable normalization of heterogenous FG data collections into standardized, indexed, rapidly searchable analysis-ready datasets while accounting for FG datatypes (e.g. chromatin interactions, genomic intervals, quantitative trait loci). AVAILABILITY AND IMPLEMENTATION: hipFG is freely available at https://bitbucket.org/wanglab-upenn/hipFG. A Docker container is available at https://hub.docker.com/r/wanglab/hipfg.


Subject(s)
Genome-Wide Association Study , Software , Genomics , Chromatin , Quantitative Trait Loci
5.
Alzheimers Dement ; 20(2): 1123-1136, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37881831

ABSTRACT

INTRODUCTION: The National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site Alzheimer's Genomics Database (GenomicsDB) is a public knowledge base of Alzheimer's disease (AD) genetic datasets and genomic annotations. METHODS: GenomicsDB uses a custom systems architecture to adopt and enforce rigorous standards that facilitate harmonization of AD-relevant genome-wide association study summary statistics datasets with functional annotations, including over 230 million annotated variants from the AD Sequencing Project. RESULTS: GenomicsDB generates interactive reports compiled from the harmonized datasets and annotations. These reports contextualize AD-risk associations in a broader functional genomic setting and summarize them in the context of functionally annotated genes and variants. DISCUSSION: Created to make AD-genetics knowledge more accessible to AD researchers, the GenomicsDB is designed to guide users unfamiliar with genetic data in not only exploring but also interpreting this ever-growing volume of data. Scalable and interoperable with other genomics resources using data technology standards, the GenomicsDB can serve as a central hub for research and data analysis on AD and related dementias. HIGHLIGHTS: The National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) offers to the public a unique, disease-centric collection of AD-relevant GWAS summary statistics datasets. Interpreting these data is challenging and requires significant bioinformatics expertise to standardize datasets and harmonize them with functional annotations on genome-wide scales. The NIAGADS Alzheimer's GenomicsDB helps overcome these challenges by providing a user-friendly public knowledge base for AD-relevant genetics that shares harmonized, annotated summary statistics datasets from the NIAGADS repository in an interpretable, easily searchable format.


Subject(s)
Alzheimer Disease , United States , Humans , Alzheimer Disease/genetics , Genome-Wide Association Study , National Institute on Aging (U.S.) , Genomics , Databases, Factual , Genetic Predisposition to Disease/genetics
6.
Bioinformatics ; 38(19): 4530-4536, 2022 09 30.
Article in English | MEDLINE | ID: mdl-35980155

ABSTRACT

MOTIVATION: Cell-type deconvolution of bulk tissue RNA sequencing (RNA-seq) data is an important step toward understanding the variations in cell-type composition among disease conditions. Owing to recent advances in single-cell RNA sequencing (scRNA-seq) and the availability of large amounts of bulk RNA-seq data in disease-relevant tissues, various deconvolution methods have been developed. However, the performance of existing methods heavily relies on the quality of information provided by external data sources, such as the selection of scRNA-seq data as a reference and prior biological information. RESULTS: We present the Integrated and Robust Deconvolution (InteRD) algorithm to infer cell-type proportions from target bulk RNA-seq data. Owing to the innovative use of penalized regression with a new evaluation criterion for deconvolution, InteRD has three primary advantages. First, it is able to effectively integrate deconvolution results from multiple scRNA-seq datasets. Second, InteRD calibrates estimates from reference-based deconvolution by taking into account extra biological information as priors. Third, the proposed algorithm is robust to inaccurate external information imposed in the deconvolution system. Extensive numerical evaluations and real-data applications demonstrate that InteRD yields more accurate and robust cell-type proportion estimates that agree well with known biology. AVAILABILITY AND IMPLEMENTATION: The proposed InteRD framework is implemented in R and the package is available at https://cran.r-project.org/web/packages/InteRD/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
RNA , Single-Cell Analysis , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Software , Sequence Analysis, RNA/methods
7.
Alzheimers Dement ; 2022 Jun 30.
Article in English | MEDLINE | ID: mdl-35770850

ABSTRACT

INTRODUCTION: Variants in the tau gene (MAPT) region are associated with breast cancer in women and Alzheimer's disease (AD) among persons lacking apolipoprotein E ε4 (ε4-). METHODS: To identify novel genes associated with tau-related pathology, we conducted two genome-wide association studies (GWAS) for AD, one among 10,340 ε4- women in the Alzheimer's Disease Genetics Consortium (ADGC) and another in 31 members (22 women) of a consanguineous Hutterite kindred. RESULTS: We identified novel associations of AD with MGMT variants in the ADGC (rs12775171, odds ratio [OR] = 1.4, P = 4.9 × 10⁻⁸) and Hutterite (rs12256016 and rs2803456, OR = 2.0, P = 1.9 × 10⁻¹⁴) datasets. Multi-omics analyses showed that the most significant and largest number of associations among the single nucleotide polymorphisms (SNPs), DNA-methylated CpGs, MGMT expression, and AD-related neuropathological traits were observed among women. Furthermore, promoter capture Hi-C analyses revealed long-range interactions of the MGMT promoter with MGMT SNPs and CpG sites. DISCUSSION: These findings suggest that epigenetically regulated MGMT expression is involved in AD pathogenesis, especially in women.

8.
Alzheimers Dement ; 18(12): 2458-2467, 2022 12.
Article in English | MEDLINE | ID: mdl-35258170

ABSTRACT

INTRODUCTION: Progranulin (GRN) mutations occur in frontotemporal lobar degeneration (FTLD) and in Alzheimer's disease (AD), often with TDP-43 pathology. METHODS: We determined the frequency of rs5848 and rare, pathogenic GRN mutations in two autopsy and one family cohort. We compared Braak stage, β-amyloid load, hyperphosphorylated tau (PHFtau) tangle density and TDP-43 pathology in GRN carriers and non-carriers. RESULTS: Pathogenic GRN mutations were more frequent in all cohorts compared to the Genome Aggregation Database (gnomAD), but there was no evidence for association with AD. Pathogenic GRN carriers had significantly higher PHFtau tangle density adjusting for age, sex and APOE ε4 genotype. AD patients with rs5848 had higher frequencies of hippocampal sclerosis and TDP-43 deposits. Twenty-two rare, pathogenic GRN variants were observed in the family cohort. DISCUSSION: GRN mutations in clinical and neuropathological AD increase the burden of tau-related brain pathology but show no specific association with β-amyloid load or AD.


Subject(s)
Alzheimer Disease , Frontotemporal Lobar Degeneration , Humans , Progranulins/genetics , Alzheimer Disease/genetics , Alzheimer Disease/pathology , Intercellular Signaling Peptides and Proteins/genetics , Mutation/genetics , Frontotemporal Lobar Degeneration/genetics , DNA-Binding Proteins/genetics
9.
Bioinformatics ; 36(12): 3879-3881, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32330239

ABSTRACT

SUMMARY: We report Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants (SparkINFERNO), a scalable bioinformatics pipeline characterizing non-coding genome-wide association study (GWAS) association findings. SparkINFERNO prioritizes causal variants underlying GWAS association signals and reports relevant regulatory elements, tissue contexts and plausible target genes they affect. To achieve this, the SparkINFERNO algorithm integrates GWAS summary statistics with large-scale collection of functional genomics datasets spanning enhancer activity, transcription factor binding, expression quantitative trait loci and other functional datasets across more than 400 tissues and cell types. Scalability is achieved by an underlying API implemented using Apache Spark and Giggle-based genomic indexing. We evaluated SparkINFERNO on large GWASs and show that SparkINFERNO is more than 60 times efficient and scales with data size and amount of computational resources. AVAILABILITY AND IMPLEMENTATION: SparkINFERNO runs on clusters or a single server with Apache Spark environment, and is available at https://bitbucket.org/wanglab-upenn/SparkINFERNO or https://hub.docker.com/r/wanglab/spark-inferno. CONTACT: lswang@pennmedicine.upenn.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study , Quantitative Trait Loci , Algorithms , Genomics , Software
10.
Bioinformatics ; 35(6): 1033-1039, 2019 03 15.
Article in English | MEDLINE | ID: mdl-30668832

ABSTRACT

MOTIVATION: Small non-coding RNAs (sncRNAs, <100 nts) are highly abundant RNAs that regulate diverse and often tissue-specific cellular processes by associating with transcription factor complexes or binding to mRNAs. While thousands of sncRNA genes exist in the human genome, no single resource provides searchable, unified annotation, expression and processing information for full sncRNA transcripts and mature RNA products derived from these larger RNAs. RESULTS: Our goal is to establish a complete catalog of annotation, expression, processing, conservation, tissue-specificity and other biological features for all human sncRNA genes and mature products derived from all major RNA classes. DASHR (Database of small human non-coding RNAs) v2.0 database is the first that integrates human sncRNA gene and mature products profiles obtained from multiple RNA-seq protocols. Altogether, 185 tissues/cell types and sncRNA annotations and >800 curated experiments from ENCODE and GEO/SRA across multiple RNA-seq protocols for both GRCh38/hg38 and GRCh37/hg19 assemblies are integrated in DASHR. Moreover, DASHR is the first to contain both known and novel, previously un-annotated sncRNA loci identified by unsupervised segmentation (13 times more loci with 1 678 800 total). Additionally, DASHR v2.0 adds >3 200 000 annotations for non-small RNA genes and other genomic features (long-noncoding RNAs, mRNAs, promoters, repeats). Furthermore, DASHR v2.0 introduces an enhanced user interface, interactive experiment-by-locus table view, sncRNA locus sorting and filtering by biological features. All annotation and expression information is directly downloadable and accessible as UCSC genome browser tracks. AVAILABILITY AND IMPLEMENTATION: DASHR v2.0 is freely available at https://lisanwanglab.org/DASHRv2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
RNA, Small Untranslated/supply & distribution , Databases, Nucleic Acid , Genomics , Humans , RNA, Long Noncoding , Sequence Analysis, RNA , Software
11.
Bioinformatics ; 35(10): 1768-1770, 2019 05 15.
Article in English | MEDLINE | ID: mdl-30351394

ABSTRACT

SUMMARY: We report VCPA, our SNP/Indel Variant Calling Pipeline and data management tool used for the analysis of whole genome and exome sequencing (WGS/WES) for the Alzheimer's Disease Sequencing Project. VCPA consists of two independent but linkable components: pipeline and tracking database. The pipeline, implemented using the Workflow Description Language and fully optimized for the Amazon elastic compute cloud environment, includes steps from aligning raw sequence reads to variant calling using GATK. The tracking database allows users to view job running status in real time and visualize >100 quality metrics per genome. VCPA is functionally equivalent to the CCDG/TOPMed pipeline. Users can use the pipeline and the dockerized database to process large WGS/WES datasets on Amazon cloud with minimal configuration. AVAILABILITY AND IMPLEMENTATION: VCPA is released under the MIT license and is available for academic and nonprofit use for free. The pipeline source code and step-by-step instructions are available from the National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (http://www.niagads.org/VCPA). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Alzheimer Disease , Data Management , Genomics , High-Throughput Nucleotide Sequencing , Humans , Software
12.
Nucleic Acids Res ; 46(W1): W36-W42, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29733404

ABSTRACT

The introduction of new high-throughput small RNA sequencing protocols that generate large-scale genomics datasets along with increasing evidence of the significant regulatory roles of small non-coding RNAs (sncRNAs) have highlighted the urgent need for tools to analyze and interpret large amounts of small RNA sequencing data. However, it remains challenging to systematically and comprehensively discover and characterize sncRNA genes and specifically-processed sncRNA products from these datasets. To fill this gap, we present Small RNA-seq Portal for Analysis of sequencing expeRiments (SPAR), a user-friendly web server for interactive processing, analysis, annotation and visualization of small RNA sequencing data. SPAR supports sequencing data generated from various experimental protocols, including smRNA-seq, short total RNA sequencing, microRNA-seq, and single-cell small RNA-seq. Additionally, SPAR includes publicly available reference sncRNA datasets from our DASHR database and from ENCODE across 185 human tissues and cell types to produce highly informative small RNA annotations across all major small RNA types and other features such as co-localization with various genomic features, precursor transcript cleavage patterns, and conservation. SPAR allows the user to compare the input experiment against reference ENCODE/DASHR datasets. SPAR currently supports analyses of human (hg19, hg38) and mouse (mm10) sequencing data. SPAR is freely available at https://www.lisanwanglab.org/SPAR.


Subject(s)
Computational Biology/trends , RNA, Small Untranslated/genetics , RNA/genetics , Software , Animals , Genomics , High-Throughput Nucleotide Sequencing/instrumentation , Humans , Internet , Mice , Molecular Sequence Annotation , Sequence Analysis, RNA/instrumentation , Transcriptome/genetics
13.
Alzheimers Dement ; 16(9): 1234-1247, 2020 09.
Article in English | MEDLINE | ID: mdl-32715599

ABSTRACT

INTRODUCTION: Altered lipid metabolism is implicated in Alzheimer's disease (AD), but the mechanisms remain obscure. Aging-related declines in circulating plasmalogens containing omega-3 fatty acids may increase AD risk by reducing plasmalogen availability. METHODS: We measured four ethanolamine plasmalogens (PlsEtns) and four closely related phosphatidylethanolamines (PtdEtns) from the Alzheimer's Disease Neuroimaging Initiative (ADNI; n = 1547 serum) and University of Pennsylvania (UPenn; n = 112 plasma) cohorts, and derived indices reflecting PlsEtn and PtdEtn metabolism: PL-PX (PlsEtns), PL/PE (PlsEtn/PtdEtn ratios), and PBV (plasmalogen biosynthesis value; a composite index). We tested associations with baseline diagnosis, cognition, and cerebrospinal fluid (CSF) AD biomarkers. RESULTS: Results revealed statistically significant negative relationships in ADNI between AD versus CN with PL-PX (P = 0.007) and PBV (P = 0.005), late mild cognitive impairment (LMCI) versus cognitively normal (CN) with PL-PX (P = 2.89 × 10⁻⁵) and PBV (P = 1.99 × 10⁻⁴), and AD versus LMCI with PL/PE (P = 1.85 × 10⁻⁴). In the UPenn cohort, AD versus CN diagnosis associated negatively with PL/PE (P = 0.0191) and PBV (P = 0.0296). In ADNI, cognition was negatively associated with plasmalogen indices, including Alzheimer's Disease Assessment Scale 13-item cognitive subscale (ADAS-Cog13; PL-PX: P = 3.24 × 10⁻⁶; PBV: P = 6.92 × 10⁻⁵) and Mini-Mental State Examination (MMSE; PL-PX: P = 1.28 × 10⁻⁹; PBV: P = 6.50 × 10⁻⁹). In the UPenn cohort, there was a trend toward a similar relationship of MMSE with PL/PE (P = 0.0949). In ADNI, CSF total-tau was negatively associated with PL-PX (P = 5.55 × 10⁻⁶) and PBV (P = 7.77 × 10⁻⁶). Additionally, CSF t-tau/Aβ1-42 ratio was negatively associated with these same indices (PL-PX, P = 2.73 × 10⁻⁶; PBV, P = 4.39 × 10⁻⁶). In the UPenn cohort, PL/PE was negatively associated with CSF total-tau (P = 0.031) and t-tau/Aβ1-42 (P = 0.021). CSF Aβ1-42 was not significantly associated with any of these indices in either cohort. DISCUSSION: These data extend previous studies by showing an association of decreased plasmalogen indices with AD, mild cognitive impairment (MCI), cognition, and CSF tau. Future studies are needed to better define mechanistic relationships, and to test the effects of interventions designed to replete serum plasmalogens.


Subject(s)
Alzheimer Disease , Neuropsychological Tests/statistics & numerical data , Plasmalogens/blood , tau Proteins/cerebrospinal fluid , Aged , Alzheimer Disease/blood , Alzheimer Disease/diagnosis , Biomarkers/cerebrospinal fluid , Cognitive Dysfunction/cerebrospinal fluid , Cohort Studies , Female , Humans , Male , Neuroimaging
14.
Bioinformatics ; 34(16): 2724-2731, 2018 08 15.
Article in English | MEDLINE | ID: mdl-29590295

ABSTRACT

Motivation: Annotation of genomic variants is an increasingly important and complex part of the analysis of sequence-based genomic analyses. Computational predictions of variant function are routinely incorporated into gene-based analyses of rare-variants, though to date most studies use limited information for assessing variant function that is often agnostic of the disease being studied. Results: In this work, we outline an annotation process motivated by the Alzheimer's Disease Sequencing Project, illustrate the impact of including tissue-specific transcript sets and sources of gene regulatory information and assess the potential impact of changing genomic builds on the annotation process. While these factors only impact a small proportion of total variant annotations (∼5%), they influence the potential analysis of a large fraction of genes (∼25%). Availability and implementation: Individual variant annotations are available via the NIAGADS GenomicsDB, at https://www.niagads.org/genomics/tools-and-software/databases/genomics-database. Annotations are also available for bulk download at https://www.niagads.org/datasets. Annotation processing software is available at http://www.icompbio.net/resources/software-and-downloads/. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Alzheimer Disease/genetics , Genetic Predisposition to Disease , Molecular Sequence Annotation/methods , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Software , Databases, Genetic , Genome , Genomics , Humans
15.
Nucleic Acids Res ; 44(D1): D216-22, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26553799

ABSTRACT

Small non-coding RNAs (sncRNAs) are highly abundant RNAs, typically <100 nucleotides long, that act as key regulators of diverse cellular processes. Although thousands of sncRNA genes are known to exist in the human genome, no single database provides searchable, unified annotation, and expression information for full sncRNA transcripts and mature RNA products derived from these larger RNAs. Here, we present the Database of small human noncoding RNAs (DASHR). DASHR contains the most comprehensive information to date on human sncRNA genes and mature sncRNA products. DASHR provides a simple user interface for researchers to view sequence and secondary structure, compare expression levels, and evidence of specific processing across all sncRNA genes and mature sncRNA products in various human tissues. DASHR annotation and expression data covers all major classes of sncRNAs including microRNAs (miRNAs), Piwi-interacting (piRNAs), small nuclear, nucleolar, cytoplasmic (sn-, sno-, scRNAs, respectively), transfer (tRNAs), and ribosomal RNAs (rRNAs). Currently, DASHR (v1.0) integrates 187 smRNA high-throughput sequencing (smRNA-seq) datasets with over 2.5 billion reads and annotation data from multiple public sources. DASHR contains annotations for ∼ 48,000 human sncRNA genes and mature sncRNA products, 82% of which are expressed in one or more of the curated tissues. DASHR is available at http://lisanwanglab.org/DASHR.


Subject(s)
Databases, Nucleic Acid , RNA, Small Untranslated/metabolism , Humans , Molecular Sequence Annotation , RNA Processing, Post-Transcriptional , RNA, Small Untranslated/chemistry , RNA, Small Untranslated/genetics
16.
RNA ; 19(12): 1684-92, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24149843

ABSTRACT

RNA is often altered post-transcriptionally by the covalent modification of particular nucleotides; these modifications are known to modulate the structure and activity of their host RNAs. The recent discovery that an RNA methyl-6 adenosine demethylase (FTO) is a risk gene in obesity has brought to light the significance of RNA modifications to human biology. These noncanonical nucleotides, when converted to cDNA in the course of RNA sequencing, can produce sequence patterns that are distinguishable from simple base-calling errors. To determine whether these modifications can be detected in RNA sequencing data, we developed a method that can not only locate these modifications transcriptome-wide with single nucleotide resolution, but can also differentiate between different classes of modifications. Using small RNA-seq data we were able to detect 92% of all known human tRNA modification sites that are predicted to affect RT activity. We also found that different modifications produce distinct patterns of cDNA sequence, allowing us to differentiate between two classes of adenosine and two classes of guanine modifications with 98% and 79% accuracy, respectively. To show the robustness of this method to sample preparation and sequencing methods, as well as to organismal diversity, we applied it to a publicly available yeast data set and achieved similar levels of accuracy. We also experimentally validated two novel and one known 3-methylcytosine (3mC) sites predicted by HAMR in human tRNAs. Researchers can now use our method to identify and characterize RNA modifications using only RNA-seq data, both retrospectively and when asking questions specifically about modified RNA.
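The central claim, that modified nucleotides leave mismatch patterns distinguishable from simple base-calling errors, can be illustrated with a basic significance test. The sketch below asks how likely the observed number of mismatches at one site would be if sequencing error were the only cause. It is a simplified stand-in, not the statistical model used by HAMR; the function names and the `error_rate` default are assumptions chosen for illustration.

```python
import math

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)
               for i in range(k, n + 1))

def mismatch_pvalue(base_counts, reference_base, error_rate=0.01):
    """p-value that the observed mismatches arise from sequencing error alone.

    base_counts: dict mapping observed base -> read count at one site.
    error_rate: assumed per-base error rate (illustrative value only).
    """
    depth = sum(base_counts.values())
    mismatches = depth - base_counts.get(reference_base, 0)
    return binomial_tail(mismatches, depth, error_rate)

# Toy sites: one with a single stray mismatch, one with many mismatches.
quiet_site = {"A": 99, "G": 1}
noisy_site = {"A": 60, "G": 25, "T": 15}
p_quiet = mismatch_pvalue(quiet_site, "A")
p_noisy = mismatch_pvalue(noisy_site, "A")
```

A site like `noisy_site` yields a vanishingly small p-value and would be flagged as a candidate modification, while `quiet_site` is consistent with ordinary error. Telling modification classes apart, as the abstract describes, would additionally require looking at which bases are substituted, which this sketch does not attempt.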


Subject(s)
Molecular Sequence Annotation/methods , RNA Processing, Post-Transcriptional , RNA, Transfer/genetics , Software , Female , HEK293 Cells , Humans , Male , RNA/genetics , RNA/metabolism , RNA, Transfer/metabolism , Saccharomyces cerevisiae/genetics , Sequence Alignment , Sequence Analysis, RNA
18.
Methods ; 67(1): 28-35, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24145223

ABSTRACT

Recent advances in high-throughput sequencing allow researchers to examine the transcriptome in more detail than ever before. Using a method known as high-throughput small RNA-sequencing, we can now profile the expression of small regulatory RNAs such as microRNAs and small interfering RNAs (siRNAs) with a great deal of sensitivity. However, there are many other types of small RNAs (<50nt) present in the cell, including fragments derived from snoRNAs (small nucleolar RNAs), snRNAs (small nuclear RNAs), scRNAs (small cytoplasmic RNAs), tRNAs (transfer RNAs), and transposon-derived RNAs. Here, we present a user's guide for CoRAL (Classification of RNAs by Analysis of Length), a computational method for discriminating between different classes of RNA using high-throughput small RNA-sequencing data. Not only can CoRAL distinguish between RNA classes with high accuracy, but it also uses features that are relevant to small RNA biogenesis pathways. By doing so, CoRAL can give biologists a glimpse into the characteristics of different RNA processing pathways and how these might differ between tissue types, biological conditions, or even different species. CoRAL is available at http://wanglab.pcbi.upenn.edu/coral/.


Subject(s)
High-Throughput Nucleotide Sequencing , RNA, Small Untranslated/classification , Sequence Analysis, RNA , Algorithms , Animals , Artificial Intelligence , Base Sequence , Decision Trees , Entropy , Humans , Inverted Repeat Sequences , Molecular Sequence Data , Nucleic Acid Conformation , RNA Processing, Post-Transcriptional , RNA, Small Untranslated/genetics
19.
Nucleic Acids Res ; 41(14): e137, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23700308

ABSTRACT

The surprising observation that virtually the entire human genome is transcribed means we know little about the function of many emerging classes of RNAs, except their astounding diversities. Traditional RNA function prediction methods rely on sequence or alignment information, which are limited in their abilities to classify the various collections of non-coding RNAs (ncRNAs). To address this, we developed Classification of RNAs by Analysis of Length (CoRAL), a machine learning-based approach for classification of RNA molecules. CoRAL uses biologically interpretable features including fragment length and cleavage specificity to distinguish between different ncRNA populations. We evaluated CoRAL using genome-wide small RNA sequencing data sets from four human tissue types and were able to classify six different types of RNAs with ∼80% cross-validation accuracy. Analysis by CoRAL revealed that microRNAs, small nucleolar and transposon-derived RNAs are highly discernible and consistent across all human tissue types assessed, whereas long intergenic ncRNAs, small cytoplasmic RNAs and small nuclear RNAs show less consistent patterns. The ability to reliably annotate loci across tissue types demonstrates the potential of CoRAL to characterize ncRNAs using small RNA sequencing data in less well-characterized organisms.
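Fragment length, one of the interpretable features named above, can be summarized per locus in a few lines. The sketch below computes the most common read length and the Shannon entropy of the length distribution, two quantities a classifier could take as input. This is an illustrative feature extractor only, not CoRAL's code; the function name, the choice of summary statistics, and the toy read lengths are assumptions.

```python
import math
from collections import Counter

def length_features(read_lengths):
    """Summarize fragment lengths at one locus.

    Returns the most common length and the Shannon entropy (in bits) of
    the length distribution. The feature choice is illustrative only.
    """
    counts = Counter(read_lengths)
    total = sum(counts.values())
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    modal_length = counts.most_common(1)[0][0]
    return {"modal_length": modal_length, "length_entropy": entropy}

# Toy loci: a tightly processed microRNA-like locus and a diffuse one.
precise_locus = [22] * 90 + [21] * 10
diffuse_locus = [18, 19, 20, 21, 22, 23, 24, 25] * 5
precise = length_features(precise_locus)
diffuse = length_features(diffuse_locus)
```

A locus whose reads pile up at a single length has low entropy, while a locus with reads spread evenly across many lengths has high entropy. A difference of this kind is one way a classifier might separate precisely processed RNAs from degradation-like fragments.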


Subject(s)
Artificial Intelligence , RNA, Small Untranslated/classification , Sequence Analysis, RNA , Algorithms , Classification/methods , Humans , RNA, Small Untranslated/chemistry
20.
medRxiv ; 2024 Jan 18.
Article in English | MEDLINE | ID: mdl-38293024

ABSTRACT

The prevalence of dementia among South Asians across India is approximately 7.4% in those 60 years and older, yet little is known about genetic risk factors for dementia in this population. Most known risk loci for Alzheimer's disease (AD) have been identified from studies conducted in European Ancestry (EA) but are unknown in South Asians. Using whole-genome sequence data from 2680 participants from the Diagnostic Assessment of Dementia for the Longitudinal Aging Study of India (LASI-DAD), we performed a gene-based analysis of 84 genes previously associated with AD in EA. We investigated associations with the Hindi Mental State Examination (HMSE) score and factor scores for general cognitive function and five cognitive domains. For each gene, we examined missense/loss-of-function (LoF) variants and brain-specific promoter/enhancer variants, separately, both with and without incorporating additional annotation weights (e.g., deleteriousness, conservation scores) using the variant-Set Test for Association using Annotation infoRmation (STAAR). In the missense/LoF analysis without annotation weights and controlling for age, sex, state/territory, and genetic ancestry, three genes had an association with at least one measure of cognitive function (FDR q<0.1). APOE was associated with four measures of cognitive function, PICALM was associated with HMSE score, and TSPOAP1 was associated with executive function. The most strongly associated variants in each gene were rs429358 (APOE ε4), rs779406084 (PICALM), and rs9913145 (TSPOAP1). rs779406084 is a rare missense mutation that is more prevalent in LASI-DAD than in EA (minor allele frequency=0.075% vs. 0.0015%); the other two are common variants. No genes in the brain-specific promoter/enhancer analysis met criteria for significance. Results with and without annotation weights were similar. 
Missense/LoF variants in some genes previously associated with AD in EA are associated with measures of cognitive function in South Asians from India. Analyzing genome sequence data allows identification of potential novel causal variants enriched in South Asians.
