Results 1 - 20 of 39
1.
Genome Res ; 32(4): 778-790, 2022 04.
Article in English | MEDLINE | ID: mdl-35210353

ABSTRACT

More than 90% of genetic variants are rare in most modern sequencing studies, such as the Alzheimer's Disease Sequencing Project (ADSP) whole-exome sequencing (WES) data. Furthermore, 54% of the rare variants in ADSP WES are singletons. However, both single variant and unit-based tests are limited in their statistical power to detect an association between rare variants and phenotypes. To best use missense rare variants and investigate their biological effect, we examine their association with phenotypes in the context of protein structures. We developed a protein structure-based approach, protein optimized kernel evaluation of missense nucleotides (POKEMON), which evaluates rare missense variants based on their spatial distribution within a protein rather than their allele frequency. The hypothesis behind this test is that the three-dimensional spatial distribution of variants within a protein structure provides functional context to power an association test. POKEMON identified three candidate genes (TREM2, SORL1, and EXOC3L4) and another suggestive gene from the ADSP WES data. For TREM2 and SORL1, two known Alzheimer's disease (AD) genes, the signal from the spatial cluster is stable even if we exclude known AD risk variants, indicating the presence of additional low-frequency risk variants within these genes. EXOC3L4 is a novel AD risk gene that has a cluster of variants primarily shared by case subjects around the Sec6 domain. This cluster is also validated in an independent replication data set and a validation data set with a larger sample size.


Subjects
Alzheimer Disease , Alzheimer Disease/genetics , Gene Frequency , Genetic Predisposition to Disease , Humans , LDL-Receptor Related Proteins/genetics , LDL-Receptor Related Proteins/metabolism , Membrane Transport Proteins/genetics , Mutation, Missense , Phenotype , Exome Sequencing
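
The spatial idea in entry 1 can be illustrated with a small sketch. It is not the POKEMON kernel itself, whose exact form the abstract does not give. The exponential distance weighting, the `scale` parameter, the function name `spatial_kernel`, and the residue numbers and coordinates are all assumptions made for illustration.

```python
import math


def spatial_kernel(carriers, coords, scale=10.0):
    """Pairwise similarity between individuals from the 3D proximity of their variants.

    carriers: dict mapping individual id -> set of residue ids carrying a rare
              missense variant (hypothetical input format).
    coords:   dict mapping residue id -> (x, y, z) position in angstroms.
    scale:    assumed decay length; larger values let distant residues still count.
    """
    ids = sorted(carriers)

    def weight(a, b):
        # Similarity decays exponentially with the distance between two residues.
        return math.exp(-math.dist(coords[a], coords[b]) / scale)

    return {
        (i, j): sum(weight(a, b) for a in carriers[i] for b in carriers[j])
        for i in ids
        for j in ids
    }


# Hypothetical toy data: two cases carry variants 3 angstroms apart,
# one control carries a variant 40 angstroms away.
toy_coords = {45: (0.0, 0.0, 0.0), 47: (3.0, 0.0, 0.0), 300: (40.0, 0.0, 0.0)}
toy_carriers = {"case1": {45}, "case2": {47}, "control1": {300}}
toy_kernel = spatial_kernel(toy_carriers, toy_coords)
```

In this toy setup the two cases score as more similar to each other than either does to the control, even though no two individuals share a variant. That is the sense in which spatial distribution, rather than allele frequency, can carry the signal.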
2.
BMC Genomics ; 25(1): 115, 2024 Jan 26.
Article in English | MEDLINE | ID: mdl-38279154

ABSTRACT

BACKGROUND: Short tandem repeats (STRs) are widely distributed across the human genome and are associated with numerous neurological disorders. However, the extent that STRs contribute to disease is likely under-estimated because of the challenges calling these variants in short read next generation sequencing data. Several computational tools have been developed for STR variant calling, but none fully address all of the complexities associated with this variant class. RESULTS: Here we introduce LUSTR which is designed to address some of the challenges associated with STR variant calling by enabling more flexibility in defining STR loci, allowing for customizable modules to tailor analyses, and expanding the capability to call somatic and multiallelic STR variants. LUSTR is a user-friendly and easily customizable tool for targeted or unbiased genome-wide STR variant screening that can use either predefined or novel genome builds. Using both simulated and real data sets, we demonstrated that LUSTR accurately infers germline and somatic STR expansions in individuals with and without diseases. CONCLUSIONS: LUSTR offers a powerful and user-friendly approach that allows for the identification of STR variants and can facilitate more comprehensive studies evaluating the role of pathogenic STR variants across human diseases.


Subjects
Genome, Human , Microsatellite Repeats , Humans , Microsatellite Repeats/genetics , Germ Cells , High-Throughput Nucleotide Sequencing
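
To make the notion of an STR allele in entry 2 concrete, here is a naive sketch that counts the longest uninterrupted run of a repeat unit in a sequence. It is not how LUSTR calls variants (the abstract does not describe its algorithm). The function name `longest_repeat_run` and the example sequences are hypothetical.

```python
import re


def longest_repeat_run(sequence, unit):
    """Return the largest number of consecutive copies of `unit` found in `sequence`.

    A lookahead is used so that runs starting at every position are considered,
    including runs that are out of phase with an earlier, shorter match.
    """
    if not unit:
        raise ValueError("repeat unit must be a non-empty string")
    pattern = re.compile(f"(?=((?:{re.escape(unit)})+))")
    return max(
        (len(match.group(1)) // len(unit) for match in pattern.finditer(sequence)),
        default=0,
    )


# Hypothetical read: four CAG units flanked by non-repeat bases.
read = "TT" + "CAG" * 4 + "GG"
cag_copies = longest_repeat_run(read, "CAG")
```

Real STR callers also have to handle interrupted repeats, reads that do not span the whole locus, and sequencing errors, none of which this sketch attempts.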
3.
Hum Mol Genet ; 31(R1): R62-R72, 2022 10 20.
Article in English | MEDLINE | ID: mdl-35943817

ABSTRACT

Non-coding genetic variants outside of protein-coding genome regions play an important role in genetic and epigenetic regulation. It has become increasingly important to understand their roles, as non-coding variants often make up the majority of top findings of genome-wide association studies (GWAS). In addition, the growing popularity of disease-specific whole-genome sequencing (WGS) efforts expands the library of and offers unique opportunities for investigating both common and rare non-coding variants, which are typically not detected in more limited GWAS approaches. However, the sheer size and breadth of WGS data introduce additional challenges to predicting functional impacts in terms of data analysis and interpretation. This review focuses on the recent approaches developed for efficient, at-scale annotation and prioritization of non-coding variants uncovered in WGS analyses. In particular, we review the latest scalable annotation tools, databases and functional genomic resources for interpreting the variant findings from WGS based on both experimental data and in silico predictive annotations. We also review machine learning-based predictive models for variant scoring and prioritization. We conclude with a discussion of future research directions which will enhance the data and tools necessary for the effective functional analyses of variants identified by WGS to improve our understanding of disease etiology.


Subjects
Epigenesis, Genetic , Genome-Wide Association Study , Whole Genome Sequencing , Genomics
4.
Bioinformatics ; 39(11), 2023 11 01.
Article in English | MEDLINE | ID: mdl-37947320

ABSTRACT

SUMMARY: Preparing functional genomic (FG) data with diverse assay types and file formats for integration into analysis workflows that interpret genome-wide association and other studies is a significant and time-consuming challenge. Here we introduce hipFG (Harmonization and Integration Pipeline for Functional Genomics), an automatically customized pipeline for efficient and scalable normalization of heterogenous FG data collections into standardized, indexed, rapidly searchable analysis-ready datasets while accounting for FG datatypes (e.g. chromatin interactions, genomic intervals, quantitative trait loci). AVAILABILITY AND IMPLEMENTATION: hipFG is freely available at https://bitbucket.org/wanglab-upenn/hipFG. A Docker container is available at https://hub.docker.com/r/wanglab/hipfg.


Subjects
Genome-Wide Association Study , Software , Genomics , Chromatin , Quantitative Trait Loci
5.
Alzheimers Dement ; 20(2): 1123-1136, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37881831

ABSTRACT

INTRODUCTION: The National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site Alzheimer's Genomics Database (GenomicsDB) is a public knowledge base of Alzheimer's disease (AD) genetic datasets and genomic annotations. METHODS: GenomicsDB uses a custom systems architecture to adopt and enforce rigorous standards that facilitate harmonization of AD-relevant genome-wide association study summary statistics datasets with functional annotations, including over 230 million annotated variants from the AD Sequencing Project. RESULTS: GenomicsDB generates interactive reports compiled from the harmonized datasets and annotations. These reports contextualize AD-risk associations in a broader functional genomic setting and summarize them in the context of functionally annotated genes and variants. DISCUSSION: Created to make AD-genetics knowledge more accessible to AD researchers, the GenomicsDB is designed to guide users unfamiliar with genetic data in not only exploring but also interpreting this ever-growing volume of data. Scalable and interoperable with other genomics resources using data technology standards, the GenomicsDB can serve as a central hub for research and data analysis on AD and related dementias. HIGHLIGHTS: The National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) offers to the public a unique, disease-centric collection of AD-relevant GWAS summary statistics datasets. Interpreting these data is challenging and requires significant bioinformatics expertise to standardize datasets and harmonize them with functional annotations on genome-wide scales. The NIAGADS Alzheimer's GenomicsDB helps overcome these challenges by providing a user-friendly public knowledge base for AD-relevant genetics that shares harmonized, annotated summary statistics datasets from the NIAGADS repository in an interpretable, easily searchable format.


Subjects
Alzheimer Disease , United States , Humans , Alzheimer Disease/genetics , Genome-Wide Association Study , National Institute on Aging (U.S.) , Genomics , Databases, Factual , Genetic Predisposition to Disease/genetics
6.
Bioinformatics ; 38(19): 4530-4536, 2022 09 30.
Article in English | MEDLINE | ID: mdl-35980155

ABSTRACT

MOTIVATION: Cell-type deconvolution of bulk tissue RNA sequencing (RNA-seq) data is an important step toward understanding the variations in cell-type composition among disease conditions. Owing to recent advances in single-cell RNA sequencing (scRNA-seq) and the availability of large amounts of bulk RNA-seq data in disease-relevant tissues, various deconvolution methods have been developed. However, the performance of existing methods heavily relies on the quality of information provided by external data sources, such as the selection of scRNA-seq data as a reference and prior biological information. RESULTS: We present the Integrated and Robust Deconvolution (InteRD) algorithm to infer cell-type proportions from target bulk RNA-seq data. Owing to the innovative use of penalized regression with a new evaluation criterion for deconvolution, InteRD has three primary advantages. First, it is able to effectively integrate deconvolution results from multiple scRNA-seq datasets. Second, InteRD calibrates estimates from reference-based deconvolution by taking into account extra biological information as priors. Third, the proposed algorithm is robust to inaccurate external information imposed in the deconvolution system. Extensive numerical evaluations and real-data applications demonstrate that InteRD yields more accurate and robust cell-type proportion estimates that agree well with known biology. AVAILABILITY AND IMPLEMENTATION: The proposed InteRD framework is implemented in R and the package is available at https://cran.r-project.org/web/packages/InteRD/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
RNA , Single-Cell Analysis , Single-Cell Analysis/methods , Gene Expression Profiling/methods , Software , Sequence Analysis, RNA/methods
7.
Alzheimers Dement ; 2022 Jun 30.
Article in English | MEDLINE | ID: mdl-35770850

ABSTRACT

INTRODUCTION: Variants in the tau gene (MAPT) region are associated with breast cancer in women and Alzheimer's disease (AD) among persons lacking apolipoprotein E ε4 (ε4-). METHODS: To identify novel genes associated with tau-related pathology, we conducted two genome-wide association studies (GWAS) for AD, one among 10,340 ε4- women in the Alzheimer's Disease Genetics Consortium (ADGC) and another in 31 members (22 women) of a consanguineous Hutterite kindred. RESULTS: We identified novel associations of AD with MGMT variants in the ADGC (rs12775171, odds ratio [OR] = 1.4, P = 4.9 × 10⁻⁸) and Hutterite (rs12256016 and rs2803456, OR = 2.0, P = 1.9 × 10⁻¹⁴) datasets. Multi-omics analyses showed that the most significant and largest number of associations among the single nucleotide polymorphisms (SNPs), DNA-methylated CpGs, MGMT expression, and AD-related neuropathological traits were observed among women. Furthermore, promoter capture Hi-C analyses revealed long-range interactions of the MGMT promoter with MGMT SNPs and CpG sites. DISCUSSION: These findings suggest that epigenetically regulated MGMT expression is involved in AD pathogenesis, especially in women.

8.
Alzheimers Dement ; 18(12): 2458-2467, 2022 12.
Article in English | MEDLINE | ID: mdl-35258170

ABSTRACT

INTRODUCTION: Progranulin (GRN) mutations occur in frontotemporal lobar degeneration (FTLD) and in Alzheimer's disease (AD), often with TDP-43 pathology. METHODS: We determined the frequency of rs5848 and rare, pathogenic GRN mutations in two autopsy and one family cohort. We compared Braak stage, β-amyloid load, hyperphosphorylated tau (PHFtau) tangle density and TDP-43 pathology in GRN carriers and non-carriers. RESULTS: Pathogenic GRN mutations were more frequent in all cohorts compared to the Genome Aggregation Database (gnomAD), but there was no evidence for association with AD. Pathogenic GRN carriers had significantly higher PHFtau tangle density adjusting for age, sex and APOE ε4 genotype. AD patients with rs5848 had higher frequencies of hippocampal sclerosis and TDP-43 deposits. Twenty-two rare, pathogenic GRN variants were observed in the family cohort. DISCUSSION: GRN mutations in clinical and neuropathological AD increase the burden of tau-related brain pathology but show no specific association with β-amyloid load or AD.


Subjects
Alzheimer Disease , Frontotemporal Lobar Degeneration , Humans , Progranulins/genetics , Alzheimer Disease/genetics , Alzheimer Disease/pathology , Intercellular Signaling Peptides and Proteins/genetics , Mutation/genetics , Frontotemporal Lobar Degeneration/genetics , DNA-Binding Proteins/genetics
9.
Bioinformatics ; 36(12): 3879-3881, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32330239

ABSTRACT

SUMMARY: We report Spark-based INFERence of the molecular mechanisms of NOn-coding genetic variants (SparkINFERNO), a scalable bioinformatics pipeline characterizing non-coding genome-wide association study (GWAS) association findings. SparkINFERNO prioritizes causal variants underlying GWAS association signals and reports relevant regulatory elements, tissue contexts and plausible target genes they affect. To achieve this, the SparkINFERNO algorithm integrates GWAS summary statistics with a large-scale collection of functional genomics datasets spanning enhancer activity, transcription factor binding, expression quantitative trait loci and other functional datasets across more than 400 tissues and cell types. Scalability is achieved by an underlying API implemented using Apache Spark and Giggle-based genomic indexing. We evaluated SparkINFERNO on large GWASs and show that SparkINFERNO is more than 60 times more efficient and scales with data size and amount of computational resources. AVAILABILITY AND IMPLEMENTATION: SparkINFERNO runs on clusters or a single server with Apache Spark environment, and is available at https://bitbucket.org/wanglab-upenn/SparkINFERNO or https://hub.docker.com/r/wanglab/spark-inferno. CONTACT: lswang@pennmedicine.upenn.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genome-Wide Association Study , Quantitative Trait Loci , Algorithms , Genomics , Software
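
The core operation described in entry 9, intersecting GWAS variants with functional genomic annotations, can be sketched in a few lines. This is a single-machine illustration only: it uses neither Apache Spark nor Giggle, and the function names, the 0-based half-open coordinate convention, and the annotation labels are assumptions, not part of SparkINFERNO.

```python
from bisect import bisect_right


def build_index(intervals):
    """Group annotation intervals by chromosome and sort them by start position.

    intervals: iterable of (chrom, start, end, label) using 0-based,
               half-open coordinates (an assumed convention).
    """
    index = {}
    for chrom, start, end, label in intervals:
        index.setdefault(chrom, []).append((start, end, label))
    for rows in index.values():
        rows.sort()
    return index


def overlapping_labels(index, chrom, pos):
    """Return labels of all intervals on `chrom` that contain position `pos`."""
    rows = index.get(chrom, [])
    # Only intervals whose start is <= pos can contain it; bisect finds that prefix.
    stop = bisect_right(rows, (pos, float("inf"), ""))
    # The prefix is scanned linearly, which is fine for a sketch but not at scale.
    return [label for start, end, label in rows[:stop] if pos < end]


# Hypothetical annotation tracks.
annotations = build_index([
    ("chr1", 100, 200, "enhancer_A"),
    ("chr1", 150, 300, "eQTL_B"),
    ("chr2", 0, 50, "TFBS_C"),
])
hits = overlapping_labels(annotations, "chr1", 160)
```

A production index avoids the linear scan over candidate intervals, which is the kind of cost that dedicated genomic indexing is meant to remove.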
10.
Bioinformatics ; 35(6): 1033-1039, 2019 03 15.
Article in English | MEDLINE | ID: mdl-30668832

ABSTRACT

MOTIVATION: Small non-coding RNAs (sncRNAs, <100 nts) are highly abundant RNAs that regulate diverse and often tissue-specific cellular processes by associating with transcription factor complexes or binding to mRNAs. While thousands of sncRNA genes exist in the human genome, no single resource provides searchable, unified annotation, expression and processing information for full sncRNA transcripts and mature RNA products derived from these larger RNAs. RESULTS: Our goal is to establish a complete catalog of annotation, expression, processing, conservation, tissue-specificity and other biological features for all human sncRNA genes and mature products derived from all major RNA classes. The DASHR (Database of small human non-coding RNAs) v2.0 database is the first that integrates human sncRNA gene and mature products profiles obtained from multiple RNA-seq protocols. Altogether, 185 tissues/cell types and sncRNA annotations and >800 curated experiments from ENCODE and GEO/SRA across multiple RNA-seq protocols for both GRCh38/hg38 and GRCh37/hg19 assemblies are integrated in DASHR. Moreover, DASHR is the first to contain both known and novel, previously un-annotated sncRNA loci identified by unsupervised segmentation (13 times more loci with 1 678 800 total). Additionally, DASHR v2.0 adds >3 200 000 annotations for non-small RNA genes and other genomic features (long-noncoding RNAs, mRNAs, promoters, repeats). Furthermore, DASHR v2.0 introduces an enhanced user interface, interactive experiment-by-locus table view, sncRNA locus sorting and filtering by biological features. All annotation and expression information is directly downloadable and accessible as UCSC genome browser tracks. AVAILABILITY AND IMPLEMENTATION: DASHR v2.0 is freely available at https://lisanwanglab.org/DASHRv2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
RNA, Small Untranslated/supply & distribution , Databases, Nucleic Acid , Genomics , Humans , RNA, Long Noncoding , Sequence Analysis, RNA , Software
11.
Bioinformatics ; 35(10): 1768-1770, 2019 05 15.
Article in English | MEDLINE | ID: mdl-30351394

ABSTRACT

SUMMARY: We report VCPA, our SNP/Indel Variant Calling Pipeline and data management tool used for the analysis of whole genome and exome sequencing (WGS/WES) for the Alzheimer's Disease Sequencing Project. VCPA consists of two independent but linkable components: pipeline and tracking database. The pipeline, implemented using the Workflow Description Language and fully optimized for the Amazon elastic compute cloud environment, includes steps from aligning raw sequence reads to variant calling using GATK. The tracking database allows users to view job running status in real time and visualize >100 quality metrics per genome. VCPA is functionally equivalent to the CCDG/TOPMed pipeline. Users can use the pipeline and the dockerized database to process large WGS/WES datasets on Amazon cloud with minimal configuration. AVAILABILITY AND IMPLEMENTATION: VCPA is released under the MIT license and is available for academic and nonprofit use for free. The pipeline source code and step-by-step instructions are available from the National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (http://www.niagads.org/VCPA). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Alzheimer Disease , Data Management , Genomics , High-Throughput Nucleotide Sequencing , Humans , Software
12.
Nucleic Acids Res ; 46(W1): W36-W42, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29733404

ABSTRACT

The introduction of new high-throughput small RNA sequencing protocols that generate large-scale genomics datasets along with increasing evidence of the significant regulatory roles of small non-coding RNAs (sncRNAs) have highlighted the urgent need for tools to analyze and interpret large amounts of small RNA sequencing data. However, it remains challenging to systematically and comprehensively discover and characterize sncRNA genes and specifically-processed sncRNA products from these datasets. To fill this gap, we present Small RNA-seq Portal for Analysis of sequencing expeRiments (SPAR), a user-friendly web server for interactive processing, analysis, annotation and visualization of small RNA sequencing data. SPAR supports sequencing data generated from various experimental protocols, including smRNA-seq, short total RNA sequencing, microRNA-seq, and single-cell small RNA-seq. Additionally, SPAR includes publicly available reference sncRNA datasets from our DASHR database and from ENCODE across 185 human tissues and cell types to produce highly informative small RNA annotations across all major small RNA types and other features such as co-localization with various genomic features, precursor transcript cleavage patterns, and conservation. SPAR allows the user to compare the input experiment against reference ENCODE/DASHR datasets. SPAR currently supports analyses of human (hg19, hg38) and mouse (mm10) sequencing data. SPAR is freely available at https://www.lisanwanglab.org/SPAR.


Subjects
Computational Biology/trends , RNA, Small Untranslated/genetics , RNA/genetics , Software , Animals , Genomics , High-Throughput Nucleotide Sequencing/instrumentation , Humans , Internet , Mice , Molecular Sequence Annotation , Sequence Analysis, RNA/instrumentation , Transcriptome/genetics
13.
Alzheimers Dement ; 16(9): 1234-1247, 2020 09.
Article in English | MEDLINE | ID: mdl-32715599

ABSTRACT

INTRODUCTION: Altered lipid metabolism is implicated in Alzheimer's disease (AD), but the mechanisms remain obscure. Aging-related declines in circulating plasmalogens containing omega-3 fatty acids may increase AD risk by reducing plasmalogen availability. METHODS: We measured four ethanolamine plasmalogens (PlsEtns) and four closely related phosphatidylethanolamines (PtdEtns) from the Alzheimer's Disease Neuroimaging Initiative (ADNI; n = 1547 serum) and University of Pennsylvania (UPenn; n = 112 plasma) cohorts, and derived indices reflecting PlsEtn and PtdEtn metabolism: PL-PX (PlsEtns), PL/PE (PlsEtn/PtdEtn ratios), and PBV (plasmalogen biosynthesis value; a composite index). We tested associations with baseline diagnosis, cognition, and cerebrospinal fluid (CSF) AD biomarkers. RESULTS: Results revealed statistically significant negative relationships in ADNI between AD versus cognitively normal (CN) with PL-PX (P = 0.007) and PBV (P = 0.005), late mild cognitive impairment (LMCI) versus CN with PL-PX (P = 2.89 × 10⁻⁵) and PBV (P = 1.99 × 10⁻⁴), and AD versus LMCI with PL/PE (P = 1.85 × 10⁻⁴). In the UPenn cohort, AD versus CN diagnosis associated negatively with PL/PE (P = 0.0191) and PBV (P = 0.0296). In ADNI, cognition was negatively associated with plasmalogen indices, including Alzheimer's Disease Assessment Scale 13-item cognitive subscale (ADAS-Cog13; PL-PX: P = 3.24 × 10⁻⁶; PBV: P = 6.92 × 10⁻⁵) and Mini-Mental State Examination (MMSE; PL-PX: P = 1.28 × 10⁻⁹; PBV: P = 6.50 × 10⁻⁹). In the UPenn cohort, there was a trend toward a similar relationship of MMSE with PL/PE (P = 0.0949). In ADNI, CSF total-tau was negatively associated with PL-PX (P = 5.55 × 10⁻⁶) and PBV (P = 7.77 × 10⁻⁶). Additionally, CSF t-tau/Aβ1-42 ratio was negatively associated with these same indices (PL-PX, P = 2.73 × 10⁻⁶; PBV, P = 4.39 × 10⁻⁶). In the UPenn cohort, PL/PE was negatively associated with CSF total-tau (P = 0.031) and t-tau/Aβ1-42 (P = 0.021). CSF Aβ1-42 was not significantly associated with any of these indices in either cohort. DISCUSSION: These data extend previous studies by showing an association of decreased plasmalogen indices with AD, mild cognitive impairment (MCI), cognition, and CSF tau. Future studies are needed to better define mechanistic relationships, and to test the effects of interventions designed to replete serum plasmalogens.


Subjects
Alzheimer Disease , Neuropsychological Tests/statistics & numerical data , Plasmalogens/blood , tau Proteins/cerebrospinal fluid , Aged , Alzheimer Disease/blood , Alzheimer Disease/diagnosis , Biomarkers/cerebrospinal fluid , Cognitive Dysfunction/cerebrospinal fluid , Cohort Studies , Female , Humans , Male , Neuroimaging
14.
Bioinformatics ; 34(16): 2724-2731, 2018 08 15.
Article in English | MEDLINE | ID: mdl-29590295

ABSTRACT

Motivation: Annotation of genomic variants is an increasingly important and complex part of sequence-based genomic analyses. Computational predictions of variant function are routinely incorporated into gene-based analyses of rare variants, though to date most studies use limited information for assessing variant function that is often agnostic of the disease being studied. Results: In this work, we outline an annotation process motivated by the Alzheimer's Disease Sequencing Project, illustrate the impact of including tissue-specific transcript sets and sources of gene regulatory information and assess the potential impact of changing genomic builds on the annotation process. While these factors only impact a small proportion of total variant annotations (∼5%), they influence the potential analysis of a large fraction of genes (∼25%). Availability and implementation: Individual variant annotations are available via the NIAGADS GenomicsDB, at https://www.niagads.org/genomics/tools-and-software/databases/genomics-database. Annotations are also available for bulk download at https://www.niagads.org/datasets. Annotation processing software is available at http://www.icompbio.net/resources/software-and-downloads/. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Alzheimer Disease/genetics , Genetic Predisposition to Disease , Molecular Sequence Annotation/methods , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Software , Databases, Genetic , Genome , Genomics , Humans
15.
Nucleic Acids Res ; 44(D1): D216-22, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26553799

ABSTRACT

Small non-coding RNAs (sncRNAs) are highly abundant RNAs, typically <100 nucleotides long, that act as key regulators of diverse cellular processes. Although thousands of sncRNA genes are known to exist in the human genome, no single database provides searchable, unified annotation, and expression information for full sncRNA transcripts and mature RNA products derived from these larger RNAs. Here, we present the Database of small human noncoding RNAs (DASHR). DASHR contains the most comprehensive information to date on human sncRNA genes and mature sncRNA products. DASHR provides a simple user interface for researchers to view sequence and secondary structure, compare expression levels, and examine evidence of specific processing across all sncRNA genes and mature sncRNA products in various human tissues. DASHR annotation and expression data cover all major classes of sncRNAs including microRNAs (miRNAs), Piwi-interacting (piRNAs), small nuclear, nucleolar, cytoplasmic (sn-, sno-, scRNAs, respectively), transfer (tRNAs), and ribosomal RNAs (rRNAs). Currently, DASHR (v1.0) integrates 187 smRNA high-throughput sequencing (smRNA-seq) datasets with over 2.5 billion reads and annotation data from multiple public sources. DASHR contains annotations for ∼48,000 human sncRNA genes and mature sncRNA products, 82% of which are expressed in one or more of the curated tissues. DASHR is available at http://lisanwanglab.org/DASHR.


Subjects
Databases, Nucleic Acid , RNA, Small Untranslated/metabolism , Humans , Molecular Sequence Annotation , RNA Processing, Post-Transcriptional , RNA, Small Untranslated/chemistry , RNA, Small Untranslated/genetics
16.
RNA ; 19(12): 1684-92, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24149843

ABSTRACT

RNA is often altered post-transcriptionally by the covalent modification of particular nucleotides; these modifications are known to modulate the structure and activity of their host RNAs. The recent discovery that an RNA methyl-6 adenosine demethylase (FTO) is a risk gene in obesity has brought to light the significance of RNA modifications to human biology. These noncanonical nucleotides, when converted to cDNA in the course of RNA sequencing, can produce sequence patterns that are distinguishable from simple base-calling errors. To determine whether these modifications can be detected in RNA sequencing data, we developed a method that can not only locate these modifications transcriptome-wide with single nucleotide resolution, but can also differentiate between different classes of modifications. Using small RNA-seq data we were able to detect 92% of all known human tRNA modification sites that are predicted to affect RT activity. We also found that different modifications produce distinct patterns of cDNA sequence, allowing us to differentiate between two classes of adenosine and two classes of guanine modifications with 98% and 79% accuracy, respectively. To show the robustness of this method to sample preparation and sequencing methods, as well as to organismal diversity, we applied it to a publicly available yeast data set and achieved similar levels of accuracy. We also experimentally validated two novel and one known 3-methylcytosine (3mC) sites predicted by HAMR in human tRNAs. Researchers can now use our method to identify and characterize RNA modifications using only RNA-seq data, both retrospectively and when asking questions specifically about modified RNA.


Subjects
Molecular Sequence Annotation/methods , RNA Processing, Post-Transcriptional , RNA, Transfer/genetics , Software , Female , HEK293 Cells , Humans , Male , RNA/genetics , RNA/metabolism , RNA, Transfer/metabolism , Saccharomyces cerevisiae/genetics , Sequence Alignment , Sequence Analysis, RNA
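
Entry 16 rests on the observation that modified nucleotides leave mismatch patterns in cDNA that differ from simple base-calling errors. One simple way to express that idea is a one-sided binomial test of the mismatch count at a site against an assumed per-base error rate. The sketch below is an illustration under that assumption; the abstract does not state which statistical model the published method uses, and `mismatch_pvalue` and the default `error_rate` of 0.01 are hypothetical.

```python
from math import comb


def mismatch_pvalue(mismatches, depth, error_rate=0.01):
    """Probability of seeing at least `mismatches` mismatching reads out of `depth`
    if every mismatch were an independent base-calling error at `error_rate`.

    A small value suggests the site is not explained by sequencing error alone.
    """
    if not 0 <= mismatches <= depth:
        raise ValueError("mismatches must be between 0 and depth")
    return sum(
        comb(depth, k) * error_rate**k * (1 - error_rate) ** (depth - k)
        for k in range(mismatches, depth + 1)
    )


# Hypothetical site: 30 of 100 reads disagree with the reference base.
site_pvalue = mismatch_pvalue(30, 100)
```

A test like this only flags candidate sites. Telling classes of modification apart, as the abstract describes, would additionally require looking at which alternative bases appear.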
18.
Methods ; 67(1): 28-35, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24145223

ABSTRACT

Recent advances in high-throughput sequencing allow researchers to examine the transcriptome in more detail than ever before. Using a method known as high-throughput small RNA-sequencing, we can now profile the expression of small regulatory RNAs such as microRNAs and small interfering RNAs (siRNAs) with a great deal of sensitivity. However, there are many other types of small RNAs (<50nt) present in the cell, including fragments derived from snoRNAs (small nucleolar RNAs), snRNAs (small nuclear RNAs), scRNAs (small cytoplasmic RNAs), tRNAs (transfer RNAs), and transposon-derived RNAs. Here, we present a user's guide for CoRAL (Classification of RNAs by Analysis of Length), a computational method for discriminating between different classes of RNA using high-throughput small RNA-sequencing data. Not only can CoRAL distinguish between RNA classes with high accuracy, but it also uses features that are relevant to small RNA biogenesis pathways. By doing so, CoRAL can give biologists a glimpse into the characteristics of different RNA processing pathways and how these might differ between tissue types, biological conditions, or even different species. CoRAL is available at http://wanglab.pcbi.upenn.edu/coral/.


Subjects
High-Throughput Nucleotide Sequencing , RNA, Small Untranslated/classification , Sequence Analysis, RNA , Algorithms , Animals , Artificial Intelligence , Base Sequence , Decision Trees , Entropy , Humans , Inverted Repeat Sequences , Molecular Sequence Data , Nucleic Acid Conformation , RNA Processing, Post-Transcriptional , RNA, Small Untranslated/genetics
19.
Nucleic Acids Res ; 41(14): e137, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23700308

ABSTRACT

The surprising observation that virtually the entire human genome is transcribed means we know little about the function of many emerging classes of RNAs, except their astounding diversities. Traditional RNA function prediction methods rely on sequence or alignment information, which are limited in their abilities to classify the various collections of non-coding RNAs (ncRNAs). To address this, we developed Classification of RNAs by Analysis of Length (CoRAL), a machine learning-based approach for classification of RNA molecules. CoRAL uses biologically interpretable features including fragment length and cleavage specificity to distinguish between different ncRNA populations. We evaluated CoRAL using genome-wide small RNA sequencing data sets from four human tissue types and were able to classify six different types of RNAs with ∼80% cross-validation accuracy. Analysis by CoRAL revealed that microRNAs, small nucleolar and transposon-derived RNAs are highly discernible and consistent across all human tissue types assessed, whereas long intergenic ncRNAs, small cytoplasmic RNAs and small nuclear RNAs show less consistent patterns. The ability to reliably annotate loci across tissue types demonstrates the potential of CoRAL to characterize ncRNAs using small RNA sequencing data in less well-characterized organisms.


Subjects
Artificial Intelligence , RNA, Small Untranslated/classification , Sequence Analysis, RNA , Algorithms , Classification/methods , Humans , RNA, Small Untranslated/chemistry
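
The two kinds of feature named in entry 19, fragment length and cleavage specificity, can be sketched as follows. This is an illustration of the feature types, not CoRAL's actual feature set or code. Using Shannon entropy of read start positions as a proxy for cleavage specificity is an assumption, as are the function names.

```python
from collections import Counter
from math import log2


def length_profile(read_lengths):
    """Fraction of reads at each observed fragment length."""
    counts = Counter(read_lengths)
    total = sum(counts.values())
    return {length: count / total for length, count in sorted(counts.items())}


def positional_entropy(start_positions):
    """Shannon entropy (in bits) of where reads start within a locus.

    Low entropy means reads begin at the same position, as expected for precise
    cleavage; high entropy means start sites are scattered.
    """
    counts = Counter(start_positions)
    total = sum(counts.values())
    return sum((count / total) * log2(total / count) for count in counts.values())


# Hypothetical locus: most reads are 21-22 nt long and start at one position.
profile = length_profile([21, 21, 22, 30])
precise = positional_entropy([5, 5, 5, 5])
scattered = positional_entropy([1, 2, 3, 4])
```

Features of this kind are biologically interpretable in the sense the abstract describes: a classifier built on them can be read back in terms of how a class of RNA is processed.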
20.
medRxiv ; 2024 Jan 18.
Article in English | MEDLINE | ID: mdl-38293024

ABSTRACT

The prevalence of dementia among South Asians across India is approximately 7.4% in those 60 years and older, yet little is known about genetic risk factors for dementia in this population. Most known risk loci for Alzheimer's disease (AD) have been identified from studies conducted in European Ancestry (EA) but are unknown in South Asians. Using whole-genome sequence data from 2680 participants from the Diagnostic Assessment of Dementia for the Longitudinal Aging Study of India (LASI-DAD), we performed a gene-based analysis of 84 genes previously associated with AD in EA. We investigated associations with the Hindi Mental State Examination (HMSE) score and factor scores for general cognitive function and five cognitive domains. For each gene, we examined missense/loss-of-function (LoF) variants and brain-specific promoter/enhancer variants, separately, both with and without incorporating additional annotation weights (e.g., deleteriousness, conservation scores) using the variant-Set Test for Association using Annotation infoRmation (STAAR). In the missense/LoF analysis without annotation weights and controlling for age, sex, state/territory, and genetic ancestry, three genes had an association with at least one measure of cognitive function (FDR q<0.1). APOE was associated with four measures of cognitive function, PICALM was associated with HMSE score, and TSPOAP1 was associated with executive function. The most strongly associated variants in each gene were rs429358 (APOE ε4), rs779406084 (PICALM), and rs9913145 (TSPOAP1). rs779406084 is a rare missense mutation that is more prevalent in LASI-DAD than in EA (minor allele frequency=0.075% vs. 0.0015%); the other two are common variants. No genes in the brain-specific promoter/enhancer analysis met criteria for significance. Results with and without annotation weights were similar. 
Missense/LoF variants in some genes previously associated with AD in EA are associated with measures of cognitive function in South Asians from India. Analyzing genome sequence data allows identification of potential novel causal variants enriched in South Asians.
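
The significance threshold in entry 20 (FDR q<0.1) can be illustrated with a short q-value calculation. The sketch assumes the Benjamini-Hochberg procedure, which the abstract does not name, and the p-values are made up for illustration. They are not results from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum of
    # p * m / rank over its own rank and all larger ranks (keeps q monotone).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical gene-level p-values.
gene_pvalues = [0.01, 0.04, 0.03, 0.5]
gene_qvalues = benjamini_hochberg(gene_pvalues)
significant = [q < 0.1 for q in gene_qvalues]
```

On the allele-frequency comparison in the same abstract, a minor allele frequency of 0.075% against 0.0015% corresponds to a 50-fold difference (0.075 / 0.0015 = 50).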
