Results 1 - 11 of 11
1.
Comput Struct Biotechnol J ; 23: 679-687, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38292477

ABSTRACT

Gene transcription is an essential process involved in all aspects of cellular functions with significant impact on biological traits and diseases. This process is tightly regulated by multiple elements that co-operate to jointly modulate the transcription levels of target genes. To decipher the complicated regulatory network, we present a novel multi-view attention-based deep neural network that models the relationship between genetic, epigenetic, and transcriptional patterns and identifies co-operative regulatory elements (COREs). We applied this new method, named DeepCORE, to predict transcriptomes in various tissues and cell lines, which outperformed the state-of-the-art algorithms. Furthermore, DeepCORE contains an interpreter that extracts the attention values embedded in the deep neural network, maps the attended regions to putative regulatory elements, and infers COREs based on correlated attentions. The identified COREs are significantly enriched with known promoters and enhancers. Novel regulatory elements discovered by DeepCORE showed epigenetic signatures consistent with the status of histone modification marks.
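
The idea of inferring co-operative elements from correlated attentions can be illustrated with a toy sketch. The abstract does not specify how the DeepCORE interpreter works, so everything here is an assumption: the region names, the attention values, the use of Pearson correlation across samples, and the 0.8 cutoff are all hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def correlated_regions(attention, min_r):
    """Return region pairs whose attention profiles correlate at or above min_r."""
    pairs = []
    for a, b in combinations(sorted(attention), 2):
        if pearson(attention[a], attention[b]) >= min_r:
            pairs.append((a, b))
    return pairs


# Hypothetical attention values: one list per genomic region, one value per sample.
attention = {
    "region_A": [0.10, 0.20, 0.30, 0.40],
    "region_B": [0.20, 0.40, 0.60, 0.80],
    "region_C": [0.40, 0.30, 0.20, 0.10],
}

# Assumed cutoff; a real analysis would need a statistically justified threshold.
candidate_pairs = correlated_regions(attention, min_r=0.8)
print(candidate_pairs)
```

In this toy input only region_A and region_B rise and fall together, so they form the single candidate pair.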

2.
Genome Med ; 15(1): 88, 2023 10 31.
Article in English | MEDLINE | ID: mdl-37904203

ABSTRACT

BACKGROUND: Genotypes are strongly associated with disease phenotypes, particularly in brain disorders. However, the molecular and cellular mechanisms behind this association remain elusive. With emerging multimodal data for these mechanisms, machine learning methods can be applied for phenotype prediction at different scales, but due to the black-box nature of machine learning, integrating these modalities and interpreting biological mechanisms can be challenging. Additionally, the partial availability of these multimodal data presents a challenge in developing these predictive models. METHOD: To address these challenges, we developed DeepGAMI, an interpretable neural network model to improve genotype-phenotype prediction from multimodal data. DeepGAMI leverages functional genomic information, such as eQTLs and gene regulation, to guide neural network connections. Additionally, it includes an auxiliary learning layer for cross-modal imputation, allowing the imputation of latent features of missing modalities and thus predicting phenotypes from a single modality. Finally, DeepGAMI uses integrated gradients to prioritize multimodal features for various phenotypes. RESULTS: We applied DeepGAMI to several multimodal datasets including genotype and bulk and cell-type gene expression data in brain diseases, and gene expression and electrophysiology data of mouse neuronal cells. Using cross-validation and independent validation, DeepGAMI outperformed existing methods for classifying disease types, and cellular and clinical phenotypes, even using single modalities (e.g., AUC score of 0.79 for schizophrenia and 0.73 for cognitive impairment in Alzheimer's disease). CONCLUSION: We demonstrated that DeepGAMI improves phenotype prediction and prioritizes phenotypic features and networks in multiple multimodal datasets in complex brains and brain diseases. Also, it prioritized disease-associated variants, genes, and regulatory networks linked to different phenotypes, providing novel insights into the interpretation of gene regulatory mechanisms. DeepGAMI is open-source and available for general use.


Subject(s)
Alzheimer Disease; Machine Learning; Animals; Mice; Neural Networks, Computer; Genotype; Phenotype; Alzheimer Disease/genetics
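
The feature-prioritization step named in the abstract above relies on integrated gradients. The sketch below shows the general technique on a tiny logistic model, not DeepGAMI's own code: the weights, inputs, all-zero baseline, and step count are made up for illustration. The attribution for feature i is (x_i - baseline_i) times the average gradient along the straight path from the baseline to x, approximated here with a midpoint Riemann sum.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy logistic model standing in for a trained network (assumption)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def model_grad(x, w, b):
    """Analytic gradient of the toy model with respect to its inputs."""
    s = model(x, w, b)
    return [s * (1.0 - s) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(model_grad(point, w, b)):
            totals[i] += g
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, totals)]


# Hypothetical weights and input; the third feature has zero weight on purpose.
w, b = [1.5, -2.0, 0.0], 0.1
x, baseline = [1.0, 0.5, 3.0], [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, w, b)
print(attributions)
```

A useful sanity check is the completeness property: the attributions should sum to approximately model(x) minus model(baseline), and a feature the model ignores should receive zero attribution.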
3.
bioRxiv ; 2023 Apr 19.
Article in English | MEDLINE | ID: mdl-37131697

ABSTRACT

Gene transcription is an essential process involved in all aspects of cellular functions with significant impact on biological traits and diseases. This process is tightly regulated by multiple elements that co-operate to jointly modulate the transcription levels of target genes. To decipher the complicated regulatory network, we present a novel multi-view attention-based deep neural network that models the relationship between genetic, epigenetic, and transcriptional patterns and identifies co-operative regulatory elements (COREs). We applied this new method, named DeepCORE, to predict transcriptomes in 25 different cell lines, which outperformed the state-of-the-art algorithms. Furthermore, DeepCORE translates the attention values embedded in the neural network into interpretable information, including locations of putative regulatory elements and their correlations, which collectively implies COREs. These COREs are significantly enriched with known promoters and enhancers. Novel regulatory elements discovered by DeepCORE showed epigenetic signatures consistent with the status of histone modification marks.

4.
Mol Biol Evol ; 39(7)2022 07 02.
Article in English | MEDLINE | ID: mdl-35749590

ABSTRACT

Understanding intratumor heterogeneity is critical for studying tumorigenesis and designing personalized treatments. To decompose the mixed cell population in a tumor, subclones are inferred computationally based on variant allele frequency (VAF) from bulk sequencing data. In this study, we showed that sequencing depth, mean VAF, and variance of VAF of a subclone are confounded. Without considering this effect, current methods require deep-sequencing data (>300× depth) to reliably infer subclones. Here, we present a novel algorithm that incorporates depth-variance and mean-variance dependencies in a clustering error model and successfully identifies subclones in tumors sequenced at depths as low as 30×. We implemented the algorithm as a model-based adaptive grouping of subclones (MAGOS) method. Analyses of computer-simulated data and empirical sequencing data showed that MAGOS outperformed existing methods on minimum sequencing depth, decomposition accuracy, and computational efficiency. The most prominent improvements were observed in analyzing tumors sequenced at depths between 30× and 200×, whereas the performance was comparable between MAGOS and existing methods on deeply sequenced tumors. MAGOS supports analysis of single-nucleotide variants and copy number variants from a single sample or multiple samples of a tumor. We applied MAGOS to whole-exome data of late-stage liver cancers and discovered that high subclone count in a tumor was a significant risk factor for poor prognosis. Lastly, our analysis suggested that sequencing multiple samples of the same tumor at standard depth is more cost-effective and robust for subclone characterization than deep sequencing a single sample. MAGOS is available on GitHub (https://github.com/liliulab/magos).


Subject(s)
High-Throughput Nucleotide Sequencing; Neoplasms; DNA Copy Number Variations; Exome; Genome; High-Throughput Nucleotide Sequencing/methods; Humans; Neoplasms/genetics; Polymorphism, Single Nucleotide
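
The confounding of depth, mean VAF, and VAF variance described in the abstract above can be seen in a simple binomial read-sampling model. This model is an assumption made for illustration and is not the MAGOS error model: if each of n reads carries the variant with probability p, the observed VAF has variance p(1 - p)/n, so the spread depends on both the depth and the mean.

```python
import math


def vaf_sd(mean_vaf: float, depth: int) -> float:
    """Standard deviation of observed VAF under binomial read sampling (assumed model)."""
    if not 0.0 <= mean_vaf <= 1.0:
        raise ValueError("mean_vaf must be between 0 and 1")
    if depth <= 0:
        raise ValueError("depth must be positive")
    return math.sqrt(mean_vaf * (1.0 - mean_vaf) / depth)


# Compare the depths mentioned in the abstract for two illustrative mean VAFs.
for depth in (30, 300):
    for mean_vaf in (0.1, 0.5):
        print(f"depth={depth:>3}  mean VAF={mean_vaf:.1f}  sd={vaf_sd(mean_vaf, depth):.3f}")
```

Under this assumption, a tenfold drop in depth widens the VAF spread by a factor of about 3.2, which is one way to see why a clustering model with a fixed variance would struggle at 30×.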
5.
J Neurodev Disord ; 14(1): 28, 2022 05 02.
Article in English | MEDLINE | ID: mdl-35501679

ABSTRACT

Intellectual and Developmental Disabilities (IDDs), such as Down syndrome, Fragile X syndrome, Rett syndrome, and autism spectrum disorder, usually manifest at birth or in early childhood. IDDs are characterized by significant impairment in intellectual and adaptive functioning, and both genetic and environmental factors underpin IDD biology. Molecular and genetic stratification of IDDs remains challenging, mainly due to overlapping factors and comorbidity. Advances in high-throughput sequencing, imaging, and tools to record behavioral data at scale have greatly enhanced our understanding of the molecular, cellular, structural, and environmental basis of some IDDs. Fueled by the "big data" revolution, artificial intelligence (AI) and machine learning (ML) technologies have brought a paradigm shift to computational biology. The ML-driven approach to clinical diagnosis has the potential to augment classical methods that rely on symptoms and external observations, and may help advance personalized treatment planning. Therefore, integrative analyses and applications of ML technology have a direct bearing on discoveries in IDDs. The application of ML to IDDs can potentially improve screening and early diagnosis, advance our understanding of the complexity of comorbidity, and accelerate the identification of biomarkers for clinical research and drug development. For more than five decades, the IDDRC network has supported a nexus of investigators at centers across the USA, all striving to understand the interplay between various factors underlying IDDs. In this review, we introduced fast-increasing multi-modal data types, highlighted example studies that employed ML technologies to illuminate factors and biological mechanisms underlying IDDs, as well as recent advances in ML technologies and their applications to IDDs and other neurological diseases. We discussed various molecular, clinical, and environmental data collection modes, including genetic, imaging, phenotypical, and behavioral data types, along with multiple repositories that store and share such data. Furthermore, we outlined some fundamental concepts of machine learning algorithms and presented our opinion on specific gaps that will need to be filled to accomplish, for example, reliable implementation of ML-based diagnosis technology in IDD clinics. We anticipate that this review will guide researchers to formulate AI and ML-based approaches to investigate IDDs and related conditions.


Subject(s)
Autism Spectrum Disorder; Intellectual Disability; Artificial Intelligence; Autism Spectrum Disorder/diagnosis; Child; Child, Preschool; Developmental Disabilities/diagnosis; Humans; Infant, Newborn; Intellectual Disability/diagnosis; Machine Learning
6.
Bioinformatics ; 37(8): 1125-1134, 2021 05 23.
Article in English | MEDLINE | ID: mdl-33135051

ABSTRACT

MOTIVATION: Expression quantitative trait loci (eQTL) harbor genetic variants modulating gene transcription. Fine mapping of regulatory variants at these loci is a daunting task due to the juxtaposition of causal and linked variants at a locus as well as the likelihood of interactions among multiple variants. This problem is exacerbated in genes with multiple cis-acting eQTL, where superimposed effects of adjacent loci further distort the association signals. RESULTS: We developed a novel algorithm, TreeMap, that identifies putative causal variants in cis-eQTL accounting for multisite effects and genetic linkage at a locus. Guided by the hierarchical structure of linkage disequilibrium, TreeMap performs an organized search for individual and multiple causal variants. Via extensive simulations, we show that TreeMap detects co-regulating variants more accurately than current methods. Furthermore, its high computational efficiency enables genome-wide analysis of long-range eQTL. We applied TreeMap to GTEx data of brain hippocampus samples and transverse colon samples to search for eQTL in gene bodies and in 4-Mbp gene-flanking regions, discovering numerous distal eQTL. Furthermore, we found concordant distal eQTL that were present in both brain and colon samples, implying long-range regulation of gene expression. AVAILABILITY AND IMPLEMENTATION: TreeMap is available as an R package enabled for parallel processing at https://github.com/liliulab/treemap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study; Quantitative Trait Loci; Chromosome Mapping; Colon; Gene Expression; Hippocampus; Humans; Linkage Disequilibrium; Polymorphism, Single Nucleotide; Quantitative Trait Loci/genetics
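
The hierarchical structure of linkage disequilibrium mentioned in the abstract above can be pictured with a small sketch. It is written in Python for illustration and is unrelated to the TreeMap R package: the variant names, genotype dosages, use of squared Pearson correlation as the LD measure, and single-linkage grouping are all assumptions. Variants are linked when their r² meets a threshold, and cutting at successively lower thresholds yields nested groups, which is one simple way to obtain a hierarchy.

```python
from itertools import combinations


def r_squared(a, b):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov * cov / (va * vb)


def ld_clusters(genotypes, min_r2):
    """Single-linkage groups of variants whose pairwise r^2 is at least min_r2."""
    parent = {v: v for v in genotypes}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in combinations(genotypes, 2):
        if r_squared(genotypes[a], genotypes[b]) >= min_r2:
            parent[find(a)] = find(b)

    groups = {}
    for v in genotypes:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical dosages (0, 1, or 2 alternate alleles) for six individuals.
genotypes = {
    "v1": [0, 1, 2, 0, 1, 2],
    "v2": [0, 1, 2, 0, 1, 2],
    "v3": [0, 0, 0, 2, 2, 2],
}

print(ld_clusters(genotypes, min_r2=0.8))
print(ld_clusters(genotypes, min_r2=0.0))
```

At the stricter threshold v1 and v2 form one block and v3 stands alone; at the loosest threshold all three merge into a single group.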
7.
Bioinformatics ; 36(6): 1712-1717, 2020 03 01.
Article in English | MEDLINE | ID: mdl-32176769

ABSTRACT

MOTIVATION: Functions of cancer driver genes vary substantially across tissues and organs. Distinguishing passenger genes, oncogenes (OGs) and tumor-suppressor genes (TSGs) for each cancer type is critical for understanding tumor biology and identifying clinically actionable targets. Although many computational tools are available to predict putative cancer driver genes, resources for context-aware classifications of OGs and TSGs are limited. RESULTS: We show that the direction and magnitude of somatic selection of protein-coding mutations are significantly different for passenger genes, OGs and TSGs. Based on these patterns, we develop a new method, genes under selection in tumors (GUST), to discover OGs and TSGs in a cancer-type-specific manner. GUST shows a high accuracy (92%) when evaluated via strict cross-validations. Its application to 10,172 tumor exomes found known and novel cancer drivers with high tissue specificity. In 11 out of 13 OGs shared among multiple cancer types, we found functional domains selectively engaged in different cancers, suggesting differences in disease mechanisms. AVAILABILITY AND IMPLEMENTATION: An R implementation of the GUST algorithm is available at https://github.com/liliulab/gust. A database with pre-computed results is available at https://liliulab.shinyapps.io/gust. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genes, Tumor Suppressor; Neoplasms/genetics; Algorithms; Humans; Mutation; Oncogenes
8.
AMIA Jt Summits Transl Sci Proc ; 2019: 495-504, 2019.
Article in English | MEDLINE | ID: mdl-31259004

ABSTRACT

The clinical competency of residents at teaching hospitals is always under scrutiny. Ideally, assessment should reflect competency on the job, under realistic circumstances, and include evaluating their medical reports. Currently, the assessment is done manually by the attending physicians, which adds to their cognitive load. In this study, we developed an automated system for assessing medical residents' pathology reports. Our system used natural language processing (NLP) techniques to compute different lexical and semantic similarity scores at the sentence level as well as the chunk level. We then used supervised learning to classify the reports into three categories: No Change (NC), Minor Changes (MiC), and Major Changes (MaC), reflecting how much the attending physician's report differs from that of the resident. Our system was able to classify the reports with an accuracy of 73.6%. Although moderately successful, our work shows the potential and future of automated assessment systems in the biomedical domain.
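
One of the simplest lexical similarity scores of the kind this abstract refers to is the Jaccard index over word sets. The abstract does not list which scores the authors used, so this choice, the tokenizer, and the example sentences are assumptions. In a pipeline like the one described, such scores would serve as input features to a supervised classifier and would not decide the NC/MiC/MaC label on their own.

```python
import re


def tokens(text: str) -> set:
    """Lowercase word tokens; a deliberately crude tokenizer for illustration."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the word sets of two sentences."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty sentences are treated as identical
    return len(ta & tb) / len(ta | tb)


# Hypothetical resident and attending versions of one report sentence.
resident = "Sections show benign colonic mucosa."
attending = "Sections show benign colonic mucosa with mild inflammation."

print(f"sentence-level Jaccard similarity: {jaccard(resident, attending):.3f}")
```

Here the attending physician adds three words to the resident's five, so the score is 5/8.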

9.
Nat Commun ; 10(1): 330, 2019 01 18.
Article in English | MEDLINE | ID: mdl-30659175

ABSTRACT

Computational prediction of the phenotypic propensities of noncoding single nucleotide variants typically combines annotation of genomic, functional and evolutionary attributes into a single score. Here, we evaluate if the claimed excellent accuracies of these predictions translate into high rates of success in addressing questions important in biological research, such as fine mapping causal variants, distinguishing pathogenic allele(s) at a given position, and prioritizing variants for genetic risk assessment. A significant disconnect is found to exist between the statistical modelling and biological performance of predictive approaches. We discuss fundamental reasons underlying these deficiencies and suggest that future improvements of computational predictions need to address confounding of allelic, positional and regional effects as well as imbalance of the proportion of true positive variants in candidate lists.


Subject(s)
Disease/genetics; Models, Statistical; RNA, Untranslated/genetics; Algorithms; Animals; Computational Biology; Evolution, Molecular; Genome-Wide Association Study; Humans; Machine Learning; Mammals/genetics; Polymorphism, Single Nucleotide
10.
J Med Internet Res ; 19(10): e361, 2017 10 30.
Article in English | MEDLINE | ID: mdl-29084707

ABSTRACT

BACKGROUND: Pregnancy exposure registries are the primary sources of information about the safety of maternal usage of medications during pregnancy. Such registries enroll pregnant women in a voluntary fashion early on in pregnancy and follow them until the end of pregnancy or longer to systematically collect information regarding specific pregnancy outcomes. Although the model of pregnancy registries has distinct advantages over other study designs, they are faced with numerous challenges and limitations such as low enrollment rate, high cost, and selection bias. OBJECTIVE: The primary objectives of this study were to systematically assess whether social media (Twitter) can be used to discover cohorts of pregnant women and to develop and deploy a natural language processing and machine learning pipeline for the automatic collection of cohort information. In addition, we also attempted to ascertain, in a preliminary fashion, what types of longitudinal information may potentially be mined from the collected cohort information. METHODS: Our discovery of pregnant women relies on detecting pregnancy-indicating tweets (PITs), which are statements posted by pregnant women regarding their pregnancies. We used a set of 14 patterns to first detect potential PITs. We manually annotated a sample of 14,156 of the retrieved user posts to distinguish real PITs from false positives and trained a supervised classification system to detect real PITs. We optimized the classification system via cross validation, with features and settings targeted toward optimizing precision for the positive class. For users identified to be posting real PITs via automatic classification, our pipeline collected all their available past and future posts from which other information (eg, medication usage and fetal outcomes) may be mined. RESULTS: Our rule-based PIT detection approach retrieved over 200,000 posts over a period of 18 months. 
Manual annotation agreement for three annotators was very high at kappa (κ)=.79. On a blind test set, the implemented classifier obtained an overall F1 score of 0.84 (0.88 for the pregnancy class and 0.68 for the nonpregnancy class). Precision for the pregnancy class was 0.93, and recall was 0.84. Feature analysis showed that the combination of dense and sparse vectors for classification achieved optimal performance. Employing the trained classifier resulted in the identification of 71,954 users from the collected posts. Over 250 million posts were retrieved for these users, which provided a multitude of longitudinal information about them. CONCLUSIONS: Social media sources such as Twitter can be used to identify large cohorts of pregnant women and to gather longitudinal information via automated processing of their postings. Considering the many drawbacks and limitations of pregnancy registries, social media mining may provide beneficial complementary information. Although the cohort sizes identified over social media are large, future research will have to assess the completeness of the information available through them.


Subject(s)
Population Surveillance/methods; Social Media/statistics & numerical data; Cohort Studies; Female; Humans; Pregnancy
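
The pregnancy-class figures reported in the abstract above are internally consistent, which a short calculation can confirm. The F1 score is the harmonic mean of precision and recall; the helper below is a generic definition, not code from the study, and the only inputs are the precision (0.93) and recall (0.84) quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for the pregnancy class, as reported in the abstract.
pregnancy_f1 = f1_score(0.93, 0.84)
print(f"{pregnancy_f1:.2f}")  # prints 0.88
```

The result matches the reported pregnancy-class F1 of 0.88.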
11.
AMIA Annu Symp Proc ; 2017: 1607-1616, 2017.
Article in English | MEDLINE | ID: mdl-29854231

ABSTRACT

An integral element of value-based care is care team access to both physical and behavioral health data. Data release processes in both environments are governed by federal and state statutes. The requirements for obtaining consent are complex and often confusing. Little is known about the consent processes and practices in the behavioral health setting, specifically how patients and surrogates engage in the process and their interactions with electronic consent tools. This study analyzes the consent processes from the patient perspective at two community behavioral health clinics. Outcomes include a description of the processes using electronic consent, workflows, and consenter-provider interactions. Conclusions include the need to streamline and standardize consent technologies and improve consenter engagement. This study supports the development of an electronic consent tool, My Data Choices (MDC), funded by the National Institute of Mental Health, that offers individuals with behavioral health conditions more control over their medical records.


Subject(s)
Community Mental Health Services/organization & administration; Informed Consent; Patient Access to Records; Health Information Interoperability; Health Literacy; Humans; Patient Care Team; United States