1.
BMC Bioinformatics ; 22(1): 509, 2021 Oct 19.
Article in English | MEDLINE | ID: mdl-34666677

ABSTRACT

BACKGROUND: Sequencing partial 16S rRNA genes is a cost-effective method for quantifying the microbial composition of an environment, such as the human gut. However, downstream analysis relies on binning reads into microbial groups by either considering each unique sequence as a different microbe, querying a database to get taxonomic labels from sequences, or clustering similar sequences together. These approaches do not fully capture evolutionary relationships between microbes, limiting the ability to identify differentially abundant groups of microbes between a diseased and control cohort. We present sequence-based biomarkers (SBBs), an aggregation method that groups and aggregates microbes using single variants and combinations of variants within their 16S sequences. We compare SBBs against other existing aggregation methods (OTU clustering and MicroPheno or DiTaxa features) in several benchmarking tasks: biomarker discovery via permutation test, biomarker discovery via linear discriminant analysis, and phenotype prediction power. We demonstrate that SBBs perform on par with or better than the state-of-the-art methods in biomarker discovery and phenotype prediction. RESULTS: On two independent datasets, SBBs identify differentially abundant groups of microbes with similar or higher statistical significance than existing methods in both a permutation-test-based analysis and using linear discriminant analysis effect size. By grouping microbes by SBB, we can identify several differentially abundant microbial groups (FDR < 0.1) between children with autism and neurotypical controls in a set of 115 discordant siblings. Porphyromonadaceae, Ruminococcaceae, and an unnamed species of Blastocystis were significantly enriched in autism, while Veillonellaceae was significantly depleted. Likewise, aggregating microbes by SBB on a dataset of obese and lean twins, we find several significantly differentially abundant microbial groups (FDR < 0.1). We observed Megasphaera and Sutterellaceae highly enriched in obesity, and Phocaeicola significantly depleted. SBBs also perform on par with or better than existing aggregation methods as features in a phenotype prediction model, predicting the autism phenotype with an ROC-AUC score of 0.64 and the obesity phenotype with an ROC-AUC score of 0.84. CONCLUSIONS: SBBs provide a powerful method for aggregating microbes to perform differential abundance analysis as well as phenotype prediction. Our source code can be freely downloaded from http://github.com/briannachrisman/16s_biomarkers .
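
The abstract names a permutation test with an FDR threshold for differential abundance. The following is a minimal, generic sketch of that idea, not the authors' implementation: it shuffles case/control labels for one microbial group and applies a Benjamini-Hochberg adjustment across groups. The function names, the mean-difference statistic, and the unpaired label shuffling are all assumptions (the study's sibling and twin designs may call for a paired scheme).

```python
import numpy as np


def permutation_pvalue(case, control, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in mean abundance.

    Assumption: labels are exchangeable across all samples (unpaired design).
    """
    rng = np.random.default_rng(seed)
    case = np.asarray(case, dtype=float)
    control = np.asarray(control, dtype=float)
    observed = abs(case.mean() - control.mean())
    pooled = np.concatenate([case, control])
    n_case = case.size
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:n_case].mean() - shuffled[n_case:].mean())
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working from the largest p-value downwards.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted
```

Under these assumptions, a group would be reported as differentially abundant when its adjusted p-value falls below the chosen threshold (0.1 in the abstract).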


Subject(s)
Gastrointestinal Microbiome; Biomarkers; Cluster Analysis; Gastrointestinal Microbiome/genetics; Humans; RNA, Ribosomal, 16S/genetics; Software
2.
BMC Bioinformatics ; 21(1): 356, 2020 Aug 12.
Article in English | MEDLINE | ID: mdl-32787845

ABSTRACT

BACKGROUND: Complex human health conditions with etiological heterogeneity like Autism Spectrum Disorder (ASD) often pose a challenge for traditional genome-wide association study approaches in defining a clear genotype to phenotype model. Coalitional game theory (CGT) is an exciting method that can consider the combinatorial effect of groups of variants working in concert to produce a phenotype. CGT has been applied to associate likely-gene-disrupting variants encoded from whole genome sequence data to ASD; however, this previous approach cannot take prior biological knowledge into account. Here we extend CGT to incorporate a priori knowledge from biological networks through a game theoretic centrality measure based on Shapley value to rank genes by their relevance, that is, the individual gene's synergistic influence in a gene-to-gene interaction network. Game theoretic centrality extends the notion of Shapley value to the evaluation of a gene's contribution to the overall connectivity of its corresponding node in a biological network. RESULTS: We implemented and applied game theoretic centrality to rank genes on whole genomes from 756 multiplex autism families. Top ranking genes with the highest game theoretic centrality in both the weighted and unweighted approaches were enriched for pathways previously associated with autism, including pathways of the immune system. Four of the selected genes (HLA-A, HLA-B, HLA-G, and HLA-DRB1) have also been implicated in ASD and further support the link between ASD and the human leukocyte antigen complex. CONCLUSIONS: Game theoretic centrality can prioritize influential, disease-associated genes within biological networks, and assist in the decoding of polygenic associations to complex disorders like autism.
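
To make the notion of a Shapley-value-based centrality concrete, here is a small sketch that computes exact Shapley values by enumerating every ordering of the nodes in a toy network. The characteristic function used here (the number of nodes a coalition contains or touches) is an assumption chosen for illustration, as are the node labels; the abstract does not state the exact game the authors define, and brute-force enumeration is only feasible for a handful of nodes.

```python
from fractions import Fraction
from itertools import permutations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def coverage_game(adjacency):
    """Characteristic function: nodes inside or adjacent to the coalition.

    Assumption: this 'coverage' game stands in for the paper's own game.
    """
    def value(coalition):
        reached = set(coalition)
        for node in coalition:
            reached |= adjacency[node]
        return len(reached)
    return value


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    players = list(players)
    totals = {p: Fraction(0) for p in players}
    n_orders = 0
    for order in permutations(players):
        coalition = set()
        before = value(frozenset())
        for p in order:
            coalition.add(p)
            after = value(frozenset(coalition))
            totals[p] += after - before
            before = after
        n_orders += 1
    return {p: totals[p] / n_orders for p in players}


# Hypothetical five-node network; the labels are placeholders, not real genes.
TOY_EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]
toy_adjacency = build_adjacency(TOY_EDGES)
toy_scores = shapley_values(toy_adjacency, coverage_game(toy_adjacency))
```

In this toy network the hub node `A` receives the highest score, and the scores sum to the value of the full coalition, which is the efficiency property that makes Shapley values usable as a ranking.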


Subject(s)
Algorithms; Game Theory; Gene Regulatory Networks; Genetic Association Studies; Autism Spectrum Disorder/genetics; Genome-Wide Association Study; Humans; Protein Interaction Mapping; Reproducibility of Results
3.
Sci Rep ; 14(1): 13887, 2024 Jun 16.
Article in English | MEDLINE | ID: mdl-38880810

ABSTRACT

Dementia is a progressive neurological disorder that affects the daily lives of older adults, impacting their verbal communication and cognitive function. Early diagnosis is important to enhance the lifespan and quality of life for affected individuals. Despite its importance, diagnosing dementia is a complex process. Automated machine learning solutions involving multiple types of data have the potential to improve the process of automated dementia screening. In this study, we build deep learning models to classify dementia cases from controls using the Pitt Cookie Theft dataset from DementiaBank, a database of short participant responses to the structured task of describing a picture of a cookie theft. We fine-tune Wav2vec and Word2vec baseline models to make binary predictions of dementia from audio recordings and text transcripts, respectively. We conduct experiments with four versions of the dataset: (1) the original data, (2) the data with short sentences removed, (3) text-based augmentation of the original data, and (4) text-based augmentation of the data with short sentences removed. Our results indicate that synonym-based text data augmentation generally enhances the performance of models that incorporate the text modality. Without data augmentation, models using the text modality achieve around 60% accuracy and 70% AUROC scores, and with data augmentation, the models achieve around 80% accuracy and 90% AUROC scores. We do not observe significant improvements in performance with the addition of audio or timestamp information into the model. We include a qualitative error analysis of the sentences that are misclassified under each study condition. This study provides preliminary insights into the effects of both text-based data augmentation and multimodal deep learning for automated dementia classification.
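
As a rough illustration of synonym-based text augmentation, the sketch below replaces words in a transcript sentence with entries from a synonym table. The table, the function name, and the replacement probability are hypothetical; the abstract does not say which synonym source or replacement rate the authors used.

```python
import random

# Hypothetical synonym table; a real pipeline would likely draw on a thesaurus.
SYNONYMS = {
    "boy": ["lad"],
    "taking": ["grabbing"],
    "jar": ["container"],
}


def augment(sentence, synonyms, replace_prob=0.3, rng=None):
    """Return a copy of `sentence` with some words swapped for synonyms.

    Each whitespace-separated token that has an entry in `synonyms` is
    replaced with probability `replace_prob`; other tokens are kept.
    """
    if rng is None:
        rng = random.Random()
    out = []
    for token in sentence.split():
        options = synonyms.get(token.lower())
        if options and rng.random() < replace_prob:
            out.append(rng.choice(options))
        else:
            out.append(token)
    return " ".join(out)
```

Calling `augment` several times on each training sentence, with a seeded `random.Random` for reproducibility, yields additional labelled examples that keep the original class label.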


Subject(s)
Deep Learning; Dementia; Humans; Dementia/diagnosis; Dementia/classification; Aged; Female; Male; Aged, 80 and over; Databases, Factual
4.
Res Sq ; 2023 Jun 05.
Article in English | MEDLINE | ID: mdl-37333241

ABSTRACT

Evidence suggests that an increasing number of e-cigarette users report intentions and attempts to quit vaping. Since exposure to e-cigarette-related content on social media may influence e-cigarette and other tobacco product use, including potentially e-cigarette cessation, we aimed to explore vaping cessation-related posts on Twitter by utilizing a mixed-methods approach. We collected tweets pertaining to vaping cessation for the time period between January 2022 and December 2022 using snscrape. Tweets were scraped for the following hashtags: #vapingcessation, #quitvaping, and #stopJuuling. Data were analysed using Azure Machine Learning and Nvivo 12 software. Sentiment analysis revealed that vaping cessation-related tweets typically embody positive sentiment and are mostly produced in the U.S. and Australia. Our qualitative analysis identified six emerging themes: vaping cessation support, promotion of vaping cessation, barriers and benefits to vaping cessation, personal vaping cessation, and usefulness of peer support for vaping cessation. Our findings imply that improved dissemination of evidence-based vaping cessation strategies to a broad audience through Twitter may promote vaping cessation at the population level.

5.
Pac Symp Biocomput ; 28: 461-471, 2023.
Article in English | MEDLINE | ID: mdl-36541000

ABSTRACT

Innovations in human-centered biomedical informatics are often developed with the eventual goal of real-world translation. While biomedical research questions are usually answered in terms of how a method performs in a particular context, we argue that it is equally important to consider and formally evaluate the ethical implications of informatics solutions. Several new research paradigms have arisen as a result of the consideration of ethical issues, including but not limited to privacy-preserving computation and fair machine learning. In the spirit of the Pacific Symposium on Biocomputing, we discuss broad and fundamental principles of ethical biomedical informatics in terms of ʻŌlelo Noʻeau, or Hawaiian proverbs and poetical sayings that capture Hawaiian values. While we emphasize issues related to privacy and fairness in particular, there are a multitude of facets to ethical biomedical informatics that can benefit from a critical analysis grounded in ethics.


Subject(s)
Computational Biology; Informatics; Humans; Hawaii; Privacy
6.
BioData Min ; 14(1): 28, 2021 May 03.
Article in English | MEDLINE | ID: mdl-33941233

ABSTRACT

BACKGROUND: Machine learning approaches for predicting disease risk from high-dimensional whole genome sequence (WGS) data often result in unstable models that can be difficult to interpret, limiting the identification of putative sets of biomarkers. Here, we design and validate a graph-based methodology based on maximum flow, which leverages the presence of linkage disequilibrium (LD) to identify stable sets of variants associated with complex multigenic disorders. RESULTS: We apply our method to a previously published logistic regression model trained to identify variants in simple repeat sequences associated with autism spectrum disorder (ASD); this L1-regularized model exhibits high predictive accuracy yet demonstrates great variability in the features selected from over 230,000 possible variants. In order to improve model stability, we extract the variants assigned non-zero weights in each of 5 cross-validation folds and then assemble the five sets of features into a flow network subject to LD constraints. The maximum flow formulation allowed us to identify 55 variants, which we show to be more stable than the features identified by the original classifier. CONCLUSION: Our method allows for the creation of machine learning models that can identify predictive variants. Our results help pave the way towards biomarker-based diagnosis methods for complex genetic disorders.
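
The abstract relies on a maximum flow computation but does not describe how the flow network is built from the cross-validation feature sets and LD constraints. The sketch below therefore shows only the generic building block: an Edmonds-Karp maximum flow on a capacity dictionary. The function name and input layout are assumptions, and any network passed to it would have to be constructed separately.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow.

    `capacity` maps each node to a dict of {neighbor: edge capacity}.
    """
    # Residual graph: forward capacities plus zero-capacity reverse edges.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    residual.setdefault(source, {})
    residual.setdefault(sink, {})

    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u

        # Push flow along the path and update the residual capacities.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck
```

For example, with capacities `{"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}`, calling `max_flow(capacities, "s", "t")` saturates both edges leaving `s`.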
