Results 1 - 13 of 13
1.
PLoS Comput Biol ; 19(3): e1010984, 2023 03.
Article in English | MEDLINE | ID: mdl-36972227

ABSTRACT

Those building predictive models from transcriptomic data face two conflicting perspectives. The first, based on the inherent high dimensionality of biological systems, supposes that complex non-linear models such as neural networks will better match complex biological systems. The second, imagining that complex systems will still be well predicted by simple dividing lines, prefers linear models that are easier to interpret. We compare multi-layer neural networks and logistic regression across multiple prediction tasks on GTEx and Recount3 datasets and find evidence in favor of both possibilities. We verified the presence of non-linear signal when predicting tissue and metadata sex labels from expression data by removing the predictive linear signal with Limma, and showed that the removal ablated the performance of linear methods but not non-linear ones. However, we also found that the presence of non-linear signal was not necessarily sufficient for neural networks to outperform logistic regression. Our results demonstrate that while multi-layer neural networks may be useful for making predictions from gene expression data, including a linear baseline model is critical because while biological systems are high-dimensional, effective dividing lines for predictive models may not be.


Subjects
Gene Expression; Nonlinear Dynamics; Gene Expression Profiling; Neural Networks, Computer; Linear Models
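
The distinction this abstract draws between linear and non-linear signal can be illustrated with a toy case. The sketch below is not the paper's pipeline: it uses no GTEx or Recount3 data, no Limma, and no neural network. It assumes a synthetic two-feature dataset whose label depends only on the sign of the product of the features, and the helper names (`fit_logistic`, `accuracy`, `compare_linear_vs_product`) are hypothetical. A logistic regression on the raw features has essentially no linear signal to use, while the same model given the product as an extra feature can separate the classes.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=300):
    """Hypothetical helper: logistic regression fit by batch gradient descent."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    """Fraction of samples whose predicted class matches the label."""
    hits = 0
    for xi, yi in zip(X, y):
        z = b + sum(wj * xj for wj, xj in zip(w, xi))
        hits += int((z > 0) == (yi == 1))
    return hits / len(X)


def compare_linear_vs_product(n=400, seed=0):
    """Return training accuracy without and with the product feature."""
    rng = random.Random(seed)
    raw = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    # Assumed toy labelling: class 1 when the two features share a sign.
    y = [1 if a * b > 0 else 0 for a, b in raw]
    with_product = [(a, b, a * b) for a, b in raw]
    acc_raw = accuracy(*fit_logistic(raw, y), raw, y)
    acc_product = accuracy(*fit_logistic(with_product, y), with_product, y)
    return acc_raw, acc_product


if __name__ == "__main__":
    acc_raw, acc_product = compare_linear_vs_product()
    print(f"raw features: {acc_raw:.2f}, with product feature: {acc_product:.2f}")
```

The same comparison is the reason a linear baseline is informative: if the baseline already matches a more complex model, the extra capacity is not being used.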
2.
Bioinformatics ; 38(22): 5129-5130, 2022 11 15.
Article in English | MEDLINE | ID: mdl-36193991

ABSTRACT

MOTIVATION: Domain adaptation allows for the development of predictive models even in cases with limited sample data. Weighted elastic net domain adaptation specifically leverages features of genomic data to maximize transferability, but the method is too computationally demanding to apply to many genome-sized datasets. RESULTS: We developed wenda_gpu, which uses GPyTorch to train models on genomic data within hours on a single GPU-enabled machine. We show that wenda_gpu returns results comparable to the original wenda implementation, and that it predicts cancer mutation status on small sample sizes better than regular elastic net. AVAILABILITY AND IMPLEMENTATION: wenda_gpu is available on GitHub at https://github.com/greenelab/wenda_gpu/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Neoplasms; Software; Humans; Genomics/methods; Neoplasms/genetics; Sample Size
3.
Nat Methods ; 16(9): 843-852, 2019 09.
Article in English | MEDLINE | ID: mdl-31471613

ABSTRACT

Many bioinformatics methods have been proposed for reducing the complexity of large gene or protein networks into relevant subnetworks or modules. Yet, how such methods compare to each other in terms of their ability to identify disease-relevant modules in different types of network remains poorly understood. We launched the 'Disease Module Identification DREAM Challenge', an open competition to comprehensively assess module identification methods across diverse protein-protein interaction, signaling, gene co-expression, homology and cancer-gene networks. Predicted network modules were tested for association with complex traits and diseases using a unique collection of 180 genome-wide association studies. Our robust assessment of 75 module identification methods reveals top-performing algorithms, which recover complementary trait-associated modules. We find that most of these modules correspond to core disease-relevant pathways, which often comprise therapeutic targets. This community challenge establishes biologically interpretable benchmarks, tools and guidelines for molecular network analysis to study human disease biology.


Subjects
Computational Biology/methods; Disease/genetics; Gene Regulatory Networks; Genome-Wide Association Study; Models, Biological; Polymorphism, Single Nucleotide; Quantitative Trait Loci; Algorithms; Gene Expression Profiling; Humans; Phenotype; Protein Interaction Maps
4.
Bioinformatics ; 36(12): 3920-3921, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32271874

ABSTRACT

SUMMARY: We define a disease module as a partition of a molecular network whose components are jointly associated with one or several diseases or risk factors thereof. Identification of such modules, across different types of networks, has great potential for elucidating disease mechanisms and establishing new powerful biomarkers. To this end, we launched the 'Disease Module Identification (DMI) DREAM Challenge', a community effort to build and evaluate unsupervised molecular network modularization algorithms. Here, we present MONET, a toolbox providing easy and unified access to the three top-performing methods from the DMI DREAM Challenge for the bioinformatics community. AVAILABILITY AND IMPLEMENTATION: MONET is a command line tool for Linux, based on Docker and Singularity containers; the core algorithms were written in R, Python, Ada and C++. It is freely available for download at https://github.com/BergmannLab/MONET.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms; Software
5.
Bioinform Adv ; 4(1): vbae004, 2024.
Article in English | MEDLINE | ID: mdl-38282973

ABSTRACT

Motivation: Most models can be fit to data using various optimization approaches. While model choice is frequently reported in machine-learning-based research, optimizers are not often noted. We applied two implementations of LASSO logistic regression from Python's scikit-learn package, using two different optimization approaches (coordinate descent, implemented in the liblinear library, and stochastic gradient descent, or SGD), to predict mutation status and gene essentiality from gene expression across a variety of pan-cancer driver genes. For varying levels of regularization, we compared performance and model sparsity between optimizers. Results: After model selection and tuning, we found that liblinear and SGD tended to perform comparably. liblinear models required more extensive tuning of regularization strength, performing best for high model sparsities (more nonzero coefficients), but did not require selection of a learning rate parameter. SGD models required tuning of the learning rate to perform well, but generally performed more robustly across different model sparsities as regularization strength decreased. Given these tradeoffs, we believe that the choice of optimizers should be clearly reported as a part of the model selection and validation process, to allow readers and reviewers to better understand the context in which results have been generated. Availability and implementation: The code used to carry out the analyses in this study is available at https://github.com/greenelab/pancancer-evaluation/tree/master/01_stratified_classification. Performance/regularization strength curves for all genes in the Vogelstein et al. (2013) dataset are available at https://doi.org/10.6084/m9.figshare.22728644.
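
The link between regularization strength and model sparsity that this abstract measures can be sketched in a few lines. The code below is an assumption-laden illustration, not the study's code: it uses batch proximal gradient descent (soft-thresholding after each gradient step), which is neither liblinear's coordinate descent nor scikit-learn's SGD, and it runs on synthetic data in which only the first two of ten features carry signal. The function names, `alpha` values, and learning rate are all hypothetical choices.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    """Shrink v toward zero by t; this is the proximal step for an L1 penalty."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X, y, alpha, lr=0.5, iters=150):
    """L1-penalized logistic regression (no intercept) by proximal gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad[j] += err * xi[j] / n
        # Gradient step on the log-loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - lr * gj, lr * alpha) for wj, gj in zip(w, grad)]
    return w


def make_data(n=200, n_features=10, seed=1):
    """Synthetic data (assumed): the label depends only on features 0 and 1."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n)]
    y = [1 if x[0] - x[1] > 0 else 0 for x in X]
    return X, y


def nonzero_count(w):
    """Number of coefficients that are not exactly zero."""
    return sum(1 for wj in w if wj != 0.0)


if __name__ == "__main__":
    X, y = make_data()
    for alpha in (0.001, 0.15):
        w = fit_l1_logistic(X, y, alpha)
        print(f"alpha={alpha}: {nonzero_count(w)} nonzero coefficients")
```

Even in this small setting, two knobs interact: `alpha` controls how many coefficients survive, and `lr` controls whether the fit converges in the allotted iterations, which is why reporting the optimizer and its settings alongside the model matters.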

6.
Nat Commun ; 14(1): 3672, 2023 06 21.
Article in English | MEDLINE | ID: mdl-37339989

ABSTRACT

High-resolution imaging has revolutionized the study of single cells in their spatial context. However, summarizing the great diversity of complex cell shapes found in tissues and inferring associations with other single-cell data remains a challenge. Here, we present CAJAL, a general computational framework for the analysis and integration of single-cell morphological data. By building upon metric geometry, CAJAL infers cell morphology latent spaces where distances between points indicate the amount of physical deformation required to change the morphology of one cell into that of another. We show that cell morphology spaces facilitate the integration of single-cell morphological data across technologies and the inference of relations with other data, such as single-cell transcriptomic data. We demonstrate the utility of CAJAL with several morphological datasets of neurons and glia and identify genes associated with neuronal plasticity in C. elegans. Our approach provides an effective strategy for integrating cell morphology data into single-cell omics analyses.


Subjects
Caenorhabditis elegans; Neurons; Animals; Caenorhabditis elegans/genetics; Gene Expression Profiling; Transcriptome
7.
Genome Biol ; 23(1): 137, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35761387

ABSTRACT

BACKGROUND: In studies of cellular function in cancer, researchers are increasingly able to choose from many -omics assays as functional readouts. Choosing the correct readout for a given study can be difficult, and which layer of cellular function is most suitable to capture the relevant signal remains unclear. RESULTS: We consider prediction of cancer mutation status (presence or absence) from functional -omics data as a representative problem that presents an opportunity to quantify and compare the ability of different -omics readouts to capture signals of dysregulation in cancer. From the TCGA Pan-Cancer Atlas that contains genetic alteration data, we focus on RNA sequencing, DNA methylation arrays, reverse phase protein arrays (RPPA), microRNA, and somatic mutational signatures as -omics readouts. Across a collection of genes recurrently mutated in cancer, RNA sequencing tends to be the most effective predictor of mutation state. We find that one or more other data types for many of the genes are approximately equally effective predictors. Performance is more variable between mutations than that between data types for the same mutation, and there is little difference between the top data types. We also find that combining data types into a single multi-omics model provides little or no improvement in predictive ability over the best individual data type. CONCLUSIONS: Based on our results, for the design of studies focused on the functional outcomes of cancer mutations, there are often multiple -omics types that can serve as effective readouts, although gene expression seems to be a reasonable default option.


Subjects
MicroRNAs; Neoplasms; Humans; Mutation; Neoplasms/genetics
8.
Genomics Proteomics Bioinformatics ; 20(5): 912-927, 2022 10.
Article in English | MEDLINE | ID: mdl-36216026

ABSTRACT

Genome-wide transcriptome profiling identifies genes that are prone to differential expression (DE) across contexts, as well as genes with changes specific to the experimental manipulation. Distinguishing genes that are specifically changed in a context of interest from common differentially expressed genes (DEGs) allows more efficient prediction of which genes are specific to a given biological process under scrutiny. Currently, common DEGs or pathways can only be identified through the laborious manual curation of experiments, an inordinately time-consuming endeavor. Here we pioneer an approach, Specific cOntext Pattern Highlighting In Expression data (SOPHIE), for distinguishing between common and specific transcriptional patterns using a generative neural network to create a background set of experiments from which a null distribution of gene and pathway changes can be generated. We apply SOPHIE to diverse datasets including those from human, human cancer, and bacterial pathogen Pseudomonas aeruginosa. SOPHIE identifies common DEGs in concordance with previously described, manually and systematically determined common DEGs. Further molecular validation indicates that SOPHIE detects highly specific but low-magnitude biologically relevant transcriptional changes. SOPHIE's measure of specificity can complement log2 fold change values generated from traditional DE analyses. For example, by filtering the set of DEGs, one can identify genes that are specifically relevant to the experimental condition of interest. Consequently, these results can inform future research directions. All scripts used in these analyses are available at https://github.com/greenelab/generic-expression-patterns. Users can access https://github.com/greenelab/sophie to run SOPHIE on their own data.


Subjects
Gene Expression Profiling; Transcriptome; Humans; Gene Expression Profiling/methods; Neural Networks, Computer; Gene Regulatory Networks
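
The idea of scoring a gene against a null distribution built from background experiments, as described for SOPHIE in this abstract, can be sketched with a simple z-score. This is an assumed scoring rule for illustration only; the abstract does not specify SOPHIE's exact statistic, the function name `specificity_z` is hypothetical, and the log2 fold change values are invented rather than taken from any dataset.

```python
from statistics import mean, stdev


def specificity_z(observed, background):
    """Z-score of an observed log2 fold change against background (null) values."""
    sigma = stdev(background)
    if sigma == 0:
        raise ValueError("background values have no variance")
    return (observed - mean(background)) / sigma


# Invented values: gene_a changes in most background experiments,
# gene_b is normally flat but shifts modestly in the experiment of interest.
background = {
    "gene_a": [2.1, 1.8, 2.4, 2.0, 1.9],
    "gene_b": [0.1, -0.2, 0.0, 0.2, -0.1],
}
observed = {"gene_a": 2.2, "gene_b": 0.8}

scores = {g: specificity_z(observed[g], background[g]) for g in observed}

if __name__ == "__main__":
    for gene, z in scores.items():
        print(f"{gene}: log2FC={observed[gene]}, specificity z={z:.2f}")
```

In this toy case the gene with the larger fold change receives the lower specificity score, which mirrors the abstract's point that a specificity measure can complement log2 fold change when filtering differentially expressed genes.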
9.
Curr Opin Biotechnol ; 63: 126-134, 2020 06.
Article in English | MEDLINE | ID: mdl-31962244

ABSTRACT

In biomedical applications of machine learning, relevant information often has a rich structure that is not easily encoded as real-valued predictors. Examples of such data include DNA or RNA sequences, gene sets or pathways, gene interaction or coexpression networks, ontologies, and phylogenetic trees. We highlight recent examples of machine learning models that use structure to constrain model architecture or incorporate structured data into model training. For machine learning in biomedicine, where sample size is limited and model interpretability is crucial, incorporating prior knowledge in the form of structured data can be particularly useful. The area of research would benefit from performant open source implementations and independent benchmarking efforts.


Subjects
Machine Learning; Phylogeny
10.
AJP Rep ; 10(2): e183-e186, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32577321

ABSTRACT

Introduction: Despite time standards for second stage labor, "delayed pushing," uterine contraction frequency, and alternate contraction pushing may alter the effective maternal effort. We sought to quantify the number of pushing contractions needed for a spontaneous vaginal delivery (SVD) among primipara and multipara patients. Methods: Deliveries at Harbor-UCLA Medical Center in 2017 were selected for SVD of singleton, term newborns. The first 100 primipara and 100 multipara deliveries were analyzed and monitor tracings quantified for pushing contractions. Results: Significantly more pushing contractions were required by primiparas versus multiparas (17.3 ± 1.7 vs. 5.5 ± 0.7; p < 0.001) in accord with a longer second stage (86.7 ± 7.8 vs. 27.2 ± 4.9 min; p < 0.001), and epidural was associated with a greater number of pushing contractions among both primipara (18.5 ± 1.8 vs. 10.8 ± 0.8) and multipara women (6.1 ± 0.8 vs. 4.1 ± 0.3). Newborn weight (<3000, 3000-3500, >3500 g) demonstrated a trend for increased pushing contractions among primiparas (16.9, 16.5, 19.8 pushes, respectively) though not multiparas. Conclusion: Although correlated with the absolute duration of the second stage, the number of pushing contractions eliminates ambiguities of "delayed pushing," pushing every-other, and frequency of contractions. Examination of larger databases and patients with second stage "arrest disorders" may provide pushing contraction criteria predictive of SVD and prevention of morbidity.

11.
BMC Syst Biol ; 12(1): 113, 2018 11 19.
Article in English | MEDLINE | ID: mdl-30453938

ABSTRACT

The authors have retracted this article [1]. After publication they discovered a technical error in the Louvain algorithm with bounded cluster sizes. Correction of this error substantially changed the results for this algorithm and the conclusions drawn in the article were found to be incorrect. The authors will submit a new manuscript for peer review.

12.
BMC Syst Biol ; 12(Suppl 3): 24, 2018 03 21.
Article in English | MEDLINE | ID: mdl-29589565

ABSTRACT

BACKGROUND: Decomposing a protein-protein interaction network (PPI network) into non-overlapping clusters or communities, sometimes called "network modules," is an important way to explore functional roles of sets of genes. When the method to accomplish this decomposition is based solely on graph-theoretic measures of the interconnection structure of the network, this is often called unsupervised clustering or community detection. In this study, we compare unsupervised computational methods for decomposing a PPI network into non-overlapping modules. A method is preferred if it results in a large proportion of nodes being assigned to functionally meaningful modules, as measured by functional enrichment over terms from the Gene Ontology (GO). RESULTS: We compare the performance of three popular community detection algorithms with the same algorithms run after the network is pre-processed by removing and reweighting edges based on the diffusion state distance (DSD) between pairs of nodes in the network. We call this "detangling" the network. In almost all cases, we find that detangling the network based on DSD reweighting provides more meaningful clusters. CONCLUSIONS: Re-embedding using the DSD metric, before applying standard community detection algorithms, can assist in uncovering GO functionally enriched clusters in the yeast PPI network.
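
The diffusion state distance used for detangling in this abstract can be sketched on a toy graph. The code below follows the common description of DSD as the L1 distance between two nodes' vectors of expected visit counts under a k-step random walk; the walk length `k = 5`, the six-node graph (two triangles joined by one edge), and the function names are assumptions for illustration, and the edge removal, reweighting, and community detection steps of the study are not shown.

```python
def transition_matrix(adj):
    """Row-normalize an adjacency matrix (assumes every node has a neighbour)."""
    return [[a / sum(row) for a in row] for row in adj]


def he_vectors(adj, k):
    """he[u][v]: expected visits to v by a k-step random walk from u, start counted."""
    n = len(adj)
    P = transition_matrix(adj)
    step = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0
    he = [row[:] for row in step]
    for _ in range(k):
        # Advance the walk one step: step <- step * P.
        step = [[sum(step[i][m] * P[m][j] for m in range(n)) for j in range(n)]
                for i in range(n)]
        he = [[he[i][j] + step[i][j] for j in range(n)] for i in range(n)]
    return he


def dsd(he, u, v):
    """L1 distance between the visit-count vectors of nodes u and v."""
    return sum(abs(a - b) for a, b in zip(he[u], he[v]))


# Toy graph (assumed): triangle 0-1-2 and triangle 3-4-5 joined by edge 2-3.
ADJ = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

HE = he_vectors(ADJ, k=5)

if __name__ == "__main__":
    print("same triangle   (0, 1):", dsd(HE, 0, 1))
    print("across the edge (0, 4):", dsd(HE, 0, 4))
```

Nodes in the same triangle come out closer than nodes on opposite sides of the connecting edge, which is the kind of contrast a DSD-based reweighting could exploit before a standard community detection algorithm is run.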

13.
Nat Biomed Eng ; 2(1): 38-47, 2018 Jan.
Article in English | MEDLINE | ID: mdl-29998038

ABSTRACT

The CRISPR-Cas9 system provides unprecedented genome editing capabilities. However, off-target effects lead to sub-optimal usage and are also a bottleneck in the development of therapeutic uses. Herein, we introduce the first machine learning-based approach to off-target prediction, yielding a state-of-the-art model for CRISPR-Cas9 that outperforms all other guide design services. Our approach, Elevation, consists of two interdependent machine learning models: one for scoring individual guide-target pairs, and another that aggregates these guide-target scores into a single, overall summary guide score. Through systematic investigation, we demonstrate that Elevation performs substantially better than competing approaches on both tasks. Additionally, we are the first to systematically evaluate approaches on the guide summary score problem; we show that the most widely used method performs no better than random at times, whereas Elevation consistently outperformed it, sometimes by an order of magnitude. We also introduce an evaluation method that balances errors between active and inactive guides, thereby encapsulating a range of practical use cases; Elevation is consistently superior to other methods across the entire range. Finally, because of the large scale and computational demands of off-target prediction, we have developed a cloud-based service for quick retrieval. This service provides end-to-end guide design by also incorporating our previously reported on-target model, Azimuth (https://crispr.ml).
