Results 1 - 20 of 74
1.
Bioinformatics ; 40(Supplement_1): i453-i461, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940174

ABSTRACT

MOTIVATION: Genetic perturbations (e.g. knockouts, variants) have laid the foundation for our understanding of many diseases, implicating pathogenic mechanisms and indicating therapeutic targets. However, experimental assays are fundamentally limited by the number of measurable perturbations. Computational methods can fill this gap by predicting perturbation effects under novel conditions, but accurately predicting the transcriptional responses of cells to unseen perturbations remains a significant challenge. RESULTS: We address this by developing a novel attention-based neural network, AttentionPert, which accurately predicts gene expression under multiplexed perturbations and generalizes to unseen conditions. AttentionPert integrates global and local effects in a multi-scale model, representing both the nonuniform system-wide impact of the genetic perturbation and the localized disturbance in a network of gene-gene similarities, enhancing its ability to predict nuanced transcriptional responses to both single and multi-gene perturbations. In comprehensive experiments, AttentionPert demonstrates superior performance across multiple datasets, outperforming the state-of-the-art method in predicting differential gene expression and revealing novel gene regulations. AttentionPert marks a significant improvement over current methods, particularly in handling the diversity of gene perturbations and in predicting out-of-distribution scenarios. AVAILABILITY AND IMPLEMENTATION: Code is available at https://github.com/BaiDing1234/AttentionPert.
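
The abstract only summarizes AttentionPert's architecture. As a generic illustration of the operation attention-based models build on (not AttentionPert's own multi-scale design), here is a minimal sketch of scaled dot-product attention in plain Python; the function names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention.

    Each query attends over all keys; its output is the average of the
    value vectors weighted by softmax(q . k / sqrt(d)).
    """
    d = len(keys[0])
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[c] for w, v in zip(weights, values)) for c in range(dim)])
    return outputs


# Two identical keys receive equal weight, so the output is the mean of the values.
print(attention([[1.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0], [4.0]]))  # prints [[3.0]]
```

The scaling by sqrt(d) keeps dot products from growing with dimension, which would otherwise push the softmax toward a one-hot distribution.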


Subject(s)
Computational Biology , Computational Biology/methods , Humans , Gene Regulatory Networks , Neural Networks, Computer , Gene Expression Profiling/methods
2.
BMC Bioinformatics ; 22(1): 50, 2021 Feb 05.
Article in English | MEDLINE | ID: mdl-33546598

ABSTRACT

BACKGROUND: In the last decade, genome-wide association studies (GWASs) have contributed to decoding the human genome by uncovering many genetic variations associated with various diseases. Many follow-up investigations involve joint analysis of multiple independently generated GWAS data sets. While most of the computational approaches developed for joint analysis are based on summary statistics, joint analysis based on individual-level data that accounts for confounding factors remains a challenge. RESULTS: In this study, we propose a method, called Coupled Mixed Model (CMM), that enables a joint GWAS analysis on two independently collected sets of GWAS data with different phenotypes. The CMM method does not require the data sets to have the same phenotypes, as it aims to infer the unknown phenotypes using a set of multivariate sparse mixed models. Moreover, CMM addresses the confounding variables due to population stratification, family structures, and cryptic relatedness, as well as those arising during data collection, such as batch effects, that frequently appear in joint genetic studies. We evaluate the performance of CMM using simulation experiments. In real data analysis, we illustrate the utility of CMM by applying it to evaluate common genetic associations for Alzheimer's disease and substance use disorder using datasets independently collected for the two complex human disorders. Comparison of the results with those from previous experiments and analyses supports the utility of our method and provides new insights into the diseases. The software is available at https://github.com/HaohanWang/CMM .


Subject(s)
Genome-Wide Association Study , Phenotype , Software , Algorithms , Humans , Models, Genetic , Polymorphism, Single Nucleotide
3.
PLoS Comput Biol ; 16(11): e1008297, 2020 11.
Article in English | MEDLINE | ID: mdl-33151940

ABSTRACT

In eukaryotes, polyadenylation (poly(A)) is an essential process during mRNA maturation. Identifying the cis-determinants of poly(A) signal (PAS) on the DNA sequence is key to understanding the mechanism of translation regulation and mRNA metabolism. Although machine learning methods have been widely used to identify PAS computationally, their need for large amounts of annotation data hinders their application to species without experimental PAS data. Therefore, cross-species PAS identification, which makes it possible to predict PAS in species not seen during training, is a natural and promising direction. In this work, we propose a novel deep learning method named Poly(A)-DG for cross-species PAS identification. Poly(A)-DG consists of a Convolutional Neural Network-Multilayer Perceptron (CNN-MLP) network and a domain generalization technique. It learns PAS patterns from the training species and identifies PAS in target species without retraining. To test our method, we use four species, build cross-species training sets from two of them, and evaluate performance on the remaining ones. Moreover, we test our method against insufficient and imbalanced data and demonstrate that Poly(A)-DG not only outperforms state-of-the-art methods but also maintains relatively high accuracy with smaller or imbalanced training sets.


Subject(s)
Deep Learning , Deoxyguanosine/metabolism , Poly A/metabolism , Signal Transduction , Animals , Humans , Neural Networks, Computer , Species Specificity
4.
Bioinformatics ; 35(7): 1181-1187, 2019 04 01.
Article in English | MEDLINE | ID: mdl-30184048

ABSTRACT

MOTIVATION: Association studies to discover links between genetic markers and phenotypes are central to bioinformatics. Methods of regularized regression, such as variants of the Lasso, are popular for this task. Despite the good predictive performance of these methods in the average case, they suffer from unstable selections of correlated variables and inconsistent selections of linearly dependent variables. Unfortunately, as we demonstrate empirically, such problematic situations of correlated and linearly dependent variables often exist in genomic datasets and lead to under-performance of classical methods of variable selection. RESULTS: To address these challenges, we propose the Precision Lasso. Precision Lasso is a Lasso variant that promotes sparse variable selection by regularization governed by the covariance and inverse covariance matrices of explanatory variables. We illustrate its capacity for stable and consistent variable selection in simulated data with highly correlated and linearly dependent variables. We then demonstrate the effectiveness of the Precision Lasso to select meaningful variables from transcriptomic profiles of breast cancer patients. Our results indicate that in settings with correlated and linearly dependent variables, the Precision Lasso outperforms popular methods of variable selection such as the Lasso, the Elastic Net and Minimax Concave Penalty (MCP) regression. AVAILABILITY AND IMPLEMENTATION: Software is available at https://github.com/HaohanWang/thePrecisionLasso. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
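
The Precision Lasso replaces the plain Lasso penalty with one governed by the covariance and inverse covariance of the explanatory variables; that penalty is not reproduced here. As a point of reference, the following minimal sketch implements only the baseline it modifies: the standard Lasso, minimizing (1/(2n))·||y − Xβ||² + λ·||β||₁ by cyclic coordinate descent. It also illustrates the instability the abstract describes: with two identical columns the solution is not unique, and the weight split depends on the order in which columns are visited.

```python
def soft_threshold(z, t):
    """Proximal operator of t*|b|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Plain Lasso via cyclic coordinate descent (baseline, not Precision Lasso).

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.
    X is a list of n rows, each a list of p floats.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    # Squared norm of each column divided by n.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - X b; b starts at zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual that excludes b[j].
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


# Two identical columns: cyclic coordinate descent gives essentially all the
# weight to the column it visits first.
x = [1.0, -1.0, 2.0, -2.0]
X = [[v, v] for v in x]
y = [2.0 * v for v in x]
print(lasso_cd(X, y, lam=0.1))  # b[0] is about 1.96, b[1] is about 0
```

Reversing the column order would move the weight to the other copy, which is exactly the arbitrariness among correlated or linearly dependent variables that motivates the Precision Lasso.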


Subject(s)
Genomics , Software , Humans , Phenotype
5.
BMC Bioinformatics ; 20(Suppl 23): 656, 2019 Dec 27.
Article in English | MEDLINE | ID: mdl-31881907

ABSTRACT

BACKGROUND: Genome-wide association studies (GWAS) have contributed to unraveling associations between genetic variants in the human genome and complex traits for more than a decade. While many follow-up methods have been developed to detect interactions between SNPs, epistasis has yet to be modeled and discovered more thoroughly. RESULTS: In this paper, following the previous study of detecting marginal epistasis signals, and motivated by the universal approximation power of deep learning, we propose a neural network method that can potentially model arbitrary interactions between SNPs in genetic association studies, as an extension to mixed models for correcting confounding factors. Our method, namely Deep Mixed Model, consists of two components: 1) a confounding factor correction component, which is a large-kernel convolutional neural network that focuses on calibrating the residual phenotypes by removing factors such as population stratification, and 2) a fixed-effect estimation component, which mainly consists of a Long Short-Term Memory (LSTM) model that estimates the association effect size of SNPs with the residual phenotype. CONCLUSIONS: After validating the performance of our method using simulation experiments, we further apply it to Alzheimer's disease data sets. Our results provide some exploratory insight into the genetic architecture of Alzheimer's disease.


Subject(s)
Epistasis, Genetic , Genome-Wide Association Study , Models, Genetic , Algorithms , Alzheimer Disease/genetics , Area Under Curve , Base Sequence , Computer Simulation , Humans , Polymorphism, Single Nucleotide/genetics , ROC Curve
6.
Bioinformatics ; 34(13): i178-i186, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29949997

ABSTRACT

Motivation: In many applications, inter-sample heterogeneity is crucial to understanding the complex biological processes under study. For example, in genomic analysis of cancers, each patient in a cohort may have a different driver mutation, making it difficult or impossible to identify causal mutations from an averaged view of the entire cohort. Unfortunately, many traditional methods for genomic analysis seek to estimate a single model which is shared by all samples in a population, ignoring this inter-sample heterogeneity entirely. In order to better understand patient heterogeneity, it is necessary to develop practical, personalized statistical models. Results: To uncover this inter-sample heterogeneity, we propose a novel regularizer for achieving patient-specific personalized estimation. This regularizer operates by learning two latent distance metrics, one between personalized parameters and one between clinical covariates, and attempting to match the induced distances as closely as possible. Crucially, we do not assume these distance metrics are already known. Instead, we allow the data to dictate the structure of these latent distance metrics. Finally, we apply our method to learn patient-specific, interpretable models for a pan-cancer gene expression dataset containing samples from more than 30 distinct cancer types and find strong evidence of personalization effects between cancer types as well as between individuals. Our analysis uncovers sample-specific aberrations that are overlooked by population-level methods, suggesting a promising new path for precision analysis of complex diseases such as cancer. Availability and implementation: Software for personalized linear and personalized logistic regression, along with code to reproduce experimental results, is freely available at github.com/blengerich/personalized_regression.
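
The abstract describes the regularizer only in words. The sketch below is one hypothetical way to express "match the distances induced by two metrics": it assumes weighted L1 distances and a squared-mismatch penalty summed over sample pairs, neither of which is taken from the paper. In a real fit the weights would be learned jointly with the personalized parameters; here they are fixed inputs, and a practical version would also need a constraint that rules out the degenerate all-zero weighting.

```python
from itertools import combinations


def weighted_l1(a, b, w):
    """Weighted L1 distance with nonnegative per-dimension weights w."""
    return sum(wk * abs(ak - bk) for ak, bk, wk in zip(a, b, w))


def distance_matching_penalty(thetas, covariates, w_theta, w_cov):
    """Hypothetical distance-matching regularizer.

    For every pair of samples, compares the distance between their
    personalized parameter vectors with the distance between their
    clinical covariates (each under its own weighting) and sums the
    squared gaps.
    """
    total = 0.0
    for i, j in combinations(range(len(thetas)), 2):
        d_theta = weighted_l1(thetas[i], thetas[j], w_theta)
        d_cov = weighted_l1(covariates[i], covariates[j], w_cov)
        total += (d_theta - d_cov) ** 2
    return total


# Three hypothetical patients whose parameters differ exactly as their
# covariates do: the penalty is zero under matching weights.
thetas = [[0.0, 1.0], [1.0, 1.0], [3.0, 1.0]]
covariates = [[0.0], [1.0], [3.0]]
print(distance_matching_penalty(thetas, covariates, [1.0, 1.0], [1.0]))  # prints 0.0
```

Doubling the covariate weight doubles every covariate distance, so the same parameters now incur a penalty of 1 + 9 + 4 = 14.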


Subject(s)
Genomics/methods , Models, Genetic , Mutation , Neoplasms/genetics , Software , Female , Genetic Predisposition to Disease , Humans , Male , Models, Statistical , Polymorphism, Single Nucleotide , Precision Medicine/methods , Sequence Analysis, DNA/methods
7.
Methods ; 145: 2-9, 2018 08 01.
Article in English | MEDLINE | ID: mdl-29705212

ABSTRACT

A fundamental and important challenge in modern datasets of ever-increasing dimensionality is variable selection, which has taken on renewed interest recently due to the growth of biological and medical datasets with complex, non-i.i.d. structures. Naïvely applying classical variable selection methods such as the Lasso to such datasets may lead to a large number of false discoveries. Motivated by genome-wide association studies in genetics, we study the problem of variable selection for datasets arising from multiple subpopulations, when this underlying population structure is unknown to the researcher. We propose a unified framework for sparse variable selection that adaptively corrects for population structure via a low-rank linear mixed model. Most importantly, the proposed method does not require prior knowledge of sample structure in the data and adaptively selects a covariance structure of the correct complexity. Through extensive experiments, we illustrate the effectiveness of this framework over existing methods. Further, we test our method on three different genomic datasets from plants, mice, and humans, and discuss the knowledge we discover with our method.


Subject(s)
Genome-Wide Association Study/methods , Models, Statistical , Polymorphism, Single Nucleotide , Animals , Humans , Plants/genetics
8.
Methods ; 145: 33-40, 2018 08 01.
Article in English | MEDLINE | ID: mdl-29705210

ABSTRACT

Genome-wide association studies have offered a promising way to understand the association between human genomes and complex traits. Many simple polymorphic loci have been shown to explain a significant fraction of phenotypic variability. However, explaining complex traits associated with multifactorial genetic loci remains nontrivial, especially given the confounding factors caused by population structure, family structure, and cryptic relatedness. In this paper, we propose a Squared-LMM (LMM2) model, aiming to jointly correct population and genetic confounding factors. We offer two strategies for using LMM2 in association mapping: 1) it serves as an extension of the univariate LMM, which can effectively correct population structure but considers each SNP in isolation; 2) it is integrated with a multivariate regression model to discover associations between complex traits and multifactorial genetic loci. We refer to this second model as sparse Squared-LMM (sLMM2). Further, we extend LMM2/sLMM2 by raising the power of our squared model, yielding the LMMn/sLMMn model. We demonstrate the practical use of our model with synthetic phenotypic variants generated from genetic loci of Arabidopsis thaliana. The experiment shows that our method predicts associations between traits and loci more accurately and with greater significance. We also evaluate our models on collected phenotypes and genotypes by the number of candidate genes that the models could discover. The results suggest that our method is promising for genome-wide association studies.


Subject(s)
Genetic Loci , Genome-Wide Association Study/methods , Models, Statistical , Polymorphism, Genetic , Arabidopsis/genetics , Evolution, Molecular , Genes, Plant , Genetics, Population , Models, Genetic , Multigene Family
9.
Bioinformatics ; 33(14): i13-i22, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28881965

ABSTRACT

MOTIVATION: Cellular Electron CryoTomography (CECT) enables 3D visualization of cellular organization at near-native state and in sub-molecular resolution, making it a powerful tool for analyzing structures of macromolecular complexes and their spatial organizations inside single cells. However, the high degree of structural complexity together with practical imaging limitations makes the systematic de novo discovery of structures within cells challenging. It would likely require averaging and classifying millions of subtomograms potentially containing hundreds of highly heterogeneous structural classes. Although advances in data acquisition automation have made it no longer difficult to acquire CECT data containing that many subtomograms, existing computational approaches have very limited scalability or discrimination ability, making them incapable of processing such amounts of data. RESULTS: To complement existing approaches, in this article we propose a new approach for subdividing subtomograms into smaller but relatively homogeneous subsets. The structures in these subsets can then be separately recovered using existing computationally intensive methods. Our approach is based on supervised structural feature extraction using deep learning, in combination with unsupervised clustering and reference-free classification. Our experiments show that, compared with existing approaches based on unsupervised rotation-invariant features and on pose normalization, our new approach achieves significant improvements in both discrimination ability and scalability. More importantly, our new approach is able to discover new structural classes and recover structures that do not exist in training data. AVAILABILITY AND IMPLEMENTATION: Source code is freely available at http://www.cs.cmu.edu/~mxu1/software . CONTACT: mxu1@cs.cmu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Electron Microscope Tomography/methods , Machine Learning , Molecular Structure , Cluster Analysis , Image Processing, Computer-Assisted/methods
10.
Methods ; 129: 18-23, 2017 10 01.
Article in English | MEDLINE | ID: mdl-28917724

ABSTRACT

Genome-wide association studies have discovered a large number of genetic variants associated with complex diseases such as Alzheimer's disease. However, the genetic background of such diseases is largely unknown due to the complex mechanisms underlying genetic effects on traits, as well as a small sample size (e.g., 1000) and a large number of genetic variants (e.g., 1 million). Fortunately, datasets that contain genotypes, transcripts, and phenotypes are becoming more readily available, creating new opportunities for detecting disease-associated genetic variants. In this paper, we present a novel approach called "Backward Three-way Association Mapping" (BTAM) for detecting three-way associations among genotypes, transcripts, and phenotypes. Assuming that genotypes affect transcript levels, which in turn affect phenotypes, we first find transcripts associated with the phenotypes, and then find genotypes associated with the chosen transcripts. The backward ordering of association mappings allows us to avoid a large number of association tests between all genotypes and all transcripts, making it possible to identify three-way associations with a small computational cost. In our simulation study, we demonstrate that BTAM significantly improves statistical power over both "forward" three-way association mapping, which finds genotypes associated with both transcripts and phenotypes, and genotype-phenotype association mapping. Furthermore, we apply BTAM to an Alzheimer's disease dataset and report the top 10 genotype-transcript-phenotype associations.
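
The backward ordering can be sketched in a few lines. This is a minimal illustration, assuming that absolute Pearson correlation compared against fixed thresholds stands in for whatever association tests BTAM actually uses; the function names, thresholds, and data are hypothetical. The point is structural: genotypes are tested only against transcripts that already passed the phenotype screen.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation; returns 0.0 for a constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def backward_three_way(genotypes, transcripts, phenotype, t_thresh, g_thresh):
    """Hypothetical backward mapping: phenotype -> transcripts -> genotypes.

    genotypes and transcripts map a name to a list of per-sample values.
    Returns (genotype, transcript) pairs passing both screens.
    """
    # Step 1: keep only transcripts associated with the phenotype.
    chosen = [t for t, vals in transcripts.items()
              if abs(pearson(vals, phenotype)) >= t_thresh]
    # Step 2: test genotypes only against the chosen transcripts, so the
    # number of genotype-transcript tests is len(genotypes) * len(chosen).
    hits = []
    for t in chosen:
        for g, gvals in genotypes.items():
            if abs(pearson(gvals, transcripts[t])) >= g_thresh:
                hits.append((g, t))
    return hits


genotypes = {"snp1": [0, 1, 2, 0, 1, 2], "snp2": [0, 2, 0, 2, 0, 2]}
transcripts = {"geneA": [0.1, 1.0, 2.1, 0.0, 0.9, 2.0],
               "geneB": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]}
phenotype = [0.2, 2.1, 4.0, 0.1, 1.9, 4.1]
print(backward_three_way(genotypes, transcripts, phenotype, 0.8, 0.8))
# prints [('snp1', 'geneA')]
```

Here geneB fails the phenotype screen, so no genotype is ever tested against it; with millions of variants and thousands of transcripts, that pruning is where the computational savings come from.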


Subject(s)
Chromosome Mapping/methods , Genetic Association Studies/methods , Genetic Variation/genetics , Genome-Wide Association Study/methods , Algorithms , Genotype , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics , Software
11.
Brief Bioinform ; 16(2): 183-92, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25053743

ABSTRACT

Machine learning, particularly kernel methods, has been demonstrated to be a promising new tool for tackling the challenges imposed by today's explosive data growth in genomics. Kernel methods provide a practical and principled approach to learning how a large number of genetic variants are associated with complex phenotypes, helping to reveal the complexity of the relationship between genetic markers and the outcome of interest. In this review, we highlight the potential key role of these methods in modern genomic data processing, especially with regard to integration with classical methods for gene prioritization, prediction and data fusion.


Subject(s)
Genomics/statistics & numerical data , Machine Learning , Computational Biology , Data Interpretation, Statistical , Genome-Wide Association Study/statistics & numerical data , Humans , Logistic Models , Models, Statistical , Polymorphism, Single Nucleotide , Support Vector Machine
12.
Bioinformatics ; 32(12): i164-i173, 2016 06 15.
Article in English | MEDLINE | ID: mdl-27307613

ABSTRACT

MOTIVATION: It remains a challenge to detect associations between genotypes and phenotypes because of insufficient sample sizes and complex underlying mechanisms involved in associations. Fortunately, it is becoming more feasible to obtain gene expression data in addition to genotypes and phenotypes, giving us new opportunities to detect true genotype-phenotype associations while unveiling their association mechanisms. RESULTS: In this article, we propose a novel method, NETAM, that accurately detects associations between SNPs and phenotypes, as well as gene traits involved in such associations. We take a network-driven approach: NETAM first constructs an association network, where nodes represent SNPs, gene traits or phenotypes, and edges represent the strength of association between two nodes. NETAM assigns a score to each path from an SNP to a phenotype, and then identifies significant paths based on the scores. In our simulation study, we show that NETAM finds significantly more phenotype-associated SNPs than traditional genotype-phenotype association analysis under false positive control, taking advantage of gene expression data. Furthermore, we applied NETAM to late-onset Alzheimer's disease data and identified 477 significant path associations, among which we analyzed paths related to beta-amyloid, estrogen, and nicotine pathways. We also provide hypothetical biological pathways to explain our findings. AVAILABILITY AND IMPLEMENTATION: Software is available at http://www.sailing.cs.cmu.edu/ CONTACT: epxing@cs.cmu.edu.
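
The abstract does not give NETAM's path-scoring function. The sketch below shows the general shape of scoring SNP-to-phenotype paths in a layered association network, assuming (hypothetically) that a path's score is the product of the absolute association strengths along its edges and that "significant" means scoring above a fixed cutoff. The node names are illustrative only.

```python
def score_paths(edges, snps, genes, phenotypes, min_score):
    """Hypothetical path scoring on a layered association network.

    edges maps (u, v) pairs to association strengths in [-1, 1].
    Each SNP -> gene -> phenotype path is scored by the product of the
    absolute edge strengths; paths scoring at least min_score are kept,
    sorted from highest to lowest score.
    """
    def w(u, v):
        return abs(edges.get((u, v), 0.0))

    kept = []
    for s in snps:
        for g in genes:
            for p in phenotypes:
                score = w(s, g) * w(g, p)
                if score >= min_score:
                    kept.append(((s, g, p), score))
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept


edges = {("rs1", "GENE1"): 0.8, ("GENE1", "AD"): 0.5, ("rs2", "GENE1"): 0.2,
         ("rs1", "GENE2"): 0.9, ("GENE2", "AD"): -0.1}
print(score_paths(edges, ["rs1", "rs2"], ["GENE1", "GENE2"], ["AD"], 0.3))
# prints [(('rs1', 'GENE1', 'AD'), 0.4)]
```

A product score rewards paths whose every link is strong: rs1 has the strongest single edge (to GENE2), but that path is dropped because GENE2 is only weakly associated with the phenotype.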


Subject(s)
Genome-Wide Association Study , Chromosome Mapping , Genetic Association Studies , Genotype , Phenotype , Polymorphism, Single Nucleotide
13.
Bioinformatics ; 32(19): 2903-10, 2016 10 01.
Article in English | MEDLINE | ID: mdl-27296983

ABSTRACT

MOTIVATION: Despite the widespread popularity of genome-wide association studies (GWAS) for genetic mapping of complex traits, most existing GWAS methodologies are still limited to the use of static phenotypes measured at a single time point. In this work, we propose a new method for association mapping that considers dynamic phenotypes measured at a sequence of time points. Our approach relies on the use of Time-Varying Group Sparse Additive Models (TV-GroupSpAM) for high-dimensional, functional regression. RESULTS: This new model detects a sparse set of genomic loci that are associated with trait dynamics, and demonstrates increased statistical power over existing methods. We evaluate our method via experiments on synthetic data and perform a proof-of-concept analysis for detecting single nucleotide polymorphisms associated with two phenotypes used to assess asthma severity: forced vital capacity, a sensitive measure of airway obstruction, and bronchodilator response, which measures lung response to bronchodilator drugs. AVAILABILITY AND IMPLEMENTATION: Source code for TV-GroupSpAM, implemented in MATLAB, is freely available for download at http://www.cs.cmu.edu/~mmarchet/projects/tv_group_spam. CONTACT: epxing@cs.cmu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Models, Genetic , Polymorphism, Single Nucleotide , Chromosome Mapping , Genome , Genome-Wide Association Study , Humans , Phenotype
14.
J Allergy Clin Immunol ; 137(5): 1390-1397.e6, 2016 05.
Article in English | MEDLINE | ID: mdl-26792209

ABSTRACT

BACKGROUND: Recent studies have used cluster analysis to identify phenotypic clusters of asthma with differences in clinical traits, as well as differences in response to therapy with anti-inflammatory medications. However, the correspondence between different phenotypic clusters and differences in the underlying molecular mechanisms of asthma pathogenesis remains unclear. OBJECTIVE: We sought to determine whether clinical differences among children with asthma in different phenotypic clusters corresponded to differences in levels of gene expression. METHODS: We explored differences in gene expression profiles of CD4(+) lymphocytes isolated from the peripheral blood of 299 young adult participants in the Childhood Asthma Management Program study. We obtained gene expression profiles from study subjects between 9 and 14 years of age after they participated in a randomized, controlled longitudinal study examining the effects of inhaled anti-inflammatory medications over a 48-month study period, and we evaluated the correspondence between our earlier phenotypic cluster analysis and subsequent follow-up clinical and molecular profiles. RESULTS: We found that differences in clinical characteristics observed between subjects assigned to different phenotypic clusters persisted into young adulthood and that these clinical differences were associated with differences in gene expression patterns between subjects in different clusters. We identified a subset of genes associated with atopic status, validated the presence of an atopic signature among these genes in an independent cohort of asthmatic subjects, and identified the presence of common transcription factor binding sites corresponding to glucocorticoid receptor binding. CONCLUSION: These findings suggest that phenotypic clusters are associated with differences in the underlying pathobiology of asthma. Further experiments are necessary to confirm these findings.


Subject(s)
Asthma/genetics , Hypersensitivity, Immediate/genetics , Adolescent , Asthma/blood , Asthma/immunology , Asthma/physiopathology , CD4-Positive T-Lymphocytes/metabolism , Child , Eosinophils/immunology , Female , Gene Expression Profiling , Humans , Immunoglobulin E/blood , Male , Phenotype , Randomized Controlled Trials as Topic , Spirometry , Transcriptome
15.
PLoS Comput Biol ; 10(7): e1003713, 2014 Jul.
Article in English | MEDLINE | ID: mdl-25057922

ABSTRACT

The HMT3522 progression series of human breast cells has been used to discover how tissue architecture, microenvironment and signaling molecules affect breast cell growth and behaviors. However, much remains to be elucidated about the malignant and phenotypic reversion behaviors of the HMT3522-T4-2 cells of this series. We employed a "pan-cell-state" strategy and jointly analyzed microarray profiles obtained from different state-specific cell populations from this progression and reversion model of the breast cells using a tree-lineage multi-network inference algorithm, Treegl. We found that different breast cell states contain distinct gene networks. The network specific to non-malignant HMT3522-S1 cells is dominated by genes involved in normal processes, whereas the T4-2-specific network is enriched with cancer-related genes. The networks specific to various conditions of the reverted T4-2 cells are enriched with pathways suggestive of compensatory effects, consistent with clinical data showing patient resistance to anticancer drugs. We validated the findings using an external dataset, and showed that aberrant expression values of certain hubs in the identified networks are associated with poor clinical outcomes. Thus, HMT3522 cells under various reversion conditions (including non-reverted), analyzed with Treegl, can serve as a good model system to study drug effects on breast cancer.


Subject(s)
Algorithms , Breast Neoplasms/genetics , Computational Biology/methods , Cell Line, Tumor , Computer Simulation , Databases, Factual , Disease Progression , Female , Gene Regulatory Networks , Humans , Kaplan-Meier Estimate , Markov Chains , Oligonucleotide Array Sequence Analysis
16.
Neural Comput ; 26(1): 185-207, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24102126

ABSTRACT

The goal of supervised feature selection is to find a subset of input features that are responsible for predicting output values. The least absolute shrinkage and selection operator (Lasso) allows computationally efficient feature selection based on linear dependency between input features and output values. In this letter, we consider a feature-wise kernelized Lasso for capturing nonlinear input-output dependency. We first show that with particular choices of kernel functions, nonredundant features with strong statistical dependence on output values can be found in terms of kernel-based independence measures such as the Hilbert-Schmidt independence criterion. We then show that the globally optimal solution can be efficiently computed; this makes the approach scalable to high-dimensional problems. The effectiveness of the proposed method is demonstrated through feature selection experiments for classification and regression with thousands of features.
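
The letter's method builds on kernel (Gram) matrices computed feature by feature and relates them to HSIC. The sketch below computes only the biased empirical HSIC score between one feature and the output, HSIC = tr(K H L H) / (n − 1)² with H = I − (1/n)·11ᵀ, using Gaussian kernels with an assumed bandwidth of 1; it does not implement the kernelized Lasso optimization itself. Because H is idempotent, tr(K H L H) equals the elementwise inner product of the two double-centered Gram matrices, which is what the code sums.

```python
import math


def gaussian_gram(x, sigma):
    """Gram matrix of a 1-D sample under a Gaussian (RBF) kernel."""
    return [[math.exp(-(a - b) ** 2 / (2 * sigma ** 2)) for b in x] for a in x]


def center(K):
    """Double centering: returns H K H with H = I - (1/n) 11^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC, tr(K H L H) / (n - 1)^2, with Gaussian kernels."""
    n = len(x)
    Kc = center(gaussian_gram(x, sigma))
    Lc = center(gaussian_gram(y, sigma))
    # tr(Kc Lc) = sum_ij Kc[i][j] * Lc[i][j] because Lc is symmetric.
    return sum(Kc[i][j] * Lc[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]  # nonlinear in x; the sample covariance with x is exactly 0
print(hsic(x, y))                # positive: HSIC picks up the nonlinear dependence
print(hsic(x, [1.0] * len(x)))   # prints 0.0: a constant output carries no information
```

A linear correlation screen would miss y = x² on this symmetric sample, whereas a kernel-based dependence measure does not; that gap is what motivates a kernelized Lasso over the linear one.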


Subject(s)
Algorithms , Artificial Intelligence , Nonlinear Dynamics , Pattern Recognition, Automated/methods , Animals , Oligonucleotide Array Sequence Analysis , Rats
17.
PLoS Comput Biol ; 9(10): e1003227, 2013.
Article in English | MEDLINE | ID: mdl-24130465

ABSTRACT

Accurate inference of molecular and functional interactions among genes, especially in multicellular organisms such as Drosophila, often requires statistical analysis of correlations not only between the magnitudes of gene expressions, but also between their temporal-spatial patterns. The ISH (in-situ-hybridization)-based gene expression micro-imaging technology offers an effective approach to perform large-scale spatial-temporal profiling of whole-body mRNA abundance. However, analytical tools for discovering gene interactions from such data remain an open challenge for various reasons, including difficulties in extracting canonical representations of gene activities from images and in inferring statistically meaningful networks from such representations. In this paper, we present GINI, a machine learning system for inferring gene interaction networks from Drosophila embryonic ISH images. GINI builds on two components: a computer-vision-inspired vector-space representation of the spatial pattern of gene expression in ISH images, enabled by our recently developed [Formula: see text] system, and a new multi-instance-kernel algorithm that learns a sparse Markov network model, in which every gene (i.e., node) in the network is represented by a vector-valued spatial pattern rather than a scalar-valued gene intensity as in conventional approaches such as a Gaussian graphical model. By capturing the notion of spatial similarity of gene expression, and at the same time properly taking into account the presence of multiple images per gene via multi-instance kernels, GINI is well-positioned to infer statistically sound and biologically meaningful gene interaction networks from image data. Using both synthetic data and a small manually curated data set, we demonstrate the effectiveness of our approach in network building.
Furthermore, we report results on a large publicly available collection of Drosophila embryonic ISH images from the Berkeley Drosophila Genome Project, where GINI makes novel and interesting predictions of gene interactions. Software for GINI is available at http://sailing.cs.cmu.edu/Drosophila_ISH_images/


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Gene Regulatory Networks/genetics , Gene Regulatory Networks/physiology , Image Processing, Computer-Assisted/methods , In Situ Hybridization/methods , Animals , Drosophila/genetics , Drosophila/metabolism , Markov Chains
18.
BMC Genomics ; 14: 196, 2013 Mar 21.
Article in English | MEDLINE | ID: mdl-23514438

ABSTRACT

BACKGROUND: Association analysis using genome-wide expression quantitative trait locus (eQTL) data investigates the effect that genetic variation has on cellular pathways and leads to the discovery of candidate regulators. Traditional analysis of eQTL data via pairwise statistical significance tests or linear regression does not leverage the availability of the structural information of the transcriptome, such as the presence of gene networks that reveal correlation and potentially regulatory relationships among the study genes. We employ a new eQTL mapping algorithm, GFlasso, which we have previously developed for sparse structured regression, to reanalyze a genome-wide yeast dataset. GFlasso fully takes into account the dependencies among expression traits to suppress false positives and to enhance the signal/noise ratio. Thus, GFlasso leverages the gene-interaction network to discover the pleiotropic effects of genetic loci that perturb the expression level of multiple (rather than individual) genes, which enables us to gain more power in detecting previously neglected signals that are marginally weak but pleiotropically significant. RESULTS: While eQTL hotspots in yeast have been reported previously as genomic regions controlling multiple genes, our analysis reveals additional novel eQTL hotspots and, more interestingly, uncovers groups of multiple contributing eQTL hotspots that affect the expression level of functional gene modules. To our knowledge, our study is the first to report this type of gene regulation stemming from multiple eQTL hotspots. Additionally, we report the results from in-depth bioinformatics analysis for three groups of these eQTL hotspots: ribosome biogenesis, telomere silencing, and retrotransposon biology. We suggest candidate regulators for the functional gene modules that map to each group of hotspots.
Not only do we find that many of these candidate regulators contain mutations in the promoter and coding regions of the genes, in the case of the Ribi group, we provide experimental evidence suggesting that the identified candidates do regulate the target genes predicted by GFlasso. CONCLUSIONS: Thus, this structured association analysis of a yeast eQTL dataset via GFlasso, coupled with extensive bioinformatics analysis, discovers a novel regulation pattern between multiple eQTL hotspots and functional gene modules. Furthermore, this analysis demonstrates the potential of GFlasso as a powerful computational tool for eQTL studies that exploit the rich structural information among expression traits due to correlation, regulation, or other forms of biological dependencies.
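The abstract does not spell out the GFlasso penalty. A minimal sketch, assuming the graph-guided fused lasso form commonly associated with the authors' earlier work: a squared-error loss, an l1 penalty on every SNP-trait coefficient, and a fusion penalty over edges (m, l) of a trait-correlation graph, weighted by |r_ml|. The function name gflasso_objective, the edge-list format, and the |r_ml| edge weight are hypothetical choices for illustration; the sketch only evaluates the objective and does not fit it.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def gflasso_objective(X: Matrix, Y: Matrix, B: Matrix,
                      edges: List[Tuple[int, int, float]],
                      lam: float, gamma: float) -> float:
    """Hypothetical graph-guided fused lasso objective (sketch, assumed form).

    X: N x J genotype matrix, Y: N x K expression traits,
    B: J x K coefficients, edges: (m, l, r_ml) for correlated traits m, l.
    """
    n, j_dim, k_dim = len(X), len(B), len(B[0])
    # Squared-error loss: 0.5 * ||Y - X B||_F^2
    loss = 0.0
    for i in range(n):
        for k in range(k_dim):
            pred = sum(X[i][j] * B[j][k] for j in range(j_dim))
            loss += (Y[i][k] - pred) ** 2
    loss *= 0.5
    # Lasso term: sparsity on every SNP-trait coefficient
    l1 = sum(abs(b) for row in B for b in row)
    # Fusion term (assumed weight |r_ml|): pull a SNP's effects on correlated
    # traits together, or toward opposite signs if the correlation is negative
    fusion = 0.0
    for m, l, r in edges:
        sign = 1.0 if r >= 0 else -1.0
        fusion += abs(r) * sum(abs(B[j][m] - sign * B[j][l])
                               for j in range(j_dim))
    return loss + lam * l1 + gamma * fusion
```

With two strongly correlated traits, a coefficient matrix in which one SNP affects both traits equally scores lower than one in which it affects only one of them, which is the pleiotropic pattern the abstract describes.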


Subject(s)
Saccharomyces cerevisiae/genetics , Computational Biology , Databases, Genetic , Gene Expression Regulation, Fungal , Gene Regulatory Networks , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Ribosomes/genetics , Ribosomes/metabolism , Telomere/genetics , Telomere/metabolism
19.
Bioinformatics ; 28(12): i137-46, 2012 Jun 15.
Article in English | MEDLINE | ID: mdl-22689753

ABSTRACT

MOTIVATION: As many complex disease and expression phenotypes are the outcome of intricate perturbation of molecular networks underlying gene regulation resulting from interdependent genome variations, association mapping of causal QTLs or expression quantitative trait loci must consider both additive and epistatic effects of multiple candidate genotypes. This problem poses a significant challenge to contemporary genome-wide-association (GWA) mapping technologies because of its computational complexity. Fortunately, a plethora of recent developments in the biological network community, especially the availability of genetic interaction networks, make it possible to construct informative priors of complex interactions between genotypes, which can substantially reduce the complexity and increase the statistical power of GWA inference. RESULTS: In this article, we consider the problem of learning a multitask regression model while taking advantage of prior structural information on both the inputs (genetic variations) and outputs (expression levels). We propose a novel regularization scheme over multitask regression called jointly structured input-output lasso based on an ℓ1/ℓ2 norm, which allows shared sparsity patterns for related inputs and outputs to be optimally estimated. Such patterns capture multiple related single nucleotide polymorphisms (SNPs) that jointly influence multiple related expression traits. In addition, we generalize this new multitask regression to structurally regularized polynomial regression to detect epistatic interactions with manageable complexity by exploiting the prior knowledge on candidate SNPs for epistatic effects from biological experiments. We demonstrate our method on simulated and yeast eQTL datasets. AVAILABILITY: Software is available at http://www.sailing.cs.cmu.edu/.
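A minimal sketch of an ℓ1/ℓ2 (group lasso) penalty applied over both input (SNP) groups and output (trait) groups, plus the block soft-thresholding step that drops a whole group out together. The group structure, the weights lam, gamma_in and gamma_out, and both function names are hypothetical choices for illustration; the exact formulation in the paper may differ.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def l1_l2_penalty(B: Matrix, input_groups: Sequence[Sequence[int]],
                  output_groups: Sequence[Sequence[int]],
                  lam: float, gamma_in: float, gamma_out: float) -> float:
    """Hypothetical lasso + l1/l2 penalty over input and output groups.

    B is J x K: rows are SNPs (inputs), columns are expression traits (outputs).
    """
    J, K = len(B), len(B[0])
    # Plain l1 term: element-wise sparsity
    l1 = sum(abs(b) for row in B for b in row)
    # Output groups: for each SNP, the l2 norm over a group of related traits,
    # so a SNP tends to affect all traits in the group or none of them
    out_term = sum(math.sqrt(sum(B[j][k] ** 2 for k in g))
                   for j in range(J) for g in output_groups)
    # Input groups: for each trait, the l2 norm over a group of related SNPs
    in_term = sum(math.sqrt(sum(B[j][k] ** 2 for j in h))
                  for k in range(K) for h in input_groups)
    return lam * l1 + gamma_out * out_term + gamma_in * in_term


def group_soft_threshold(v: List[float], t: float) -> List[float]:
    """Proximal operator of t * ||v||_2 for a single group: shrinks the
    group toward zero and zeroes it entirely when its l2 norm is <= t."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= t:
        return [0.0] * len(v)
    scale = 1.0 - t / norm
    return [scale * x for x in v]
```

When groups do not overlap, the proximal step of the group penalty applies this operator group by group. The input and output groups in this sketch overlap, so a full solver needs more machinery than shown here.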


Subject(s)
Chromosome Mapping/methods , Epistasis, Genetic , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Algorithms , Computational Biology/methods , Genome , Genotype , Humans , Linear Models , Phenotype , Saccharomyces cerevisiae/genetics , Software
20.
IEEE Trans Pattern Anal Mach Intell ; 45(11): 12832-12843, 2023 Nov.
Article in English | MEDLINE | ID: mdl-35917572

ABSTRACT

Few-shot object detection has been extensively investigated by incorporating meta-learning into region-based detection frameworks. Despite its success, this paradigm is still constrained by several factors, such as (i) low-quality region proposals for novel classes and (ii) neglect of the inter-class correlation among different classes. Such limitations hinder the generalization of base-class knowledge for the detection of novel-class objects. In this work, we design Meta-DETR, which (i) is the first image-level few-shot detector, and (ii) introduces a novel inter-class correlational meta-learning strategy to capture and leverage the correlation among different classes for robust and accurate few-shot object detection. Meta-DETR works entirely at image level without any region proposals, which circumvents the constraint of inaccurate proposals in prevalent few-shot detection frameworks. In addition, the introduced correlational meta-learning enables Meta-DETR to simultaneously attend to multiple support classes within a single feedforward pass, which allows it to capture the inter-class correlation among different classes, thus significantly reducing misclassification among similar classes and enhancing knowledge generalization to novel classes. Experiments over multiple few-shot object detection benchmarks show that the proposed Meta-DETR outperforms state-of-the-art methods by large margins. The implementation code is publicly available at https://github.com/ZhangGongjie/Meta-DETR.
