1.
PLoS Genet ; 19(5): e1010760, 2023 05.
Article in English | MEDLINE | ID: mdl-37200393

ABSTRACT

Heterozygous variants in the glucocerebrosidase (GBA) gene are common and potent risk factors for Parkinson's disease (PD). GBA also causes the autosomal recessive lysosomal storage disorder (LSD), Gaucher disease, and emerging evidence from human genetics implicates many other LSD genes in PD susceptibility. We have systematically tested 86 conserved fly homologs of 37 human LSD genes for requirements in the aging adult Drosophila brain and for potential genetic interactions with neurodegeneration caused by α-synuclein (αSyn), which forms Lewy body pathology in PD. Our screen identifies 15 genetic enhancers of αSyn-induced progressive locomotor dysfunction, including knockdown of fly homologs of GBA and other LSD genes with independent support as PD susceptibility factors from human genetics (SCARB2, SMPD1, CTSD, GNPTAB, SLC17A5). For several genes, results from multiple alleles suggest dose-sensitivity and context-dependent pleiotropy in the presence or absence of αSyn. Homologs of two genes causing cholesterol storage disorders, Npc1a / NPC1 and Lip4 / LIPA, were independently confirmed as loss-of-function enhancers of αSyn-induced retinal degeneration. The enzymes encoded by several modifier genes are upregulated in αSyn transgenic flies, based on unbiased proteomics, revealing a possible, albeit ineffective, compensatory response. Overall, our results reinforce the important role of lysosomal genes in brain health and PD pathogenesis, and implicate several metabolic pathways, including cholesterol homeostasis, in αSyn-mediated neurotoxicity.


Subjects
Parkinson Disease , alpha-Synuclein , Animals , Humans , alpha-Synuclein/genetics , alpha-Synuclein/metabolism , Genetically Modified Animals , Drosophila/genetics , Drosophila/metabolism , Glucosylceramidase/genetics , Glucosylceramidase/metabolism , Lysosomes/metabolism , Parkinson Disease/pathology , Transferases (Other Substituted Phosphate Groups)/metabolism , Aging/metabolism
2.
Bioinformatics ; 39(10)2023 10 03.
Article in English | MEDLINE | ID: mdl-37792497

ABSTRACT

MOTIVATION: Nuclear magnetic resonance spectroscopy (NMR) is widely used to analyze metabolites in biological samples, but the analysis requires specific expertise, is time-consuming, and can be inaccurate. Here, we present a powerful automated tool, SPatial clustering Algorithm-Statistical TOtal Correlation SpectroscopY (SPA-STOCSY), which overcomes challenges faced when analyzing NMR data and identifies metabolites in a sample with high accuracy. RESULTS: As a data-driven method, SPA-STOCSY estimates all parameters from the input dataset. It first investigates the covariance pattern among datapoints and then calculates the optimal threshold with which to cluster datapoints belonging to the same structural unit, i.e. the metabolite. Generated clusters are then automatically linked to a metabolite library to identify candidates. To assess SPA-STOCSY's efficiency and accuracy, we applied it to synthesized spectra and spectra acquired on Drosophila melanogaster tissue and human embryonic stem cells. In the synthesized spectra, SPA outperformed Statistical Recoupling of Variables (SRV), an existing method for clustering spectral peaks, by capturing a higher percentage of the signal regions and the close-to-zero noise regions. In the biological data, SPA-STOCSY performed comparably to the operator-based Chenomx analysis while avoiding operator bias, and it required <7 min of total computation time. Overall, SPA-STOCSY is a fast, accurate, and unbiased tool for untargeted analysis of metabolites in the NMR spectra. It may thus accelerate the use of NMR for scientific discoveries, medical diagnostics, and patient-specific decision making. AVAILABILITY AND IMPLEMENTATION: The codes of SPA-STOCSY are available at https://github.com/LiuzLab/SPA-STOCSY.
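
The idea of clustering neighbouring spectral datapoints that co-vary across samples can be illustrated with a minimal sketch. This is not the SPA-STOCSY implementation: the function names, the toy spectra, and the fixed threshold of 0.8 are assumptions made for illustration, whereas the abstract states that the real method estimates its threshold from the data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def cluster_adjacent_points(spectra, threshold):
    """Group contiguous spectral points whose intensities correlate across samples.

    spectra: one list of intensities per sample, all of the same length.
    threshold: hypothetical fixed cut-off on the adjacent-point correlation.
    """
    columns = list(zip(*spectra))  # one tuple of intensities per spectral point
    clusters = [[0]]
    for j in range(1, len(columns)):
        if pearson(columns[j - 1], columns[j]) >= threshold:
            clusters[-1].append(j)   # same structural unit as the previous point
        else:
            clusters.append([j])     # correlation drops: start a new cluster
    return clusters


# Toy data: points 0-2 scale with one concentration, points 3-5 with another.
conc_a = [1, 2, 3, 4, 5]
conc_b = [3, 5, 1, 4, 2]
spectra = [[a, 2 * a, a, b, 3 * b, b] for a, b in zip(conc_a, conc_b)]
clusters = cluster_adjacent_points(spectra, threshold=0.8)
print(clusters)
```

In this toy example the two concentration profiles are only weakly correlated, so the points split into one cluster per simulated metabolite.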


Subjects
Drosophila melanogaster , Magnetic Resonance Imaging , Animals , Humans , Magnetic Resonance Spectroscopy/methods , Cluster Analysis , Metabolomics/methods
3.
J Physiol ; 601(21): 4767-4806, 2023 11.
Article in English | MEDLINE | ID: mdl-37786382

ABSTRACT

Comprehensive and accurate analysis of respiratory and metabolic data is crucial to modelling congenital, pathogenic and degenerative diseases converging on autonomic control failure. A lack of tools for high-throughput analysis of respiratory datasets remains a major challenge. We present Breathe Easy, a novel open-source pipeline for processing raw recordings and associated metadata into operative outcomes, publication-worthy graphs and robust statistical analyses including QQ and residual plots for assumption queries and data transformations. This pipeline uses a facile graphical user interface for uploading data files, setting waveform feature thresholds and defining experimental variables. Breathe Easy was validated against manual selection by experts, which represents the current standard in the field. We demonstrate Breathe Easy's utility by examining a 2-year longitudinal study of an Alzheimer's disease mouse model to assess contributions of forebrain pathology in disordered breathing. Whole body plethysmography has become an important experimental outcome measure for a variety of diseases with primary and secondary respiratory indications. Respiratory dysfunction, while not an initial symptom in many of these disorders, often drives disability or death in patient outcomes. Breathe Easy provides an open-source respiratory analysis tool for all respiratory datasets and represents a necessary improvement upon current analytical methods in the field. KEY POINTS: Respiratory dysfunction is a common endpoint for disability and mortality in many disorders throughout life. Whole body plethysmography in rodents represents a high face-value method for measuring respiratory outcomes in rodent models of these diseases and disorders. Analysis of key respiratory variables remains hindered by manual annotation and analysis that leads to low throughput results that often exclude a majority of the recorded data. 
Here we present a software suite, Breathe Easy, that automates the process of data selection from raw recordings derived from plethysmography experiments and the analysis of these data into operative outcomes and publication-worthy graphs with statistics. We validate Breathe Easy with a terabyte-scale Alzheimer's dataset that examines the effects of forebrain pathology on respiratory function over 2 years of degeneration.


Subjects
Respiration , Software , Animals , Mice , Humans , Longitudinal Studies , Plethysmography
4.
PLoS Comput Biol ; 18(10): e1010577, 2022 10.
Article in English | MEDLINE | ID: mdl-36191044

ABSTRACT

Consensus clustering has been widely used in bioinformatics and other applications to improve the accuracy, stability and reliability of clustering results. This approach ensembles cluster co-occurrences from multiple clustering runs on subsampled observations. For application to large-scale bioinformatics data, such as to discover cell types from single-cell sequencing data, for example, consensus clustering has two significant drawbacks: (i) computational inefficiency due to repeatedly applying clustering algorithms, and (ii) lack of interpretability into the important features for differentiating clusters. In this paper, we address these two challenges by developing IMPACC: Interpretable MiniPatch Adaptive Consensus Clustering. Our approach adopts three major innovations. We ensemble cluster co-occurrences from tiny subsets of both observations and features, termed minipatches, thus dramatically reducing computation time. Additionally, we develop adaptive sampling schemes for observations, which result in both improved reliability and computational savings, as well as adaptive sampling schemes of features, which lead to interpretable solutions by quickly learning the most relevant features that differentiate clusters. We study our approach on synthetic data and a variety of real large-scale bioinformatics data sets; results show that our approach not only yields more accurate and interpretable cluster solutions, but it also substantially improves computational efficiency compared to standard consensus clustering approaches.
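
The core step described above, accumulating cluster co-occurrences over tiny random subsets of observations and features, can be sketched as follows. This is a simplified illustration and not the IMPACC code: it samples minipatches uniformly (the adaptive sampling schemes are omitted), uses a basic 2-means clusterer with a farthest-point start, and all function names and toy data are assumptions.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def two_means(points, iterations=10):
    """Lloyd's algorithm with k=2, started from the first point and the point farthest from it."""
    first = points[0]
    second = max(points, key=lambda p: sq_dist(p, first))
    centers = [list(first), list(second)]
    labels = [0] * len(points)
    for _ in range(iterations):
        for i, p in enumerate(points):
            labels[i] = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def minipatch_consensus(data, n_patches, n_obs, n_feat, seed=0):
    """Fraction of co-sampled minipatches in which each pair of observations shares a cluster."""
    rng = random.Random(seed)
    n, p = len(data), len(data[0])
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_patches):
        rows = rng.sample(range(n), n_obs)   # uniform subsample of observations
        cols = rng.sample(range(p), n_feat)  # uniform subsample of features
        patch = [[data[r][c] for c in cols] for r in rows]
        labels = two_means(patch)
        for a in range(n_obs):
            for b in range(n_obs):
                sampled[rows[a]][rows[b]] += 1
                if labels[a] == labels[b]:
                    together[rows[a]][rows[b]] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


# Toy data: rows 0-5 sit near 0 and rows 6-11 near 10 in every feature.
noise = random.Random(1)
data = [[base + noise.uniform(-0.2, 0.2) for _ in range(4)]
        for base in [0.0] * 6 + [10.0] * 6]
consensus = minipatch_consensus(data, n_patches=200, n_obs=6, n_feat=2)
print(round(consensus[0][1], 2), consensus[0][6])
```

Each minipatch here clusters only 6 observations on 2 features, which is where the computational saving comes from; a final partition would then be read off the consensus matrix, for example by hierarchical clustering.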


Subjects
Algorithms , Computational Biology , Cluster Analysis , Computational Biology/methods , Consensus , Reproducibility of Results
5.
Biometrics ; 79(4): 3846-3858, 2023 12.
Article in English | MEDLINE | ID: mdl-36950906

ABSTRACT

Clustering has long been a popular unsupervised learning approach to identify groups of similar objects and discover patterns from unlabeled data in many applications. Yet, coming up with meaningful interpretations of the estimated clusters has often been challenging precisely due to their unsupervised nature. Meanwhile, in many real-world scenarios, there are some noisy supervising auxiliary variables, for instance, subjective diagnostic opinions, that are related to the observed heterogeneity of the unlabeled data. By leveraging information from both supervising auxiliary variables and unlabeled data, we seek to uncover more scientifically interpretable group structures that may be hidden by completely unsupervised analyses. In this work, we propose and develop a new statistical pattern discovery method named supervised convex clustering (SCC) that borrows strength from both information sources and guides towards finding more interpretable patterns via a joint convex fusion penalty. We develop several extensions of SCC to integrate different types of supervising auxiliary variables, to adjust for additional covariates, and to find biclusters. We demonstrate the practical advantages of SCC through simulations and a case study on Alzheimer's disease genomics. Specifically, we discover new candidate genes as well as new subtypes of Alzheimer's disease that can potentially lead to better understanding of the underlying genetic mechanisms responsible for the observed heterogeneity of cognitive decline in older adults.


Subjects
Alzheimer Disease , Humans , Aged , Alzheimer Disease/genetics , Genomics , Cluster Analysis
6.
BMC Biol ; 20(1): 28, 2022 01 28.
Article in English | MEDLINE | ID: mdl-35086530

ABSTRACT

BACKGROUND: The functional understanding of genetic interaction networks and cellular mechanisms governing health and disease requires the dissection, and multifaceted study, of discrete cell subtypes in developing and adult animal models. Recombinase-driven expression of transgenic effector alleles represents a significant and powerful approach to delineate cell populations for functional, molecular, and anatomical studies. In addition to single recombinase systems, the expression of two recombinases in distinct, but partially overlapping, populations allows for more defined target expression. Although the application of this method is becoming increasingly popular, its experimental implementation has been broadly restricted to manipulations of a limited set of common alleles that are often commercially produced at great expense, with costs and technical challenges associated with production of intersectional mouse lines hindering customized approaches for many researchers. Here, we present a simplified CRISPR toolkit for rapid, inexpensive, and facile intersectional allele production. RESULTS: Briefly, we produced 7 intersectional mouse lines using a dual recombinase system, one mouse line with a single recombinase system, and three embryonic stem (ES) cell lines that are designed to study the way functional, molecular, and anatomical features relate to each other in building circuits that underlie physiology and behavior. As a proof-of-principle, we applied three of these lines to different neuronal populations for anatomical mapping and functional in vivo investigation of respiratory control. We also generated a mouse line with a single recombinase-responsive allele that controls the expression of the calcium sensor Twitch-2B. This mouse line was applied globally to study the effects of follicle-stimulating hormone (FSH) and luteinizing hormone (LH) on calcium release in the ovarian follicle.
CONCLUSIONS: The lines presented here are representative examples of outcomes possible with the successful application of our genetic toolkit for the facile development of diverse, modifiable animal models. This toolkit will allow labs to create single or dual recombinase effector lines easily for any cell population or subpopulation of interest when paired with the appropriate Cre and FLP recombinase mouse lines or viral vectors. We have made our tools and derivative intersectional mouse and ES cell lines openly available for non-commercial use through publicly curated repositories for plasmid DNA, ES cells, and transgenic mouse lines.


Subjects
Calcium , Clustered Regularly Interspaced Short Palindromic Repeats , Animals , Female , Integrases/genetics , Integrases/metabolism , Mice , Transgenic Mice , Neurons/physiology , Recombinases/genetics , Recombinases/metabolism
7.
Hepatology ; 73(6): 2278-2292, 2021 06.
Article in English | MEDLINE | ID: mdl-32931023

ABSTRACT

BACKGROUND AND AIMS: Therapeutic, clinical trial entry and stratification decisions for hepatocellular carcinoma (HCC) are made based on prognostic assessments, using clinical staging systems based on small numbers of empirically selected variables that insufficiently account for differences in biological characteristics of individual patients' disease. APPROACH AND RESULTS: We propose an approach for constructing risk scores from circulating biomarkers that produce a global biological characterization of an individual patient's disease. Plasma samples were collected prospectively from 767 patients with HCC and 200 controls, and 317 proteins were quantified in a Clinical Laboratory Improvement Amendments-certified biomarker testing laboratory. We constructed a circulating biomarker aberration score for each patient, a score between 0 and 1 that measures the degree of aberration of his or her biomarker panel relative to normal, which we call HepatoScore. We used log-rank tests to assess its ability to substratify patients within existing staging systems/prognostic factors. To enhance clinical application, we constructed a single-sample score, HepatoScore-14, which requires only a subset of 14 representative proteins encompassing the global biological effects. Patients with HCC were split into three distinct groups (low, medium, and high HepatoScore) with vastly different prognoses (median overall survival 38.2/18.3/7.1 months; P < 0.0001). Furthermore, HepatoScore accurately substratified patients within levels of existing prognostic factors and staging systems (P < 0.0001 for nearly all), providing substantial and sometimes dramatic refinement of expected patient outcomes with strong therapeutic implications. These results were recapitulated by HepatoScore-14, rigorously validated in repeated training/test splits, concordant across Myriad RBM (Austin, TX) and enzyme-linked immunosorbent assay kits, and established as an independent prognostic factor.
CONCLUSIONS: HepatoScore-14 augments existing HCC staging systems, dramatically refining patient prognostic assessments and therapeutic decision making and enrollment in clinical trials. The underlying strategy provides a global biological characterization of disease, and can be applied broadly to other disease settings and biological media.


Subjects
Tumor Biomarkers/blood , Hepatocellular Carcinoma/blood , Liver Neoplasms/blood , Severity of Illness Index , Hepatocellular Carcinoma/pathology , Case-Control Studies , Female , Humans , Liver Neoplasms/pathology , Male , Predictive Value of Tests , Prognosis , Proportional Hazards Models , Risk Factors
8.
Neuroimage ; 197: 330-343, 2019 08 15.
Article in English | MEDLINE | ID: mdl-31029870

ABSTRACT

Advanced brain imaging techniques make it possible to measure individuals' structural connectomes in large cohort studies non-invasively. Given the availability of large scale data sets, it is extremely interesting and important to build a set of advanced tools for structural connectome extraction and statistical analysis that emphasize both interpretability and predictive power. In this paper, we developed and integrated a set of toolboxes, including an advanced structural connectome extraction pipeline and a novel tensor network principal components analysis (TN-PCA) method, to study relationships between structural connectomes and various human traits such as alcohol and drug use, cognition and motion abilities. The structural connectome extraction pipeline produces a set of connectome features for each subject that can be organized as a tensor network, and TN-PCA maps the high-dimensional tensor network data to a lower-dimensional Euclidean space. Combined with classical hypothesis testing, canonical correlation analysis and linear discriminant analysis techniques, we analyzed over 1100 scans of 1076 subjects from the Human Connectome Project (HCP) and the Sherbrooke test-retest data set, as well as 175 human traits measuring different domains including cognition, substance use, motor, sensory and emotion. The test-retest data validated the developed algorithms. With the HCP data, we found that structural connectomes are associated with a wide range of traits, e.g., fluid intelligence, language comprehension, and motor skills are associated with increased cortical-cortical brain structural connectivity, while the use of alcohol, tobacco, and marijuana are associated with decreased cortical-cortical connectivity. We also demonstrated that our extracted structural connectomes and analysis method can give superior prediction accuracies compared with alternative connectome constructions and other tensor and network regression methods.


Subjects
Brain/anatomy & histology , Connectome/methods , Diffusion Tensor Imaging/methods , Computer-Assisted Image Processing/methods , Personality/physiology , Brain/diagnostic imaging , Statistical Data Interpretation , Female , Humans , Male , Neurological Models , Neural Pathways/anatomy & histology , Principal Component Analysis
9.
Bioinformatics ; 34(7): 1141-1147, 2018 04 01.
Article in English | MEDLINE | ID: mdl-29617963

ABSTRACT

Motivation: Batch effects are one of the major sources of technical variation affecting measurements in high-throughput studies such as RNA sequencing. It has been well established that batch effects can be caused by different experimental platforms, laboratory conditions, different sources of samples and personnel differences. These differences can confound the outcomes of interest and lead to spurious results. A critical input for batch correction algorithms is the knowledge of batch factors, which in many cases are unknown or inaccurate. Hence, the primary motivation of our paper is to detect hidden batch factors that can be used in standard techniques to accurately capture the relationship between gene expression and other modeled variables of interest. Results: We introduce a new algorithm based on data-adaptive shrinkage and semi-Non-negative Matrix Factorization for the detection of unknown batch effects. We test our algorithm on three different datasets: (i) Sequencing Quality Control, (ii) Topotecan RNA-Seq and (iii) Single-cell RNA sequencing (scRNA-Seq) on Glioblastoma Multiforme. We have demonstrated a superior performance in identifying hidden batch effects as compared to existing algorithms for batch detection in all three datasets. In the Topotecan study, we were able to identify a new batch factor that had been missed by the original study, leading to under-representation of differentially expressed genes. For scRNA-Seq, we demonstrated the power of our method in detecting subtle batch effects. Availability and implementation: DASC R package is available via Bioconductor or at https://github.com/zhanglabNKU/DASC. Contact: zhanghan@nankai.edu.cn or zhandonl@bcm.edu. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Gene Expression Profiling/methods , Quality Control , Research Design , RNA Sequence Analysis/methods , Glioblastoma/genetics , Humans , Topotecan/pharmacology
10.
BMC Bioinformatics ; 18(Suppl 11): 405, 2017 Oct 03.
Article in English | MEDLINE | ID: mdl-28984189

ABSTRACT

The 2016 International Conference on Intelligent Biology and Medicine (ICIBM 2016) was held on December 8-10, 2016 in Houston, Texas, USA. ICIBM included eight scientific sessions, four tutorials, one poster session, four highlighted talks and four keynotes that covered topics on 3D genomics structural analysis, next generation sequencing (NGS) analysis, computational drug discovery, medical informatics, cancer genomics, and systems biology. Here, we present a summary of the nine research articles selected from ICIBM 2016 program for publishing in BMC Bioinformatics.


Subjects
Biology , Congresses as Topic , Internationality , Medicine , Statistics as Topic , Algorithms , DNA Copy Number Variations/genetics , Humans , Machine Learning , Neural Networks (Computer) , RNA Splicing/genetics , RNA Sequence Analysis
11.
BMC Genomics ; 18(Suppl 6): 703, 2017 Oct 03.
Article in English | MEDLINE | ID: mdl-28984207

ABSTRACT

In this editorial, we first summarize the 2016 International Conference on Intelligent Biology and Medicine (ICIBM 2016) that was held on December 8-10, 2016 in Houston, Texas, USA, and then briefly introduce the ten research articles included in this supplement issue. ICIBM 2016 included four workshops or tutorials, four keynote lectures, four conference invited talks, eight concurrent scientific sessions and a poster session for 53 accepted abstracts, covering current topics in bioinformatics, systems biology, intelligent computing, and biomedical informatics. Through our call for papers, a total of 77 original manuscripts were submitted to ICIBM 2016. After peer review, 11 articles were selected for this special issue, covering topics such as single cell RNA-seq analysis methods, genome sequence and variation analysis, bioinformatics methods for vaccine development, and cancer genomics.


Subjects
Genomics , Inventions , Medicine
12.
Bioinformatics ; 32(6): 952-4, 2016 03 15.
Article in English | MEDLINE | ID: mdl-26568634

ABSTRACT

MOTIVATION: Massive amounts of high-throughput genomics data profiled from tumor samples were made publicly available by the Cancer Genome Atlas (TCGA). RESULTS: We have developed an open source software package, TCGA2STAT, to obtain the TCGA data, wrangle it, and pre-process it into a format ready for multivariate and integrated statistical analysis in the R environment. In a user-friendly format with one single function call, our package downloads and fully processes the desired TCGA data to be seamlessly integrated into a computational analysis pipeline. No further technical or biological knowledge is needed to utilize our software, thus making TCGA data easily accessible to data scientists without specific domain knowledge. AVAILABILITY AND IMPLEMENTATION: TCGA2STAT is available from https://cran.r-project.org/web/packages/TCGA2STAT/index.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. CONTACT: zhandong.liu@bcm.edu.


Subjects
Software , Genomics , Humans , Neoplasms
13.
Biometrics ; 73(1): 10-19, 2017 03.
Article in English | MEDLINE | ID: mdl-27163413

ABSTRACT

In the biclustering problem, we seek to simultaneously group observations and features. While biclustering has applications in a wide array of domains, ranging from text mining to collaborative filtering, the problem of identifying structure in high-dimensional genomic data motivates this work. In this context, biclustering enables us to identify subsets of genes that are co-expressed only within a subset of experimental conditions. We present a convex formulation of the biclustering problem that possesses a unique global minimizer and an iterative algorithm, COBRA, that is guaranteed to identify it. Our approach generates an entire solution path of possible biclusters as a single tuning parameter is varied. We also show how to reduce the problem of selecting this tuning parameter to solving a trivial modification of the convex biclustering problem. The key contributions of our work are its simplicity, interpretability, and algorithmic guarantees, features that arguably are lacking in the current alternative algorithms. We demonstrate the advantages of our approach, which include stably and reproducibly identifying biclusterings, on simulated and real microarray data.
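
For orientation, a convex biclustering objective of the kind described above is commonly written as a least-squares fit plus fusion penalties on both the columns and the rows of the estimate. The notation below is an assumption for illustration and does not appear in the abstract: X is the p-by-n data matrix, U its smoothed estimate, w and w-tilde are nonnegative column and row weights, and gamma is the single tuning parameter whose variation traces out the solution path.

```latex
\min_{U \in \mathbb{R}^{p \times n}} \;
\frac{1}{2}\,\lVert X - U \rVert_F^2
+ \gamma \Bigl[\,
  \sum_{i<j} w_{ij}\, \lVert U_{\cdot i} - U_{\cdot j} \rVert_2
  + \sum_{k<l} \tilde{w}_{kl}\, \lVert U_{k \cdot} - U_{l \cdot} \rVert_2
\Bigr]
```

As gamma grows, columns and rows of U fuse together, so the distinct blocks of the minimizer define the biclusters; at gamma = 0 the minimizer is X itself.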


Subjects
Cluster Analysis , Statistical Data Interpretation , Gene Regulatory Networks , Algorithms , Computational Biology/methods , Genetic Databases , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis
14.
Alzheimers Dement ; 12(6): 645-53, 2016 06.
Article in English | MEDLINE | ID: mdl-27079753

ABSTRACT

Identifying accurate biomarkers of cognitive decline is essential for advancing early diagnosis and prevention therapies in Alzheimer's disease. The Alzheimer's disease DREAM Challenge was designed as a computational crowdsourced project to benchmark the current state-of-the-art in predicting cognitive outcomes in Alzheimer's disease based on high dimensional, publicly available genetic and structural imaging data. This meta-analysis failed to identify a meaningful predictor developed from either data modality, suggesting that alternate approaches should be considered for prediction of cognitive performance.


Subjects
Alzheimer Disease/complications , Cognition Disorders/diagnosis , Cognition Disorders/etiology , Alzheimer Disease/genetics , Apolipoproteins E/genetics , Biomarkers , Cognition Disorders/genetics , Computational Biology , Bibliographic Databases/statistics & numerical data , Humans , Predictive Value of Tests
15.
Hum Brain Mapp ; 36(11): 4566-81, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26304096

ABSTRACT

Neurofibromatosis type I (NF1) is a genetic disorder caused by mutations in the neurofibromin 1 gene at locus 17q11.2. Individuals with NF1 have an increased incidence of learning disabilities, attention deficits, and autism spectrum disorders. As a single-gene disorder, NF1 represents a valuable model for understanding gene-brain-behavior relationships. While mouse models have elucidated molecular and cellular mechanisms underlying learning deficits associated with this mutation, little is known about functional brain architecture in human subjects with NF1. To address this question, we used resting state functional connectivity magnetic resonance imaging (rs-fcMRI) to elucidate the intrinsic network structure of 30 NF1 participants compared with 30 healthy demographically matched controls during an eyes-open rs-fcMRI scan. Novel statistical methods were employed to quantify differences in local connectivity (edge strength) and modularity structure, in combination with traditional global graph theory applications. Our findings suggest that individuals with NF1 have reduced anterior-posterior connectivity, weaker bilateral edges, and altered modularity clustering relative to healthy controls. Further, edge strength and modular clustering indices were correlated with IQ and internalizing symptoms. These findings suggest that Ras signaling disruption may lead to abnormal functional brain connectivity; further investigation into the functional consequences of these alterations in both humans and in animal models is warranted.


Subjects
Brain/physiopathology , Functional Neuroimaging/methods , Nerve Net/physiopathology , Neurofibromatosis 1/physiopathology , Adolescent , Adult , Child , Female , Humans , Magnetic Resonance Imaging/methods , Male , Middle Aged , Young Adult
16.
Biometrics ; 71(4): 905-17, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26295449

ABSTRACT

Technological advances have led to a proliferation of structured big data that have matrix-valued covariates. We are specifically motivated to build predictive models for multi-subject neuroimaging data based on each subject's brain imaging scans. This is an ultra-high-dimensional problem that consists of a matrix of covariates (brain locations by time points) for each subject; few methods currently exist to fit supervised models directly to this tensor data. We propose a novel modeling and algorithmic strategy to apply generalized linear models (GLMs) to this massive tensor data in which one set of variables is associated with locations. Our method begins by fitting GLMs to each location separately, and then builds an ensemble by blending information across locations through regularization with what we term an aggregating penalty. Our so-called Local-Aggregate Model can be fit in a completely distributed manner over the locations using an Alternating Direction Method of Multipliers (ADMM) strategy, and thus greatly reduces the computational burden. Furthermore, we propose to select the appropriate model through a novel sequence of faster algorithmic solutions that is similar to regularization paths. We will demonstrate both the computational and predictive modeling advantages of our methods via simulations and an EEG classification problem.


Subjects
Neuroimaging/statistics & numerical data , Algorithms , Biometry/methods , Computer Simulation , Electroencephalography/statistics & numerical data , Humans , Linear Models , Machine Learning/statistics & numerical data , Regression Analysis
17.
J Neurosci ; 33(35): 14098-106, 2013 Aug 28.
Article in English | MEDLINE | ID: mdl-23986245

ABSTRACT

Synesthesia is a condition in which normal stimuli can trigger anomalous associations. In this study, we exploit synesthesia to understand how the synesthetic experience can be explained by subtle changes in network properties. Of the many forms of synesthesia, we focus on colored sequence synesthesia, a form in which colors are associated with overlearned sequences, such as numbers and letters (graphemes). Previous studies have characterized synesthesia using resting-state connectivity or stimulus-driven analyses, but it remains unclear how network properties change as synesthetes move from one condition to another. To address this gap, we used functional MRI in humans to identify grapheme-specific brain regions, thereby constructing a functional "synesthetic" network. We then explored functional connectivity of color and grapheme regions during a synesthesia-inducing fMRI paradigm involving rest, auditory grapheme stimulation, and audiovisual grapheme stimulation. Using Markov networks to represent direct relationships between regions, we found that synesthetes had more connections during rest and auditory conditions. We then expanded the network space to include 90 anatomical regions, revealing that synesthetes tightly cluster in visual regions, whereas controls cluster in parietal and frontal regions. Together, these results suggest that synesthetes have increased connectivity between grapheme and color regions, and that synesthetes use visual regions to a greater extent than controls when presented with dynamic grapheme stimulation. These data suggest that synesthesia is better characterized by studying global network dynamics than by individual properties of a single brain region.


Subjects
Color Perception , Nerve Net/physiopathology , Perceptual Disorders/physiopathology , Acoustic Stimulation , Adult , Brain/physiopathology , Brain Mapping , Case-Control Studies , Female , Humans , Language , Magnetic Resonance Imaging , Male , Markov Chains , Photic Stimulation , Synesthesia
18.
BMC Genomics ; 14 Suppl 8: S7, 2013.
Article in English | MEDLINE | ID: mdl-24564637

ABSTRACT

BACKGROUND: Selecting genes and pathways indicative of disease is a central problem in computational biology. This problem is especially challenging when parsing multi-dimensional genomic data. A number of tools, such as L1-norm based regularization and its extensions elastic net and fused lasso, have been introduced to deal with this challenge. However, these approaches tend to ignore the vast amount of a priori biological network information curated in the literature. RESULTS: We propose the use of graph Laplacian regularized logistic regression to integrate biological networks into disease classification and pathway association problems. Simulation studies demonstrate that the performance of the proposed algorithm is superior to elastic net and lasso analyses. Utility of this algorithm is also validated by its ability to reliably differentiate breast cancer subtypes using a large breast cancer dataset recently generated by the Cancer Genome Atlas (TCGA) consortium. Many of the protein-protein interaction modules identified by our approach are further supported by evidence published in the literature. Source code of the proposed algorithm is freely available at http://www.github.com/zhandong/Logit-Lapnet. CONCLUSION: Logistic regression with graph Laplacian regularization is an effective algorithm for identifying key pathways and modules associated with disease subtypes. With the rapid expansion of our knowledge of biological regulatory networks, this approach will become more accurate and increasingly useful for mining transcriptomic, epi-genomic, and other types of genome wide association studies.
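
To make the regularizer named in the abstract above concrete, here is a minimal sketch of logistic regression with a graph Laplacian penalty, fitted by plain gradient descent. It is not the authors' Logit-Lapnet code: the function names, the unnormalized Laplacian, the omission of any L1 term, and the optimizer settings are all assumptions chosen for brevity. The objective is the mean logistic loss plus (lam / 2) * beta^T L beta, which pulls the coefficients of features connected in the network toward each other.

```python
import numpy as np


def graph_laplacian(adjacency):
    """Unnormalized Laplacian L = D - A of an undirected feature graph."""
    adjacency = np.asarray(adjacency, dtype=float)
    return np.diag(adjacency.sum(axis=1)) - adjacency


def fit_laplacian_logistic(X, y, laplacian, lam=1.0, lr=0.1, n_iter=2000):
    """Gradient descent on mean logistic loss + (lam / 2) * beta^T L beta.

    X: (n_samples, n_features), y: 0/1 labels, laplacian: (n_features, n_features).
    lam, lr and n_iter are hypothetical defaults, not values from the paper.
    """
    n_samples, n_features = X.shape
    beta = np.zeros(n_features)
    intercept = 0.0
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta + intercept)))
        residual = prob - y
        # Data-fit gradient plus the gradient of the smoothness penalty.
        grad_beta = X.T @ residual / n_samples + lam * (laplacian @ beta)
        beta -= lr * grad_beta
        intercept -= lr * residual.mean()
    return beta, intercept
```

With lam set to 0 this reduces to ordinary logistic regression, so comparing beta^T L beta at lam = 0 and lam > 0 shows how much the network prior smooths the coefficients.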


Subjects
Algorithms , Biomarkers, Tumor/metabolism , Breast Neoplasms/metabolism , Computational Biology/methods , Gene Regulatory Networks , Computer Simulation , Female , Humans , Logistic Models , Models, Biological , Reproducibility of Results
19.
bioRxiv ; 2023 Feb 22.
Article in English | MEDLINE | ID: mdl-36865102

ABSTRACT

Nuclear Magnetic Resonance (NMR) spectroscopy is widely used to analyze metabolites in biological samples, but the analysis can be cumbersome and inaccurate. Here, we present a powerful automated tool, SPA-STOCSY (Spatial Clustering Algorithm - Statistical Total Correlation Spectroscopy), which overcomes the challenges by identifying metabolites in each sample with high accuracy. As a data-driven method, SPA-STOCSY estimates all parameters from the input dataset, first investigating the covariance pattern and then calculating the optimal threshold with which to cluster data points belonging to the same structural unit, i.e. metabolite. The generated clusters are then automatically linked to a compound library to identify candidates. To assess SPA-STOCSY’s efficiency and accuracy, we applied it to synthesized and real NMR data obtained from Drosophila melanogaster brains and human embryonic stem cells. In the synthesized spectra, SPA outperforms Statistical Recoupling of Variables, an existing method for clustering spectral peaks, by capturing a higher percentage of the signal regions and the close-to-zero noise regions. In the real spectra, SPA-STOCSY performs comparably to operator-based Chenomx analysis but avoids operator bias and performs the analyses in less than seven minutes of total computation time. Overall, SPA-STOCSY is a fast, accurate, and unbiased tool for untargeted analysis of metabolites in the NMR spectra. As such, it might accelerate the utilization of NMR for scientific discoveries, medical diagnostics, and patient-specific decision making.
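
The clustering step described in the abstract above can be illustrated with a simplified sketch: compute how strongly neighbouring spectral points co-vary across samples, then group runs of adjacent points whose correlation stays above a threshold. This is not the SPA-STOCSY implementation. In particular, the abstract says the tool estimates its threshold from the data, whereas the fixed `threshold` argument and the function name below are assumptions made for illustration.

```python
import numpy as np


def cluster_adjacent_points(spectra, threshold=0.9):
    """Group neighbouring spectral points that co-vary across samples.

    spectra: array of shape (n_samples, n_points).
    Returns a list of (start, end) index pairs, end exclusive, one per run of
    at least two adjacent points whose neighbour-to-neighbour correlation is
    at or above the threshold.
    """
    spectra = np.asarray(spectra, dtype=float)
    n_points = spectra.shape[1]
    corr = np.corrcoef(spectra, rowvar=False)
    # neighbour_corr[i] is the correlation between point i and point i + 1.
    neighbour_corr = np.diag(corr, k=1)
    clusters = []
    start = 0
    for i, r in enumerate(neighbour_corr):
        if not (r >= threshold):  # also breaks the run on NaN
            if i + 1 - start >= 2:
                clusters.append((start, i + 1))
            start = i + 1
    if n_points - start >= 2:
        clusters.append((start, n_points))
    return clusters
```

Each returned index range would then be a candidate structural unit to match against a compound library, a step this sketch does not attempt.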

20.
Bioinformatics ; 27(21): 3029-35, 2011 Nov 01.
Article in English | MEDLINE | ID: mdl-21930672

ABSTRACT

MOTIVATION: Nuclear magnetic resonance (NMR) spectroscopy has been used to study mixtures of metabolites in biological samples. This technology produces a spectrum for each sample depicting the chemical shifts at which an unknown number of latent metabolites resonate. The interpretation of this data with common multivariate exploratory methods such as principal components analysis (PCA) is limited due to high-dimensionality, non-negativity of the underlying spectra and dependencies at adjacent chemical shifts. RESULTS: We develop a novel modification of PCA that is appropriate for analysis of NMR data, entitled Sparse Non-Negative Generalized PCA. This method yields interpretable principal components and loading vectors that select important features and directly account for both the non-negativity of the underlying spectra and dependencies at adjacent chemical shifts. Through the reanalysis of experimental NMR data on five purified neural cell types, we demonstrate the utility of our methods for dimension reduction, pattern recognition, sample exploration and feature selection. Our methods lead to the identification of novel metabolites that reflect the differences between these cell types. AVAILABILITY: www.stat.rice.edu/~gallen/software.html. CONTACT: gallen@rice.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Magnetic Resonance Spectroscopy , Metabolomics/methods , Principal Component Analysis , Algorithms