1.
Bioinformatics ; 36(9): 2770-2777, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31930389

ABSTRACT

SUMMARY: Machine learning feature selection methods are needed to detect complex interaction-network effects in complicated modeling scenarios in high-dimensional data, such as GWAS, gene expression, eQTL and structural/functional neuroimage studies for case-control or continuous outcomes. In addition, many machine learning methods have limited ability to address the issues of controlling false discoveries and adjusting for covariates. To address these challenges, we develop a new feature selection technique called Nearest-neighbor Projected-Distance Regression (NPDR) that calculates the importance of each predictor using generalized linear model regression of distances between nearest-neighbor pairs projected onto the predictor dimension. NPDR captures the underlying interaction structure of data using nearest-neighbors in high dimensions, handles both dichotomous and continuous outcomes and predictor data types, statistically corrects for covariates, and permits statistical inference and penalized regression. We use realistic simulations with interactions and other effects to show that NPDR has better precision-recall than standard Relief-based feature selection and random forest importance, with the additional benefit of covariate adjustment and multiple testing correction. Using RNA-Seq data from a study of major depressive disorder (MDD), we show that NPDR with covariate adjustment removes spurious associations due to confounding. We apply NPDR to eQTL data to identify potentially interacting variants that regulate transcripts associated with MDD and demonstrate NPDR's utility for GWAS and continuous outcomes. AVAILABILITY AND IMPLEMENTATION: Available at: https://insilico.github.io/npdr/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
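
The projected-distance regression described in this abstract can be sketched for a continuous outcome as follows. This is a minimal illustration and not the authors' implementation (see the URL in the abstract for that): for each attribute, it regresses the absolute outcome difference between nearest-neighbor pairs on the absolute difference of that attribute between the same pairs and reports the slope. The function names, the Manhattan neighbor metric, and the default k=2 are assumptions made for this example; inference, covariate adjustment, and penalization are omitted.

```python
def nearest_neighbor_pairs(X, k):
    """Return (i, j) pairs where j is one of the k nearest neighbors of i (Manhattan distance)."""
    pairs = []
    for i, xi in enumerate(X):
        dists = sorted(
            (sum(abs(a - b) for a, b in zip(xi, xj)), j)
            for j, xj in enumerate(X) if j != i
        )
        pairs.extend((i, j) for _, j in dists[:k])
    return pairs


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx if sxx > 0 else 0.0


def projected_distance_scores(X, y, k=2):
    """Hypothetical helper: one regression slope per attribute over neighbor pairs."""
    pairs = nearest_neighbor_pairs(X, k)
    # Outcome distance for each neighbor pair (continuous outcome).
    outcome_diff = [abs(y[i] - y[j]) for i, j in pairs]
    scores = []
    for a in range(len(X[0])):
        # Distance between the pair projected onto attribute a only.
        attr_diff = [abs(X[i][a] - X[j][a]) for i, j in pairs]
        scores.append(ols_slope(attr_diff, outcome_diff))
    return scores
```

An attribute whose projected distances track the outcome distances receives a large positive slope; attributes unrelated to the outcome receive slopes near zero or below.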


Subjects
Major Depressive Disorder, Cluster Analysis, Humans, Linear Models, Machine Learning, Quantitative Trait Loci
2.
Bioinformatics ; 36(10): 3093-3098, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31985777

ABSTRACT

SUMMARY: Feature selection can improve the accuracy of machine-learning models, but appropriate steps must be taken to avoid overfitting. Nested cross-validation (nCV) is a common approach that chooses the classification model and features to represent a given outer fold based on features that give the maximum inner-fold accuracy. Differential privacy is a related technique to avoid overfitting that uses a privacy-preserving noise mechanism to identify features that are stable between training and holdout sets. We develop consensus nested cross-validation (cnCV) that combines the idea of feature stability from differential privacy with nCV. Feature selection is applied in each inner fold and the consensus of top features across folds is used as a measure of feature stability or reliability instead of classification accuracy, which is used in standard nCV. We use simulated data with main effects, correlation and interactions to compare the classification accuracy and feature selection performance of the new cnCV with standard nCV, Elastic Net optimized by cross-validation, differential privacy and private evaporative cooling (pEC). We also compare these methods using real RNA-seq data from a study of major depressive disorder. The cnCV method has similar training and validation accuracy to nCV, but cnCV has much shorter run times because it does not construct classifiers in the inner folds. The cnCV method chooses a more parsimonious set of features with fewer false positives than nCV. The cnCV method has similar accuracy to pEC and cnCV selects stable features between folds without the need to specify a privacy threshold. We show that cnCV is an effective and efficient approach for combining feature selection with classification. AVAILABILITY AND IMPLEMENTATION: Code available at https://github.com/insilico/cncv. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
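
The consensus step described in this abstract can be sketched as below. This is an illustrative outline under stated assumptions, not the code from the repository named above: the filter (absolute difference of class means), the interleaved fold splitting, and all function names are hypothetical stand-ins, and every training split is assumed to contain both classes (stratified splitting would ensure this in practice). Note that no classifier is trained in the inner folds; only feature sets are compared.

```python
def mean_difference_scores(X, y):
    """Score each feature by the absolute difference of class means (a stand-in filter)."""
    scores = []
    for a in range(len(X[0])):
        g1 = [row[a] for row, lab in zip(X, y) if lab == 1]
        g0 = [row[a] for row, lab in zip(X, y) if lab == 0]
        scores.append(abs(sum(g1) / len(g1) - sum(g0) / len(g0)))
    return scores


def top_features(X, y, n_top):
    """Indices of the n_top highest-scoring features."""
    scores = mean_difference_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda a: scores[a], reverse=True)
    return set(ranked[:n_top])


def folds(indices, n_folds):
    """Split indices into n_folds interleaved folds."""
    return [indices[f::n_folds] for f in range(n_folds)]


def consensus_nested_cv(X, y, n_outer=2, n_inner=2, n_top=1):
    all_idx = list(range(len(X)))
    outer_consensus = []
    for held_out in folds(all_idx, n_outer):
        train = [i for i in all_idx if i not in held_out]
        inner_sets = []
        for inner_held_out in folds(train, n_inner):
            inner_train = [i for i in train if i not in inner_held_out]
            Xi = [X[i] for i in inner_train]
            yi = [y[i] for i in inner_train]
            inner_sets.append(top_features(Xi, yi, n_top))
        # Consensus within this outer fold: features selected in every inner fold.
        outer_consensus.append(set.intersection(*inner_sets))
    # Final consensus: features that are stable across all outer folds.
    return set.intersection(*outer_consensus)
```

The returned feature set would then be used to train a single classifier on the full training data, which is where the run-time saving relative to standard nCV comes from.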


Subjects
Major Depressive Disorder, Consensus, Humans, Machine Learning, Reproducibility of Results, Research Design
3.
Bioinformatics ; 35(13): 2329-2331, 2019 07 01.
Article in English | MEDLINE | ID: mdl-30481259

ABSTRACT

MOTIVATION: An important challenge in gene expression analysis is to improve hub gene selection to enrich for biological relevance or improve classification accuracy for a given phenotype. In order to incorporate phenotypic context into co-expression, we recently developed an epistasis-expression network centrality method that blends the importance of gene-gene interactions (epistasis) and main effects of genes. Further blending of prior knowledge from functional interactions has the potential to enrich for relevant genes and stabilize classification. RESULTS: We develop two new expression-epistasis centrality methods that incorporate interaction prior knowledge. The first extends our SNPrank (EpistasisRank) method by incorporating a gene-wise prior knowledge vector. This prior knowledge vector informs the centrality algorithm of the inclination of a gene to be involved in interactions by incorporating functional interaction information from the Integrative Multi-species Prediction database. The second method extends Katz centrality to expression-epistasis networks (EpistasisKatz), extends the Katz bias to be a gene-wise vector of main effects and extends the Katz attenuation constant prefactor to be a prior-knowledge vector for interactions. Using independent microarray studies of major depressive disorder, we find that including prior knowledge in network centrality feature selection stabilizes the training classification and reduces over-fitting. AVAILABILITY AND IMPLEMENTATION: Methods and examples provided at https://github.com/insilico/Rinbix and https://github.com/insilico/PriorKnowledgeEpistasisRank. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
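
As a rough sketch of the Katz extension mentioned in this abstract, standard Katz centrality solves c = alpha * A c + beta with scalar alpha and beta. The version below allows both to be gene-wise vectors, iterating c <- diag(alpha) A c + beta. The exact form used by EpistasisKatz is not given in the abstract, so this update rule, the function name, and the stopping rule are assumptions for illustration. Here `A` stands for a symmetric interaction (epistasis) matrix, `beta` for main effects, and `alpha` for prior-knowledge weights.

```python
def katz_centrality(A, alpha, beta, n_iter=200, tol=1e-12):
    """Fixed-point iteration for c = diag(alpha) A c + beta.

    Converges when the spectral radius of diag(alpha) A is below 1.
    """
    n = len(A)
    c = list(beta)
    for _ in range(n_iter):
        new_c = [
            alpha[i] * sum(A[i][j] * c[j] for j in range(n)) + beta[i]
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_c, c)) < tol:
            return new_c
        c = new_c
    return c
```

With equal interaction structure, a gene with a larger main effect (larger entry in `beta`) ends up with a higher centrality, which is the blending of main effects and interactions the abstract describes.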


Subjects
Software, Algorithms, Major Depressive Disorder, Genetic Epistasis, Gene Regulatory Networks, Humans, Phenotype
4.
Bioinformatics ; 35(8): 1358-1365, 2019 04 15.
Article in English | MEDLINE | ID: mdl-30239600

ABSTRACT

MOTIVATION: Relief is a family of machine learning algorithms that uses nearest-neighbors to select features whose association with an outcome may be due to epistasis or statistical interactions with other features in high-dimensional data. Relief-based estimators are non-parametric in the statistical sense that they do not have a parameterized model with an underlying probability distribution for the estimator, making it difficult to determine the statistical significance of Relief-based attribute estimates. Thus, a statistical inferential formalism is needed to avoid imposing arbitrary thresholds to select the most important features. We reconceptualize the Relief-based feature selection algorithm to create a new family of STatistical Inference Relief (STIR) estimators that retains the ability to identify interactions while incorporating sample variance of the nearest neighbor distances into the attribute importance estimation. This variance permits the calculation of statistical significance of features and adjustment for multiple testing of Relief-based scores. Specifically, we develop a pseudo t-test version of Relief-based algorithms for case-control data. RESULTS: We demonstrate the statistical power and control of type I error of the STIR family of feature selection methods on a panel of simulated data that exhibits properties reflected in real gene expression data, including main effects and network interaction effects. We compare the performance of STIR when the adaptive radius method is used as the nearest neighbor constructor with STIR when the fixed-k nearest neighbor constructor is used. We apply STIR to real RNA-Seq data from a study of major depressive disorder and discuss STIR's straightforward extension to genome-wide association studies. AVAILABILITY AND IMPLEMENTATION: Code and data available at http://insilico.utulsa.edu/software/STIR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
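
The pseudo t-test idea for case-control data can be sketched as a two-sample statistic on projected nearest-neighbor distances: for one attribute, compare the distances to neighbors from the opposite class (misses) with the distances to neighbors from the same class (hits). The pooled-variance form below is an assumption about what such a statistic might look like, and the function name is hypothetical; the authors' code is at the URL in the abstract.

```python
from math import sqrt


def stir_t(miss_diffs, hit_diffs):
    """Pooled-variance t-like statistic comparing miss and hit projected distances."""
    n_m, n_h = len(miss_diffs), len(hit_diffs)
    mean_m = sum(miss_diffs) / n_m
    mean_h = sum(hit_diffs) / n_h
    # Unbiased sample variances of the two groups of distances.
    var_m = sum((d - mean_m) ** 2 for d in miss_diffs) / (n_m - 1)
    var_h = sum((d - mean_h) ** 2 for d in hit_diffs) / (n_h - 1)
    pooled_sd = sqrt(((n_m - 1) * var_m + (n_h - 1) * var_h) / (n_m + n_h - 2))
    return (mean_m - mean_h) / (pooled_sd * sqrt(1 / n_m + 1 / n_h))
```

A large positive value indicates that the attribute separates classes more than it separates same-class neighbors; because the statistic carries a variance term, it can be converted to a P-value and corrected for multiple testing rather than thresholded arbitrarily.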


Subjects
Genome-Wide Association Study, Software, Algorithms, Cluster Analysis, Major Depressive Disorder, Humans, Machine Learning, Statistical Models
5.
Bioinformatics ; 33(18): 2906-2913, 2017 Sep 15.
Article in English | MEDLINE | ID: mdl-28472232

ABSTRACT

MOTIVATION: Classification of individuals into disease or clinical categories from high-dimensional biological data with low prediction error is an important challenge of statistical learning in bioinformatics. Feature selection can improve classification accuracy but must be incorporated carefully into cross-validation to avoid overfitting. Recently, feature selection methods based on differential privacy, such as differentially private random forests and reusable holdout sets, have been proposed. However, for domains such as bioinformatics, where the number of features is much larger than the number of observations (p ≫ n), these differential privacy methods are susceptible to overfitting. METHODS: We introduce private Evaporative Cooling, a stochastic privacy-preserving machine learning algorithm that uses Relief-F for feature selection and random forest for privacy preserving classification that also prevents overfitting. We relate the privacy-preserving threshold mechanism to a thermodynamic Maxwell-Boltzmann distribution, where the temperature represents the privacy threshold. We use the thermal statistical physics concept of Evaporative Cooling of atomic gases to perform backward stepwise privacy-preserving feature selection. RESULTS: On simulated data with main effects and statistical interactions, we compare accuracies on holdout and validation sets for three privacy-preserving methods: the reusable holdout, reusable holdout with random forest, and private Evaporative Cooling, which uses Relief-F feature selection and random forest classification. In simulations where interactions exist between attributes, private Evaporative Cooling provides higher classification accuracy without overfitting based on an independent validation set. In simulations without interactions, thresholdout with random forest and private Evaporative Cooling give comparable accuracies. We also apply these privacy methods to human brain resting-state fMRI data from a study of major depressive disorder. AVAILABILITY AND IMPLEMENTATION: Code available at http://insilico.utulsa.edu/software/privateEC. CONTACT: brett-mckinney@utulsa.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
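
One way to picture the thermodynamic analogy in this abstract is a Boltzmann-style weighting in which low-importance attributes are the most likely to "evaporate" at each backward-selection step. The sketch below is only an illustration of that analogy: the weighting exp(-importance / T), the function name, and the use of raw importance scores as "energies" are assumptions, not the published mechanism.

```python
from math import exp


def evaporation_probabilities(importance, temperature):
    """Probability of removing each attribute, proportional to exp(-importance / T)."""
    weights = [exp(-s / temperature) for s in importance]
    total = sum(weights)
    return [w / total for w in weights]
```

At low temperature the least important attribute is removed almost deterministically; at high temperature the removal probabilities approach uniform, which adds more randomness to the selection.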


Subjects
Computational Biology/methods, Machine Learning, Biological Models, Privacy, Classification, Major Depressive Disorder/classification, Humans, Software
6.
Brain Behav Immun ; 66: 193-200, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28645775

ABSTRACT

A subset of individuals with major depressive disorder (MDD) have impaired adaptive immunity characterized by a greater vulnerability to viral infection and a deficient response to vaccination along with a decrease in the number and/or activity of T cells and natural killer cells (NKC). Nevertheless, it remains unclear which specific subsets of lymphocytes are altered in MDD, a shortcoming we address here by utilizing an advanced fluorescence-activated cell sorting (FACS) method that allows for the differentiation of important functionally-distinct lymphocyte sub-populations. Furthermore, despite evidence that sleep disturbance, which is a core symptom of MDD, is itself associated with alterations in lymphocyte distributions, there is a paucity of studies examining the contribution of sleep disturbance to lymphocyte populations in MDD populations. Here, we measured differences in the percentages of 13 different lymphocytes and 6 different leukocytes in 54 unmedicated MDD patients (partially remitted to moderate) and 56 age- and sex-matched healthy controls (HC). The relationship between self-reported sleep disturbance and cell counts was evaluated in the MDD group using the Pittsburgh Sleep Quality Index (PSQI). The MDD group showed a significantly increased percentage of CD127low/CCR4+ Treg cells, and memory Treg cells, as well as a reduction in CD56+CD16- (putative immunoregulatory) NKC counts, the latter prior to correction for body mass index. There also was a trend for higher effector memory CD8+ cell counts in the MDD group versus the HC group. Further, within the MDD group, self-reported sleep disturbance was associated with an increased percentage of effector memory CD8+ cells but with a lower percentage of CD56+CD16- NKC. These results provide important new insights into the immune pathways involved in MDD, and provide novel evidence that MDD and associated sleep disturbance increase effector memory CD8+ and Treg pathways. Targeting sleep disturbance may have implications as a therapeutic strategy to normalize NKC and memory CD8+ cells in MDD.


Subjects
Major Depressive Disorder/immunology, Natural Killer Cells/physiology, Sleep Wake Disorders/immunology, Cytotoxic T-Lymphocytes/physiology, Regulatory T-Lymphocytes/physiology, Adult, Major Depressive Disorder/complications, Female, Flow Cytometry, Humans, Male, Sleep Wake Disorders/complications
7.
Semin Immunol ; 25(2): 89-103, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23755893

ABSTRACT

Vaccines, like drugs and medical procedures, are increasingly amenable to individualization or personalization, often based on novel data resulting from high throughput "omics" technologies. As a result of these technologies, 21st century vaccinology will increasingly see the abandonment of a "one size fits all" approach to vaccine dosing and delivery, as well as the abandonment of the empiric "isolate-inactivate-inject" paradigm for vaccine development. In this review, we discuss the immune response network theory and its application to the new field of vaccinomics and adversomics, and illustrate how vaccinomics can lead to new vaccine candidates, new understandings of how vaccines stimulate immune responses, new biomarkers for vaccine response, and facilitate the understanding of what genetic and other factors might be responsible for rare side effects due to vaccines. Perhaps most exciting will be the ability, at a systems biology level, to integrate increasingly complex high throughput data into descriptive and predictive equations for immune responses to vaccines. Herein, we discuss the above with a view toward the future of vaccinology.


Subjects
Drug-Related Side Effects and Adverse Reactions/prevention & control, Vaccines, Animals, Drug Discovery/methods, Drug Discovery/trends, High-Throughput Screening Assays, Humans, Precision Medicine, Systems Biology/trends
8.
bioRxiv ; 2024 Sep 04.
Article in English | MEDLINE | ID: mdl-39282409

ABSTRACT

Recent associations between Major Depressive Disorder (MDD) and measures of premature aging suggest accelerated biological aging as a potential biomarker for MDD susceptibility or MDD as a risk factor for age-related diseases. Statistical and machine learning regression models of biological age have been trained on various sources of high dimensional data to predict chronological age. Residuals or "gaps" between the predicted biological age and chronological age have been used for statistical inference, such as testing whether an increased age gap is associated with a given disease state. Recently, a gene expression-based model of biological age showed a higher age gap for individuals with MDD compared to healthy controls (HC). In the current study, we propose a machine learning approach that simplifies gene selection by using a least absolute shrinkage and selection operator (LASSO) penalty to construct an expression-based Gene Age Gap Estimate (GAGE) model. We construct the LASSO-GAGE (L-GAGE) model in an RNA-Seq study of 78 unmedicated individuals with MDD and 79 HC and then test for accelerated biological aging in MDD. When testing L-GAGE association with MDD, we account for factors such as sex and chronological age to mitigate regression-to-the-mean effects. The L-GAGE shows higher biological aging in MDD subjects than HC, but the elevation is not statistically significant. However, when we dichotomize chronological age, the interaction between MDD status and age is significant in the L-GAGE model. This effect remains statistically significant even after adjusting for chronological age and sex. We find cytomegalovirus (CMV) serostatus is associated with elevated L-GAGE. We also investigate the feature selection methods Random Forest and nearest neighbor projected distance regression (NPDR) to characterize age-related genes, and we find functional enrichment of infectious disease and SARS-CoV pathways.
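
The age-gap construction described in this abstract can be sketched with a small LASSO fit followed by a residual calculation. This is a toy illustration under stated assumptions, not the study's pipeline: it uses plain cyclic coordinate descent with soft-thresholding on centered data with no intercept, and the function names and the penalty value are hypothetical. A real analysis would use an established library and choose the penalty by cross-validation.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become exactly zero."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 for centered X and y (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return b


def age_gaps(X, age, b):
    """Gap = predicted (biological) age minus chronological age, on the centered scale."""
    return [sum(x * w for x, w in zip(row, b)) - a for row, a in zip(X, age)]
```

Because the penalty shrinks coefficients, predictions are pulled toward the mean age, so gaps from such a model tend to be positive for younger and negative for older subjects. That age dependence is one reason to include chronological age as a covariate when testing the gap against disease status.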

9.
Transl Psychiatry ; 14(1): 199, 2024 Apr 27.
Article in English | MEDLINE | ID: mdl-38678012

ABSTRACT

Major depressive disorder (MDD) is associated with interoceptive processing dysfunctions, but the molecular mechanisms underlying this dysfunction are poorly understood. This study combined brain neuronal-enriched extracellular vesicle (NEEV) technology and serum markers of inflammation and metabolism with Functional Magnetic Resonance Imaging (fMRI) to identify the contribution of gene regulatory pathways, in particular micro-RNA (miR) 93, to interoceptive dysfunction in MDD. Individuals with MDD (n = 41) and healthy comparisons (HC; n = 35) provided blood samples and completed an interoceptive attention task during fMRI. EVs were separated from plasma using a precipitation method. NEEVs were enriched by magnetic streptavidin bead immunocapture utilizing a neural adhesion marker (L1CAM/CD171) biotinylated antibody. The origin of NEEVs was validated with two other neuronal markers - neuronal cell adhesion molecule (NCAM) and ATPase Na+/K+ transporting subunit alpha 3 (ATP1A3). NEEV specificities were confirmed by flow cytometry, western blot, particle size analyzer, and transmission electron microscopy. NEEV small RNAs were purified and sequenced. Results showed that: (1) MDD exhibited lower NEEV miR-93 expression than HC; (2) within MDD but not HC, those individuals with the lowest NEEV miR-93 expression had the highest serum concentrations of interleukin (IL)-1 receptor antagonist, IL-6, tumor necrosis factor, and leptin; and (3) within HC but not MDD, those participants with the highest miR-93 expression showed the strongest bilateral dorsal mid-insula activation during interoceptive versus exteroceptive attention. Since miR-93 is regulated by stress and affects epigenetic modulation by chromatin re-organization, these results suggest that healthy individuals but not MDD participants show an adaptive epigenetic regulation of insular function during interoceptive processing. Future investigations will need to delineate how specific internal and external environmental conditions contribute to miR-93 expression in MDD and what molecular mechanisms alter brain responsivity to body-relevant signals.


Subjects
Major Depressive Disorder, Extracellular Vesicles, Interoception, MicroRNAs, Female, Humans, Male, Brain/metabolism, Brain/diagnostic imaging, Brain/physiopathology, Case-Control Studies, Major Depressive Disorder/metabolism, Major Depressive Disorder/physiopathology, Major Depressive Disorder/genetics, Extracellular Vesicles/genetics, Extracellular Vesicles/metabolism, Interoception/physiology, Magnetic Resonance Imaging, MicroRNAs/genetics, MicroRNAs/metabolism, Neurons/metabolism
10.
Brain Behav Immun ; 31: 161-71, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23064081

ABSTRACT

Depressed patients show evidence of both proinflammatory changes and neurophysiological abnormalities such as increased amygdala reactivity and volumetric decreases of the hippocampus and ventromedial prefrontal cortex (vmPFC). However, very little is known about the relationship between inflammation and neuroimaging abnormalities in mood disorders. A whole genome expression analysis of peripheral blood mononuclear cells yielded 12 protein-coding genes (ADM, APBB3, CD160, CFD, CITED2, CTSZ, IER5, NFKBIZ, NR4A2, NUCKS1, SERTAD1, TNF) that were differentially expressed between 29 unmedicated depressed patients with a mood disorder (8 bipolar disorder, 21 major depressive disorder) and 24 healthy controls (HCs). Several of these genes have been implicated in neurological disorders and/or apoptosis. Ingenuity Pathway Analysis yielded two gene networks, one centered around TNF with NFκB, TGFβ, and ERK as connecting hubs, and the second network indicating cell cycle and/or kinase signaling anomalies. fMRI scanning was conducted using a backward-masking task in which subjects were presented with emotionally-valenced faces. Compared with HCs, the depressed subjects displayed a greater hemodynamic response in the right amygdala, left hippocampus, and the ventromedial prefrontal cortex to masked sad versus happy faces. The mRNA levels of several genes were significantly correlated with the hemodynamic response of the amygdala, vmPFC and hippocampus to masked sad versus happy faces. Differentially-expressed transcripts were significantly correlated with thickness of the left subgenual ACC, and volume of the hippocampus and caudate. Our results raise the possibility that molecular-level immune dysfunction can be mapped onto macro-level neuroimaging abnormalities, potentially elucidating a mechanism by which inflammation leads to depression.


Subjects
Bipolar Disorder/genetics, Brain/pathology, Depressive Disorder/genetics, Emotions/physiology, Inflammation/genetics, Adult, Bipolar Disorder/pathology, Bipolar Disorder/physiopathology, Brain/physiopathology, Depressive Disorder/pathology, Depressive Disorder/physiopathology, Facial Expression, Female, Functional Neuroimaging, Gene Expression Profiling, Humans, Computer-Assisted Image Processing, Magnetic Resonance Imaging, Male, Middle Aged, Organ Size, Photic Stimulation
11.
PLoS Genet ; 5(3): e1000432, 2009 Mar.
Article in English | MEDLINE | ID: mdl-19300503

ABSTRACT

Evidence from human genetic studies of several disorders suggests that interactions between alleles at multiple genes play an important role in influencing phenotypic expression. Analytical methods for identifying Mendelian disease genes are not appropriate when applied to common multigenic diseases, because such methods investigate association with the phenotype only one genetic locus at a time. New strategies are needed that can capture the spectrum of genetic effects, from Mendelian to multifactorial epistasis. Random Forests (RF) and Relief-F are two powerful machine-learning methods that have been studied as filters for genetic case-control data due to their ability to account for the context of alleles at multiple genes when scoring the relevance of individual genetic variants to the phenotype. However, when variants interact strongly, the independence assumption of RF in the tree node-splitting criterion leads to diminished importance scores for relevant variants. Relief-F, on the other hand, was designed to detect strong interactions but is sensitive to large backgrounds of variants that are irrelevant to classification of the phenotype, which is an acute problem in genome-wide association studies. To overcome the weaknesses of these data mining approaches, we develop Evaporative Cooling (EC) feature selection, a flexible machine learning method that can integrate multiple importance scores while removing irrelevant genetic variants. To characterize detailed interactions, we construct a genetic-association interaction network (GAIN), whose edges quantify the synergy between variants with respect to the phenotype. We use simulation analysis to show that EC is able to identify a wide range of interaction effects in genetic association data. We apply the EC filter to a smallpox vaccine cohort study of single nucleotide polymorphisms (SNPs) and infer a GAIN for a collection of SNPs associated with adverse events. Our results suggest an important role for hubs in SNP disease susceptibility networks. The software is available at http://sites.google.com/site/McKinneyLab/software.


Subjects
Artificial Intelligence, Computational Biology/methods, Genetic Variation, Alleles, Computer Simulation, Genes, Genetic Predisposition to Disease, Humans, Internet, Phenotype, Smallpox Vaccine/adverse effects
12.
Brain Behav Immun Health ; 26: 100534, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36247836

ABSTRACT

The identification of gene expression-based biomarkers for major depressive disorder (MDD) continues to be an important challenge. In order to identify candidate biomarkers and mechanisms, we apply statistical and machine learning feature selection to an RNA-Seq gene expression dataset of 78 unmedicated individuals with MDD and 79 healthy controls. We identify 49 genes by LASSO penalized logistic regression and 45 genes at the false discovery rate threshold 0.188. The MDGA1 gene has the lowest P-value (4.9e-5) and is expressed in the developing brain, involved in axon guidance, and associated with related mood disorders in previous studies of bipolar disorder (BD) and schizophrenia (SCZ). The expression of MDGA1 is associated with age and sex, but its association with MDD remains significant when adjusted for covariates. MDGA1 is in a co-expression cluster with another top gene, ATXN7L2 (ataxin 7 like 2), which was associated with MDD in a recent GWAS. The LASSO classification model of MDD includes MDGA1, and the model has a cross-validation accuracy of 79%. Another noteworthy top gene, IRF2BPL, is in a close co-expression cluster with MDGA1 and may be related to microglial inflammatory states in MDD. Future exploration of MDGA1 and its gene interactions may provide insights into mechanisms and heterogeneity of MDD.

13.
PLoS One ; 16(2): e0246761, 2021.
Article in English | MEDLINE | ID: mdl-33556091

ABSTRACT

The performance of nearest-neighbor feature selection and prediction methods depends on the metric for computing neighborhoods and the distribution properties of the underlying data. Recent work to improve nearest-neighbor feature selection algorithms has focused on new neighborhood estimation methods and distance metrics. However, little attention has been given to the distributional properties of pairwise distances as a function of the metric or data type. Thus, we derive general analytical expressions for the mean and variance of pairwise distances for Lq metrics for normal and uniform random data with p attributes and m instances. The distribution moment formulas and detailed derivations provide a resource for understanding the distance properties for metrics and data types commonly used with nearest-neighbor methods, and the derivations provide the starting point for the following novel results. We use extreme value theory to derive the mean and variance for metrics that are normalized by the range of each attribute (difference of max and min). We derive analytical formulas for a new metric for genetic variants, which are categorical variables that occur in genome-wide association studies (GWAS). The genetic distance distributions account for minor allele frequency and the transition/transversion ratio. We introduce a new metric for resting-state functional MRI data (rs-fMRI) and derive its distance distribution properties. This metric is applicable to correlation-based predictors derived from time-series data. The analytical means and variances are in strong agreement with simulation results. We also use simulations to explore the sensitivity of the expected means and variances in the presence of correlation and interactions in the data. These analytical results and new metrics can be used to inform the optimization of nearest neighbor methods for a broad range of studies, including gene expression, GWAS, and fMRI data.
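
A simple instance of the kind of moment formula this abstract refers to is the Manhattan (L1) distance between two independent standard normal vectors with p attributes. Each coordinate difference is N(0, 2), so its absolute value has mean 2/sqrt(pi) and variance 2 - 4/pi, giving a distance mean of 2p/sqrt(pi) and variance p(2 - 4/pi). The sketch below checks this standard result by simulation; the function names, seed, and sample sizes are arbitrary choices for illustration, and the paper's own notation and more general Lq results are not reproduced here.

```python
import random
from math import pi, sqrt


def manhattan_moments_standard_normal(p):
    """Mean and variance of the L1 distance between two independent N(0, I_p) vectors."""
    mean = 2 * p / sqrt(pi)
    variance = p * (2 - 4 / pi)
    return mean, variance


def simulate_manhattan(p, n_pairs, seed=0):
    """Sample mean and variance of L1 distances over n_pairs random pairs."""
    rng = random.Random(seed)
    dists = [
        sum(abs(rng.gauss(0, 1) - rng.gauss(0, 1)) for _ in range(p))
        for _ in range(n_pairs)
    ]
    mean = sum(dists) / n_pairs
    variance = sum((d - mean) ** 2 for d in dists) / (n_pairs - 1)
    return mean, variance
```

Comparing `manhattan_moments_standard_normal(p)` with `simulate_manhattan(p, n_pairs)` for a moderately large number of pairs shows the analytical and simulated values agreeing closely, which mirrors the agreement reported in the abstract.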


Subjects
Algorithms, Gene Expression Regulation, Genetic Models, Cluster Analysis, Genome-Wide Association Study, Humans
14.
Front Psychiatry ; 12: 682495, 2021.
Article in English | MEDLINE | ID: mdl-34220587

ABSTRACT

Neuroscience studies require considerable bioinformatic support and expertise. Numerous high-dimensional and multimodal datasets must be preprocessed and integrated to create robust and reproducible analysis pipelines. We describe common data elements and a scalable data management infrastructure that allows multiple analytics workflows to facilitate preprocessing, analysis and sharing of large-scale multi-level data. The process uses the Brain Imaging Data Structure (BIDS) format and supports MRI, fMRI, EEG, clinical, and laboratory data. The infrastructure provides support for other datasets such as Fitbit and flexibility for developers to customize the integration of new types of data. Exemplar results from 200+ participants and 11 different pipelines demonstrate the utility of the infrastructure.

15.
Chaos ; 20(2): 026103, 2010 Jun.
Article in English | MEDLINE | ID: mdl-20590332

ABSTRACT

Interactions between genetic and/or environmental factors are ubiquitous, affecting the phenotypes of organisms in complex ways. Knowledge about such interactions is becoming rate-limiting for our understanding of human disease and other biological phenomena. Phenomics refers to the integrative analysis of how all genes contribute to phenotype variation, entailing genome and organism level information. A systems biology view of gene interactions is critical for phenomics. Unfortunately, the problem is intractable in humans; however, it can be addressed in simpler genetic model systems. Our research group has focused on the concept of genetic buffering of phenotypic variation, in studies employing the single-cell eukaryotic organism, S. cerevisiae. We have developed a methodology, quantitative high throughput cellular phenotyping (Q-HTCP), for high-resolution measurements of gene-gene and gene-environment interactions on a genome-wide scale. Q-HTCP is being applied to the complete set of S. cerevisiae gene deletion strains, a unique resource for systematically mapping gene interactions. Genetic buffering is the idea that comprehensive and quantitative knowledge about how genes interact with respect to phenotypes will lead to an appreciation of how genes and pathways are functionally connected at a systems level to maintain homeostasis. However, extracting biologically useful information from Q-HTCP data is challenging, due to the multidimensional and nonlinear nature of gene interactions, together with a relative lack of prior biological information. Here we describe a new approach for mining quantitative genetic interaction data called recursive expectation-maximization clustering (REMc). We developed REMc to help discover phenomic modules, defined as sets of genes with similar patterns of interaction across a series of genetic or environmental perturbations. Such modules are reflective of buffering mechanisms, i.e., genes that play a related role in the maintenance of physiological homeostasis. To develop the method, 297 gene deletion strains were selected based on gene-drug interactions with hydroxyurea, an inhibitor of ribonucleotide reductase enzyme activity, which is critical for DNA synthesis. To partition the gene functions, these 297 deletion strains were challenged with growth inhibitory drugs known to target different genes and cellular pathways. Q-HTCP-derived growth curves were used to quantify all gene interactions, and the data were used to test the performance of REMc. Fundamental advantages of REMc include objective assessment of total number of clusters and assignment to each cluster a log-likelihood value, which can be considered an indicator of statistical quality of clusters. To assess the biological quality of clusters, we developed a method called gene ontology information divergence z-score (GOid_z). GOid_z summarizes total enrichment of GO attributes within individual clusters. Using these and other criteria, we compared the performance of REMc to hierarchical and K-means clustering. The main conclusion is that REMc provides distinct efficiencies for mining Q-HTCP data. It facilitates identification of phenomic modules, which contribute to buffering mechanisms that underlie cellular homeostasis and the regulation of phenotypic expression.


Subjects
Cluster Analysis , Epistasis, Genetic , Models, Genetic , Data Mining , Gene Deletion , Gene Regulatory Networks , Genes, Fungal , Genetic Association Studies , Humans , Likelihood Functions , Nonlinear Dynamics , Saccharomyces cerevisiae/genetics
16.
Front Genet ; 11: 784, 2020.
Article in English | MEDLINE | ID: mdl-32774345

ABSTRACT

Nearest-neighbor Projected-Distance Regression (NPDR) is a feature selection technique that uses nearest-neighbors in high dimensional data to detect complex multivariate effects including epistasis. NPDR uses a regression formalism that allows statistical significance testing and efficient control for multiple testing. In addition, the regression formalism provides a mechanism for NPDR to adjust for population structure, which we apply to a GWAS of systemic lupus erythematosus (SLE). We also test NPDR on benchmark simulated genetic variant data with epistatic effects, main effects, imbalanced data for case-control design and continuous outcomes. NPDR identifies potential interactions in an epistasis network that influences the SLE disorder.
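
As a rough illustration of the regression formalism described in this abstract, the Python sketch below scores attributes for a continuous outcome. It is not the authors' implementation. The function name `npdr_scores`, the Manhattan metric, the fixed neighborhood size `k`, and the use of an ordinary least-squares t-statistic as the importance score are all assumptions made for this example. Covariate adjustment (for example, for population structure) and multiple-testing correction are omitted.

```python
import numpy as np


def npdr_scores(X, y, k=5):
    """Return one t-statistic per attribute (larger suggests more important).

    Hypothetical helper for illustration only; continuous outcome assumed.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Manhattan distance between all sample pairs in the full attribute space.
    dist = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor

    # Indices of the k nearest neighbors of every sample.
    nbrs = np.argsort(dist, axis=1)[:, :k]
    i_idx = np.repeat(np.arange(n), k)
    j_idx = nbrs.ravel()

    # Outcome difference for every (sample, neighbor) pair.
    dy = np.abs(y[i_idx] - y[j_idx])

    scores = np.empty(p)
    for a in range(p):
        # Pair distance projected onto the single attribute a.
        da = np.abs(X[i_idx, a] - X[j_idx, a])
        design = np.column_stack([np.ones_like(da), da])

        # Least-squares fit of outcome difference on projected distance.
        coef, *_ = np.linalg.lstsq(design, dy, rcond=None)
        resid = dy - design @ coef
        sigma2 = resid @ resid / (len(dy) - 2)
        cov = sigma2 * np.linalg.inv(design.T @ design)

        # Standardized slope (t-statistic) serves as the attribute score.
        scores[a] = coef[1] / np.sqrt(cov[1, 1])
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)  # only attribute 0 matters
    print(np.round(npdr_scores(X, y), 2))
```

Neighbors are chosen in the full attribute space, while each regression uses only one attribute's projected distance. This is how a nearest-neighbor method of this kind can pick up effects that depend on several attributes at once. For a case-control outcome, one would presumably swap the least-squares fit for a logistic regression on an indicator of whether the two neighbors share a class; that variant is not shown here.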

17.
PLoS One ; 15(1): e0228412, 2020.
Article in English | MEDLINE | ID: mdl-31978140

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pone.0100839.]

18.
Transl Psychiatry ; 10(1): 282, 2020 08 12.
Article in English | MEDLINE | ID: mdl-32788574

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

19.
J Virol ; 82(20): 10271-8, 2008 Oct.
Article in English | MEDLINE | ID: mdl-18667496

ABSTRACT

While human immunodeficiency virus type 1 (HIV-1) infection is associated with hyperimmune activation and systemic depletion of CD4+ T cells, simian immunodeficiency virus (SIV) infection in sooty mangabeys or chimpanzees does not exhibit these hallmarks. Control of immune activation is thought to be one of the major components that govern species-dependent differences in the disease pathogenesis. A previous study introduced the idea that the resistance of chimpanzees to SIVcpz infection-induced hyperimmune activation could be the result of the expression of select sialic acid-recognizing immunoglobulin (Ig)-like lectin (Siglec) superfamily members by chimpanzee T cells. Siglecs, which are absent on human T cells, were thought to control levels of T-cell activation in chimpanzees and were thus suggested as a cause for the pathogenic differences in the course of SIVcpz or HIV-1 infection. As in human models of T-cell activation, stimulation had been attempted using an anti-CD3 monoclonal antibody (MAb) (UCHT1; isotype IgG1), but despite efficient binding, UCHT1 failed to activate chimpanzee T cells, an activation block that could be partially overcome by MAb-induced Siglec-5 internalization. We herein demonstrate that anti-CD3 MAb-mediated chimpanzee T-cell activation is a function of the anti-CD3 MAb isotype and is not governed by Siglec expression. While IgG1 anti-CD3 MAbs fail to stimulate chimpanzee T cells, IgG2a anti-CD3 MAbs activate chimpanzee T cells in the absence of Siglec manipulations. Our results thus imply that prior to studying possible differences between human and chimpanzee T-cell activation, a relevant model of chimpanzee T-cell activation needs to be established.


Subjects
CD3 Complex/immunology , HIV/immunology , Lymphocyte Activation/immunology , Pan troglodytes/immunology , Receptors, Antigen, T-Cell/immunology , Simian Immunodeficiency Virus/immunology , T-Lymphocytes/immunology , Amino Acid Sequence , Animals , Antibodies, Monoclonal/immunology , CD3 Complex/genetics , Disease Susceptibility/immunology , Disease Susceptibility/virology , HIV Infections/immunology , Humans , Lectins/genetics , Lectins/immunology , Molecular Sequence Data , Molecular Structure , Sequence Alignment , Sialic Acid Binding Immunoglobulin-like Lectins , Simian Acquired Immunodeficiency Syndrome/immunology
20.
Expert Rev Vaccines ; 18(3): 253-267, 2019 03.
Article in English | MEDLINE | ID: mdl-30700167

ABSTRACT

INTRODUCTION: Emerging infectious diseases are a major threat to public health, and while vaccines have proven to be one of the most effective preventive measures for infectious diseases, we still do not have safe and effective vaccines against many human pathogens, and emerging diseases continually pose new threats. The purpose of this review is to discuss how the creation of vaccines for these new threats has been hindered by limitations in the current approach to vaccine development. Recent advances in high-throughput technologies have enabled scientists to apply systems biology approaches to collect and integrate increasingly large datasets that capture comprehensive biological changes induced by vaccines, and then decipher the complex immune response to those vaccines. AREAS COVERED: This review covers advances in these technologies and recent publications that describe systems biology approaches to understanding vaccine immune responses and to understanding the rational design of new vaccine candidates. EXPERT OPINION: Systems biology approaches to vaccine development provide novel information regarding both the immune response and the underlying mechanisms and can inform vaccine development.


Subjects
Communicable Diseases, Emerging/prevention & control , Systems Biology/methods , Vaccines/administration & dosage , Animals , Communicable Diseases, Emerging/immunology , Drug Development/methods , High-Throughput Screening Assays/methods , Humans , Public Health , Vaccines/adverse effects , Vaccines/immunology