Results 1 - 20 of 85
1.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36653900

ABSTRACT

Microbial communities are highly dynamic and sensitive to changes in the environment. Thus, microbiome data are highly susceptible to batch effects, defined as sources of unwanted variation that are not related to and obscure any factors of interest. Existing batch effect correction methods have been primarily developed for gene expression data. As such, they do not consider the inherent characteristics of microbiome data, including zero inflation, overdispersion and correlation between variables. We introduce new multivariate and non-parametric batch effect correction methods based on Partial Least Squares Discriminant Analysis (PLSDA). PLSDA-batch first estimates treatment and batch variation with latent components, then subtracts batch-associated components from the data. The resulting batch-effect-corrected data can then be used as input to any downstream statistical analysis. Two variants are proposed to handle unbalanced batch × treatment designs and to avoid overfitting when estimating the components via variable selection. We compare our approaches with popular methods for managing batch effects, namely, removeBatchEffect, ComBat and Surrogate Variable Analysis, in simulations and three case studies using various visual and numerical assessments. We show that our three methods lead to competitive performance in removing batch variation while preserving treatment variation, especially for unbalanced batch × treatment designs. Our downstream analyses show selections of biologically relevant taxa. This work demonstrates that batch effect correction methods can improve microbiome research outputs. Reproducible code and vignettes are available on GitHub.


Subjects
Microbiota , Research Design , Least-Squares Analysis , Discriminant Analysis
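
The "estimate, then subtract" step described in the abstract above can be illustrated with a deliberately simplified sketch. It assumes that a batch-associated score vector `t` is already available (here a hypothetical centred batch indicator, not a component estimated by PLSDA) and removes the part of each variable that is linearly explained by `t`. The function name `deflate` and the toy data are illustrative assumptions; this is not the API of the PLSDA-batch package.

```python
def deflate(X, t):
    """Remove from each column of X the part linearly explained by score vector t.

    X: list of rows (samples x variables); t: one score per sample.
    Computes X - t (t'X) / (t't), the deflation step common to PLS-type methods.
    """
    tt = sum(v * v for v in t)
    n_vars = len(X[0])
    # Loading of each variable: projection of that column onto t.
    loadings = [sum(t[i] * X[i][j] for i in range(len(X))) / tt for j in range(n_vars)]
    return [[X[i][j] - t[i] * loadings[j] for j in range(n_vars)] for i in range(len(X))]


# Toy data: 4 samples, 2 variables; samples 0-1 are batch A, samples 2-3 are batch B.
# Variable 0 is shifted by +2 in batch B; variable 1 carries no batch effect.
X = [[1.0, 5.0], [1.0, 7.0], [3.0, 5.0], [3.0, 7.0]]
t = [-1.0, -1.0, 1.0, 1.0]  # hypothetical centred batch score

X_corrected = deflate(X, t)
```

In this toy case the batch shift in variable 0 is removed while variable 1, which is uncorrelated with the batch score, is left unchanged. The published method additionally estimates treatment components first so that treatment variation is preserved; that step is not shown here.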
2.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35362513

ABSTRACT

Characterizing the molecular identity of a cell is an essential step in single-cell RNA sequencing (scRNA-seq) data analysis. Numerous tools exist for predicting cell identity using single-cell reference atlases. However, many challenges remain, including correcting for inherent batch effects between reference and query data and insufficient phenotype data from the reference. One solution is to project single-cell data onto established bulk reference atlases to leverage their rich phenotype information. Sincast is a computational framework to query scRNA-seq data by projection onto bulk reference atlases. Prior to projection, single-cell data are transformed to be directly comparable to bulk data, either with pseudo-bulk aggregation or graph-based imputation to address sparse single-cell expression profiles. Sincast avoids batch effect correction, and cell identity is predicted along a continuum to highlight new cell states not found in the reference atlas. In several case study scenarios, we show that Sincast projects single cells into the correct biological niches in the expression space of the bulk reference atlas. We demonstrate the effectiveness of our imputation approach that was specifically developed for querying scRNA-seq data based on bulk reference atlases. We show that Sincast is an efficient and powerful tool for single-cell profiling that will facilitate downstream analysis of scRNA-seq data.


Subjects
Single-Cell Analysis , Transcriptome , Data Analysis , Gene Expression Profiling , Phenotype , Sequence Analysis, RNA , Exome Sequencing
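
Pseudo-bulk aggregation, one of the two transformations mentioned in the abstract above, can be sketched in a few lines. The version below simply sums per-cell counts within each group. The function name `pseudo_bulk`, the grouping labels and the toy counts are assumptions made for illustration, and Sincast's own aggregation may differ in detail.

```python
from collections import defaultdict


def pseudo_bulk(counts, labels):
    """Sum per-cell gene counts within each group label.

    counts: list of per-cell count vectors (cells x genes).
    labels: group label for each cell (for example a cluster or sample id).
    Returns a dict mapping each label to its summed count vector.
    """
    n_genes = len(counts[0])
    totals = defaultdict(lambda: [0] * n_genes)
    for cell, label in zip(counts, labels):
        profile = totals[label]
        for g, value in enumerate(cell):
            profile[g] += value
    return dict(totals)


# Toy example: 4 cells, 3 genes, two hypothetical clusters.
counts = [[0, 2, 1], [1, 0, 0], [0, 0, 4], [3, 0, 1]]
labels = ["c1", "c1", "c2", "c2"]
profiles = pseudo_bulk(counts, labels)
```

Summing across cells reduces the sparsity of individual single-cell profiles, which is what makes the aggregated profiles more comparable to bulk samples.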
3.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35830875

ABSTRACT

The microbiome is a complex and dynamic community of microorganisms that co-exist interdependently within an ecosystem, and interact with its host or environment. Longitudinal studies can capture temporal variation within the microbiome to gain mechanistic insights into microbial systems; however, current statistical methods are limited due to the complex and inherent features of the data. We have identified three analytical objectives in longitudinal microbial studies: (1) differential abundance over time and between sample groups, demographic factors or clinical variables of interest; (2) clustering of microorganisms evolving concomitantly across time and (3) network modelling to identify temporal relationships between microorganisms. This review explores the strengths and limitations of current methods to fulfill these objectives, compares different methods in simulation and case studies for objectives (1) and (2), and highlights opportunities for further methodological developments. R tutorials are provided to reproduce the analyses conducted in this review.


Subjects
Data Analysis , Microbiota , Cluster Analysis , Longitudinal Studies , RNA, Ribosomal, 16S
4.
PLoS Biol ; 19(10): e3001419, 2021 10.
Article in English | MEDLINE | ID: mdl-34618807

ABSTRACT

Evolving in sync with the computation revolution over the past 30 years, computational biology has emerged as a mature scientific field. While the field has made major contributions toward improving scientific knowledge and human health, individual computational biology practitioners at various institutions often languish in career development. As optimistic biologists passionate about the future of our field, we propose solutions for both eager and reluctant individual scientists, institutions, publishers, funding agencies, and educators to fully embrace computational biology. We believe that in order to pave the way for the next generation of discoveries, we need to improve recognition for computational biologists and better align pathways of career success with pathways of scientific progress. With 10 outlined steps, we call on all adjacent fields to move away from the traditional individual, single-discipline investigator research model and embrace multidisciplinary, data-driven, team science.


Subjects
Computational Biology , Budgets , Cooperative Behavior , Humans , Interdisciplinary Research , Mentoring , Motivation , Publications , Reward , Software
5.
Nucleic Acids Res ; 50(5): e27, 2022 03 21.
Article in English | MEDLINE | ID: mdl-34883510

ABSTRACT

Multi-omics integration is key to fully understanding complex biological processes in a holistic manner. Furthermore, multi-omics combined with new longitudinal experimental designs can reveal dynamic relationships between omics layers and identify key players or interactions in system development or complex phenotypes. However, integration methods have to address various experimental designs and do not guarantee interpretable biological results. The new challenge of multi-omics integration is to solve interpretation and unlock the hidden knowledge within the multi-omics data. In this paper, we go beyond integration and propose a generic approach to face the interpretation problem. From multi-omics longitudinal data, this approach builds and explores hybrid multi-omics networks composed of both inferred and known relationships within and between omics layers. With smart node labelling and propagation analysis, this approach predicts regulation mechanisms and multi-omics functional modules. We applied the method to three case studies with various multi-omics designs and identified new multi-layer interactions involved in key biological functions that could not be revealed with single omics analysis. Moreover, we highlighted interplay in the kinetics that could help identify novel biological mechanisms. This method is available as an R package netOmics to readily suit any application.


Subjects
Genomics , Systems Biology/methods , Genomics/methods , Phenotype
6.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34036326

ABSTRACT

Despite the volume of experiments performed and data available, the complex biology of coronavirus SARS-CoV-2 is not yet fully understood. Existing molecular profiling studies have focused on analysing functional omics data of a single type, which captures changes in a small subset of the molecular perturbations caused by the virus. As the logical next step, results from multiple such omics analyses may be aggregated to comprehensively interpret the molecular mechanisms of SARS-CoV-2. An alternative approach is to integrate data simultaneously in a parallel fashion to highlight the inter-relationships of disease-driving biomolecules, in contrast to comparing processed information from each omics level separately. We demonstrate that valuable information may be masked by using the former fragmented views in analysis, and biomarkers resulting from such an approach cannot provide a systematic understanding of the disease aetiology. Hence, we present a generic, reproducible and flexible open-access data harmonisation framework that can be scaled out to future multi-omics analyses to study a phenotype in a holistic manner. The pipeline source code, detailed documentation and automated version as an R package are accessible. To demonstrate the effectiveness of our pipeline, we applied it to a drug screening task. We integrated multi-omics data to find the lowest level of statistical associations between data features in two case studies. Strongly correlated features within each of these two datasets were used for drug-target analysis, resulting in a list of 84 drug-target candidates. Further computational docking and toxicity analyses revealed seven high-confidence targets, amsacrine, bosutinib, ceritinib, crizotinib, nintedanib and sunitinib as potential starting points for drug therapy and development.


Subjects
COVID-19 Drug Treatment , Genomics , Molecular Targeted Therapy , SARS-CoV-2/drug effects , Algorithms , Biomarkers/chemistry , COVID-19/genetics , COVID-19/pathology , COVID-19/virology , Computational Biology , Databases, Genetic , Humans , SARS-CoV-2/chemistry , SARS-CoV-2/genetics , Software
7.
Bioinformatics ; 38(2): 577-579, 2022 01 03.
Article in English | MEDLINE | ID: mdl-34554215

ABSTRACT

MOTIVATION: Multi-omics data integration enables the global analysis of biological systems and discovery of new biological insights. Multi-omics experimental designs have been further extended with a longitudinal dimension to study dynamic relationships between molecules. However, methods that integrate longitudinal multi-omics data are still in their infancy. RESULTS: We introduce the R package timeOmics, a generic analytical framework for the integration of longitudinal multi-omics data. The framework includes pre-processing, modeling and clustering to identify molecular features strongly associated with time. We illustrate this framework in a case study to detect seasonal patterns of mRNA, metabolites, gut taxa and clinical variables in patients with diabetes mellitus from the integrative Human Microbiome Project. AVAILABILITY AND IMPLEMENTATION: timeOmics is available on Bioconductor and github.com/abodein/timeOmics. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genomics , Multiomics , Humans , Genomics/methods , Cluster Analysis
8.
PLoS Genet ; 16(8): e1008906, 2020 08.
Article in English | MEDLINE | ID: mdl-32804949

ABSTRACT

The killer immunoglobulin-like receptors (KIRs), found predominantly on the surface of natural killer (NK) cells and some T-cells, are a collection of highly polymorphic activating and inhibitory receptors with variable specificity for class I human leukocyte antigen (HLA) ligands. Fifteen KIR genes are inherited in haplotypes of diverse gene content across the human population, and the repertoire of independently inherited KIR and HLA alleles is known to alter risk for immune-mediated and infectious disease by shifting the threshold of lymphocyte activation. We have conducted the largest disease-association study of KIR-HLA epistasis to date, enabled by the imputation of KIR gene and HLA allele dosages from genotype data for 12,214 healthy controls and 8,107 individuals with the HLA-B*27-associated immune-mediated arthritis, ankylosing spondylitis (AS). We identified epistatic interactions between KIR genes and their ligands (at both HLA subtype and allele resolution) that increase risk of disease, replicating analyses in a semi-independent cohort of 3,497 cases and 14,844 controls. We further confirmed that the strong AS-association with a pathogenic variant in the endoplasmic reticulum aminopeptidase gene ERAP1, known to alter the HLA-B*27 presented peptidome, is not modified by carriage of the canonical HLA-B receptor KIR3DL1/S1. Overall, our data suggest that AS risk is modified by the complement of KIRs and HLA ligands inherited, beyond the influence of HLA-B*27 alone, which collectively alter the proinflammatory capacity of KIR-expressing lymphocytes to contribute to disease immunopathogenesis.


Subjects
Epistasis, Genetic , HLA Antigens/genetics , Receptors, KIR/genetics , Spondylitis, Ankylosing/genetics , Alleles , Aminopeptidases/genetics , Humans , Minor Histocompatibility Antigens/genetics , Polymorphism, Single Nucleotide
9.
Nat Methods ; 16(6): 479-487, 2019 06.
Article in English | MEDLINE | ID: mdl-31133762

ABSTRACT

Single cell RNA-sequencing (scRNA-seq) technology has undergone rapid development in recent years, leading to an explosion in the number of tailored data analysis methods. However, the current lack of gold-standard benchmark datasets makes it difficult for researchers to systematically compare the performance of the many methods available. Here, we generated a realistic benchmark experiment that included single cells and admixtures of cells or RNA to create 'pseudo cells' from up to five distinct cancer cell lines. In total, 14 datasets were generated using both droplet and plate-based scRNA-seq protocols. We compared 3,913 combinations of data analysis methods for tasks ranging from normalization and imputation to clustering, trajectory analysis and data integration. Evaluation revealed pipelines suited to different types of data for different tasks. Our data and analysis provide a comprehensive framework for benchmarking most common scRNA-seq analysis steps.


Subjects
Adenocarcinoma/genetics , Benchmarking , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Lung Neoplasms/genetics , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Humans , Software , Tumor Cells, Cultured
10.
Ann Rheum Dis ; 80(5): 573-581, 2021 05.
Article in English | MEDLINE | ID: mdl-33397732

ABSTRACT

OBJECTIVES: Analysis of oral dysbiosis in individuals sharing genetic and environmental risk factors with rheumatoid arthritis (RA) patients may illuminate how microbiota contribute to disease susceptibility. We studied the oral microbiota in a prospective cohort of patients with RA, first-degree relatives (FDR) and healthy controls (HC), then genomically and functionally characterised streptococcal species from each group to understand their potential contribution to RA development. METHODS: After DNA extraction from tongue swabs, targeted 16S rRNA gene sequencing and statistical analysis, we defined a microbial dysbiosis score based on an operational taxonomic unit signature of disease. After selective culture from swabs, we identified streptococci by sequencing. We examined the ability of streptococcal cell walls (SCW) from isolates to induce cytokines from splenocytes and arthritis in ZAP-70-mutant SKG mice. RESULTS: RA and FDR were more likely to have periodontitis symptoms. An oral microbial dysbiosis score discriminated RA and HC subjects and predicted similarity of FDR to RA. Streptococcaceae were major contributors to the score. We identified 10 out of 15 streptococcal isolates as S. parasalivarius sp. nov., a distinct sister species to S. salivarius. Tumour necrosis factor and interleukin 6 production in vitro differed in response to individual S. parasalivarius isolates, suggesting strain specific effects on innate immunity. Cytokine secretion was associated with the presence of proteins potentially involved in S. parasalivarius SCW synthesis. Systemic administration of SCW from RA and HC-associated S. parasalivarius strains induced similar chronic arthritis. CONCLUSIONS: Dysbiosis-associated periodontal inflammation and barrier dysfunction may permit arthritogenic insoluble pro-inflammatory pathogen-associated molecules, like SCW, to reach synovial tissue.


Subjects
Arthritis, Rheumatoid/microbiology , Biopolymers/isolation & purification , Dysbiosis/microbiology , Peptidoglycan/isolation & purification , Periodontitis/microbiology , Streptococcus/isolation & purification , Adult , Animals , Disease Susceptibility/microbiology , Female , Humans , Male , Mice , Microbiota , Middle Aged , Mouth/microbiology , Pedigree , RNA, Ribosomal, 16S
11.
PLoS Comput Biol ; 16(9): e1008219, 2020 09.
Article in English | MEDLINE | ID: mdl-32986694

ABSTRACT

Gene expression atlases have transformed our understanding of the development, composition and function of human tissues. New technologies promise improved cellular or molecular resolution, and have led to the identification of new cell types, or better defined cell states. But as new technologies emerge, information derived on old platforms becomes obsolete. We demonstrate that it is possible to combine a large number of different profiling experiments summarised from dozens of laboratories and representing hundreds of donors, to create an integrated molecular map of human tissue. As an example, we combine 850 samples from 38 platforms to build an integrated atlas of human blood cells. We achieve robust and unbiased cell type clustering using a variance partitioning method, selecting genes with low platform bias relative to biological variation. Other than an initial rescaling, no other transformation to the primary data is applied through batch correction or renormalisation. Additional data, including single-cell datasets, can be projected for comparison, classification and annotation. The resulting atlas provides a multi-scaled approach to visualise and analyse the relationships between sets of genes and blood cell lineages, including the maturation and activation of leukocytes in vivo and in vitro. In allowing for data integration across hundreds of studies, we address a key reproducibility challenge which is faced by any new technology. This allows us to draw on the deep phenotypes and functional annotations that accompany traditional profiling methods, and provide important context to the high cellular resolution of single cell profiling. Here, we have implemented the blood atlas in the open access Stemformatics.org platform, drawing on its extensive collection of curated transcriptome data. The method is simple, scalable and amenable to rapid deployment in other biological systems or computational workflows.


Subjects
Transcriptome , Cluster Analysis , Data Curation , Gene Expression Profiling , Humans
12.
Bioinformatics ; 35(17): 3055-3062, 2019 09 01.
Article in English | MEDLINE | ID: mdl-30657866

ABSTRACT

MOTIVATION: In the continuously expanding omics era, novel computational and statistical strategies are needed for data integration and identification of biomarkers and molecular signatures. We present Data Integration Analysis for Biomarker discovery using Latent cOmponents (DIABLO), a multi-omics integrative method that seeks common information across different data types through the selection of a subset of molecular features, while discriminating between multiple phenotypic groups. RESULTS: Using simulations and benchmark multi-omics studies, we show that DIABLO identifies features with superior biological relevance compared with existing unsupervised integrative methods, while achieving predictive performance comparable to state-of-the-art supervised approaches. DIABLO is versatile, allowing for modular-based analyses and cross-over study designs. In two case studies, DIABLO identified both known and novel multi-omics biomarkers consisting of mRNAs, miRNAs, CpGs, proteins and metabolites. AVAILABILITY AND IMPLEMENTATION: DIABLO is implemented in the mixOmics R Bioconductor package with functions for parameter choice and visualization to assist in the interpretation of the integrative analyses, along with tutorials on http://mixomics.org and in our Bioconductor vignette. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software , Biomarkers , Cross-Over Studies , Genomics , MicroRNAs
13.
Nature ; 516(7530): 198-206, 2014 Dec 11.
Article in English | MEDLINE | ID: mdl-25503233

ABSTRACT

Somatic cell reprogramming to a pluripotent state continues to challenge many of our assumptions about cellular specification, and despite major efforts, we lack a complete molecular characterization of the reprogramming process. To address this gap in knowledge, we generated extensive transcriptomic, epigenomic and proteomic data sets describing the reprogramming routes leading from mouse embryonic fibroblasts to induced pluripotency. Through integrative analysis, we reveal that cells transition through distinct gene expression and epigenetic signatures and bifurcate towards reprogramming transgene-dependent and -independent stable pluripotent states. Early transcriptional events, driven by high levels of reprogramming transcription factor expression, are associated with widespread loss of histone H3 lysine 27 (H3K27me3) trimethylation, representing a general opening of the chromatin state. Maintenance of high transgene levels leads to re-acquisition of H3K27me3 and a stable pluripotent state that is alternative to the embryonic stem cell (ESC)-like fate. Lowering transgene levels at an intermediate phase, however, guides the process to the acquisition of ESC-like chromatin and DNA methylation signature. Our data provide a comprehensive molecular description of the reprogramming routes and are accessible through the Project Grandiose portal at http://www.stemformatics.org.


Subjects
Cellular Reprogramming/genetics , Genome/genetics , Induced Pluripotent Stem Cells/cytology , Induced Pluripotent Stem Cells/metabolism , Animals , Chromatin/chemistry , Chromatin/genetics , Chromatin/metabolism , Chromatin Assembly and Disassembly , DNA Methylation , Embryonic Stem Cells/cytology , Embryonic Stem Cells/metabolism , Epistasis, Genetic/genetics , Fibroblasts/cytology , Fibroblasts/metabolism , Histones/chemistry , Histones/metabolism , Internet , Mice , Proteome/genetics , Proteomics , RNA, Long Noncoding/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Transcription, Genetic/genetics , Transcriptome/genetics , Transgenes/genetics
14.
Ann Rheum Dis ; 78(4): 494-503, 2019 04.
Article in English | MEDLINE | ID: mdl-30700427

ABSTRACT

OBJECTIVES: Certain gut bacterial families, including Bacteroidaceae, Porphyromonadaceae and Prevotellaceae, are increased in people suffering from spondyloarthropathy (SpA), a disease group associated with IL23R signalling variants. To understand the relationship between host interleukin (IL)-23 signalling and gut bacterial dysbiosis in SpA, we inhibited IL-23 in dysbiotic ZAP-70-mutant SKG mice that develop IL-23-dependent SpA-like arthritis, psoriasis-like skin inflammation and Crohn's-like ileitis in response to microbial beta 1,3-glucan (curdlan). METHODS: We treated SKG mice weekly with anti-IL-23 or isotype mAb for 3 weeks, rested them for 3 weeks, then administered curdlan or saline. We collected faecal samples longitudinally, assessed arthritis, spondylitis, psoriasis and ileitis histologically, and analysed the microbiota community profiles using next-generation sequencing. We used multivariate sparse partial least squares discriminant analysis to identify operational taxonomic unit (OTU) signatures best classifying treatment groups and linear regression to develop a predictive model of disease severity. RESULTS: IL-23p19 inhibition in naïve SKG mice decreased Bacteroidaceae, Porphyromonadaceae and Prevotellaceae. Abundance of Clostridiaceae and Lachnospiraceae families concomitantly increased, and curdlan-mediated SpA development decreased. Abundance of Enterobacteriaceae and Porphyromonadaceae family and reduction in Lachnospiraceae Dorea genus OTUs early in disease course were associated with disease severity in affected tissues. CONCLUSIONS: Dysbiosis in SKG mice reflects human SpA and is IL-23p19 dependent. In genetically susceptible hosts, IL-23p19 favours outgrowth of SpA-associated pathobionts and reduces support for homeostatic-inducing microbiota. The relative abundance of specific pathobionts is associated with disease severity.


Subjects
Bacteria/growth & development , Dysbiosis/microbiology , Gastrointestinal Microbiome/immunology , Interleukin-23 Subunit p19/immunology , Spondylarthritis/microbiology , Animals , Dysbiosis/immunology , Feces/microbiology , Female , Homeostasis/immunology , Host-Pathogen Interactions/immunology , Interleukin-23 Subunit p19/antagonists & inhibitors , Mice, Mutant Strains , Severity of Illness Index , Spondylarthritis/chemically induced , Spondylarthritis/immunology , beta-Glucans
15.
Pediatr Diabetes ; 20(2): 166-171, 2019 03.
Article in English | MEDLINE | ID: mdl-30556344

ABSTRACT

BACKGROUND: Stimulated C-peptide measurement after a mixed meal tolerance test (MMTT) is the accepted gold standard for assessing residual beta-cell function in type 1 diabetes (T1D); however, this approach is impractical outside of clinical trials. OBJECTIVE: To develop an improved estimate of residual beta-cell function in children with T1D using commonly measured clinical variables. SUBJECTS/METHODS: A clinical model to predict 90-minute MMTT stimulated C-peptide in children with recent-onset T1D was developed from the combined AbATE, START, and TIDAL placebo subjects (n = 46) 6 months post-recruitment using multiple linear regression. This model was then validated in a clinical cohort (Hvidoere study group, n = 262). RESULTS: A model of estimated C-peptide at 6 months post-diagnosis, which included age, gender, body mass index (BMI), hemoglobin A1c (HbA1c), and insulin dose predicted 90-minute stimulated C-peptide measurements (adjusted R² = 0.63, P < 0.0001). The predictive value of insulin dose and HbA1c alone (IDAA1c) for 90-minute stimulated C-peptide was significantly lower (R² = 0.37, P < 0.0001). The slopes of linear regression lines of the estimated and stimulated 90-minute C-peptide levels obtained at 6 and 12 months post diagnosis in the Hvidoere clinical cohort were R² = 0.36, P < 0.0001 at 6 months and R² = 0.37, P < 0.0001 at 12 months. CONCLUSIONS: A clinical model including age, gender, BMI, HbA1c, and insulin dose predicts stimulated C-peptide levels in children with recent-onset T1D. Estimated C-peptide is an improved surrogate to monitor residual beta-cell function outside clinical trial settings.


Subjects
C-Peptide/metabolism , Diabetes Mellitus, Type 1/diagnosis , Diabetes Mellitus, Type 1/drug therapy , Diabetes Mellitus, Type 1/metabolism , Insulin-Secreting Cells/physiology , Models, Biological , Adolescent , Adult , Age of Onset , Antibodies, Monoclonal, Humanized/therapeutic use , Child , Cohort Studies , Diabetes Mellitus, Type 1/epidemiology , Female , Humans , Insulin Secretion/physiology , Insulin-Secreting Cells/pathology , Male , Prognosis , Remission Induction , Treatment Outcome , Young Adult
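
The abstract above reports an adjusted R² for a multiple linear regression model. As a reminder of how that statistic relates to the ordinary R², the sketch below applies the standard adjustment for the number of predictors. The function name `adjusted_r2` and the input values are hypothetical and are not taken from the study.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Hypothetical inputs chosen only to make the arithmetic easy to follow.
example = adjusted_r2(r2=0.5, n_samples=21, n_predictors=4)
```

With these illustrative inputs the penalty term is 0.5 × 20 / 16 = 0.625, so the adjusted value is 0.375. The adjustment grows as predictors are added relative to the sample size, which is why it is the figure usually quoted for models fitted on small cohorts.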
16.
Bioinformatics ; 33(12): 1773-1781, 2017 Jun 15.
Article in English | MEDLINE | ID: mdl-28186228

ABSTRACT

MOTIVATION: Genome-wide association studies are identifying single nucleotide variants (SNVs) linked to various diseases; however, the functional effect caused by these variants is often unknown. One potential functional effect, the loss or gain of protein phosphorylation sites, can be induced through variations in key amino acids that disrupt or introduce valid kinase binding patterns. Current methods for predicting the effect of SNVs on phosphorylation operate on the sequence content of reference and variant proteins. However, consideration of the amino acid sequence alone is insufficient for predicting phosphorylation change, as context factors determine kinase-substrate selection. RESULTS: We present here a method for quantifying the effect of SNVs on protein phosphorylation through an integrated system of motif analysis and context-based assessment of kinase targets. By predicting the effect that known variants across the proteome have on phosphorylation, we are able to use this background of proteome-wide variant effects to quantify the significance of novel variants for modifying phosphorylation. We validate our method on a manually curated set of phosphorylation change-causing variants from the primary literature, showing that the method predicts known examples of phosphorylation change at high levels of specificity. We apply our approach to data-sets of variants in phosphorylation site regions, showing that variants causing predicted phosphorylation loss are over-represented among disease-associated variants. AVAILABILITY AND IMPLEMENTATION: The method is freely available as a web-service at the website http://bioinf.scmb.uq.edu.au/phosphopick/snp. CONTACT: m.boden@uq.edu.au. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods , Phosphorylation , Phosphotransferases/metabolism , Polymorphism, Single Nucleotide , Protein Processing, Post-Translational/genetics , Software , Amino Acid Sequence , Humans , Protein Binding
17.
Hepatology ; 66(5): 1502-1518, 2017 11.
Article in English | MEDLINE | ID: mdl-28498607

ABSTRACT

Hepatocellular carcinomas (HCCs) exhibit a diversity of molecular phenotypes, raising major challenges in clinical management. HCCs detected by surveillance programs at an early stage are candidates for potentially curative therapies (local ablation, resection, or transplantation). In the long term, transplantation provides the lowest recurrence rates. Treatment allocation is based on tumor number, size, vascular invasion, performance status, functional liver reserve, and the prediction of early (<2 years) recurrence, which reflects the intrinsic aggressiveness of the tumor. Well-differentiated, potentially low-aggressiveness tumors form the heterogeneous molecular class of nonproliferative HCCs, characterized by an approximately 50% β-catenin mutation rate. To define the clinical, pathological, and molecular features and the outcome of nonproliferative HCCs, we constructed a 1,133-HCC transcriptomic metadata set and validated findings in a publicly available 210-HCC RNA sequencing set. We show that nonproliferative HCCs preserve the zonation program that distributes metabolic functions along the portocentral axis in normal liver. More precisely, we identified two well-differentiated, nonproliferation subclasses, namely periportal-type (wild-type β-catenin) and perivenous-type (mutant β-catenin), which expressed negatively correlated gene networks. The new periportal-type subclass represented 29% of all HCCs; expressed a hepatocyte nuclear factor 4A-driven gene network, which was down-regulated in mouse hepatocyte nuclear factor 4A knockout mice; were early-stage tumors by Barcelona Clinic Liver Cancer, Cancer of the Liver Italian Program, and tumor-node-metastasis staging systems; had no macrovascular invasion; and showed the lowest metastasis-specific gene expression levels and TP53 mutation rates. Also, we identified an eight-gene periportal-type HCC signature, which was independently associated with the highest 2-year recurrence-free survival by multivariate analyses in two independent cohorts of 247 and 210 patients. CONCLUSION: Well-differentiated HCCs display mutually exclusive periportal or perivenous zonation programs. Among all HCCs, periportal-type tumors have the lowest intrinsic potential for early recurrence after curative resection. (Hepatology 2017;66:1502-1518).


Subjects
Hepatocellular Carcinoma/pathology , Liver Neoplasms/pathology , Liver/pathology , Local Neoplasm Recurrence/pathology , beta Catenin/genetics , Hepatocellular Carcinoma/genetics , Hepatocellular Carcinoma/mortality , Hepatocellular Carcinoma/surgery , France/epidemiology , Hepatocyte Nuclear Factor 4/metabolism , Humans , Liver Neoplasms/genetics , Liver Neoplasms/mortality , Liver Neoplasms/surgery , Mutation , Local Neoplasm Recurrence/genetics , Phenotype , Transcriptome
18.
PLoS Comput Biol ; 13(11): e1005752, 2017 Nov.
Article in English | MEDLINE | ID: mdl-29099853

ABSTRACT

The advent of high-throughput technologies has led to a wealth of publicly available 'omics data coming from different sources, such as transcriptomics, proteomics and metabolomics. Combining such large-scale biological data sets can lead to the discovery of important biological insights, provided that relevant information can be extracted in a holistic manner. Current statistical approaches have focused on identifying small subsets of molecules (a 'molecular signature') to explain or predict biological conditions, but mainly for a single type of 'omics. In addition, commonly used methods are univariate and consider each biological feature independently. We introduce mixOmics, an R package dedicated to the multivariate analysis of biological data sets with a specific focus on data exploration, dimension reduction and visualisation. By adopting a systems biology approach, the toolkit provides a wide range of methods that statistically integrate several data sets at once to probe relationships between heterogeneous 'omics data sets. Our recent methods extend Projection to Latent Structure (PLS) models for discriminant analysis, for data integration across multiple 'omics data or across independent studies, and for the identification of molecular signatures. We illustrate our latest mixOmics integrative frameworks for the multivariate analyses of 'omics data available from the package.


Subjects
Computational Biology/methods , Genomics , Metabolomics , Software , Statistical Data Interpretation , Humans , Neoplasms/genetics , Neoplasms/metabolism , Systems Biology
19.
J Chem Ecol ; 44(3): 215-234, 2018 Mar.
Article in English | MEDLINE | ID: mdl-29479643

ABSTRACT

Chemical ecology has strong links with metabolomics, the large-scale study of all metabolites detectable in a biological sample. Consequently, chemical ecologists are often challenged by the statistical analysis of such large datasets. This holds especially true when the purpose is to integrate multiple datasets to obtain a holistic view and a better understanding of the biological system under study. The present article provides a comprehensive resource for analyzing such complex datasets using multivariate methods. It proceeds from the necessary pre-treatment of data, including data transformations and distance calculations, to the application of both gold-standard and novel multivariate methods for the integration of different omics data. We illustrate the process of analysis, along with detailed interpretation of results, for six issues representative of the different types of biological questions encountered by chemical ecologists. We provide the necessary knowledge and tools, with reproducible R code and chemical-ecological datasets, to practice and teach multivariate methods.


Subjects
Chemical Databases , Ecology/methods , Guidelines as Topic , Statistical Models , Multivariate Analysis
20.
BMC Bioinformatics ; 18(1): 128, 2017 Feb 27.
Article in English | MEDLINE | ID: mdl-28241739

ABSTRACT

BACKGROUND: Molecular signatures identified from high-throughput transcriptomic studies often have poor reliability and fail to reproduce across studies. One solution is to combine independent studies into a single integrative analysis, which also increases sample size. However, the different protocols and technological platforms across transcriptomic studies produce unwanted systematic variation that strongly confounds the integrative analysis results. When studies aim to discriminate an outcome of interest, the common approach is a sequential two-step procedure: techniques for removing unwanted systematic variation are applied prior to classification methods. RESULTS: To limit the risk of overfitting and over-optimistic results of a two-step procedure, we developed a novel multivariate integration method, MINT, that simultaneously accounts for unwanted systematic variation and identifies predictive gene signatures with greater reproducibility and accuracy. In two biological examples on the classification of three human cell types and four subtypes of breast cancer, we combined high-dimensional microarray and RNA-seq data sets and MINT identified highly reproducible and relevant gene signatures predictive of a given phenotype. MINT led to superior classification and prediction accuracy compared to the existing sequential two-step procedures. CONCLUSIONS: MINT is a powerful approach and the first of its kind to solve the integrative classification framework in a single step by combining multiple independent studies. MINT is computationally fast as part of the mixOmics R CRAN package, available at http://www.mixOmics.org/mixMINT/ and http://cran.r-project.org/web/packages/mixOmics/.


Subjects
Multivariate Analysis , Gene Expression Profiling , Humans , Reproducibility of Results , Sample Size