2.
Am J Hum Genet ; 108(5): 786-798, 2021 05 06.
Article in English | MEDLINE | ID: mdl-33811805

ABSTRACT

Non-additive genetic variance for complex traits is traditionally estimated from data on relatives. It is notoriously difficult to estimate without bias in non-laboratory species, including humans, because of possible confounding with environmental covariance among relatives. In principle, non-additive variance attributable to common DNA variants can be estimated from a random sample of unrelated individuals with genome-wide SNP data. Here, we jointly estimate the proportion of variance explained by additive ($h^2_{\mathrm{SNP}}$), dominance ($\delta^2_{\mathrm{SNP}}$) and additive-by-additive ($\eta^2_{\mathrm{SNP}}$) genetic variance in a single analysis model. We first show by simulations that our model leads to unbiased estimates and provide a new theory to predict standard errors estimated using either least-squares or maximum likelihood. We then apply the model to 70 complex traits using 254,679 unrelated individuals from the UK Biobank and 1.1 M genotyped and imputed SNPs. We found strong evidence for additive variance (average across traits $\bar{h}^2_{\mathrm{SNP}} = 0.208$). In contrast, the average estimate of $\bar{\delta}^2_{\mathrm{SNP}}$ across traits was 0.001, implying negligible dominance variance at causal variants tagged by common SNPs. The average epistatic variance $\bar{\eta}^2_{\mathrm{SNP}}$ across the traits was 0.055, not significantly different from zero because of the large sampling variance. Our results provide new evidence that genetic variance for complex traits is predominantly additive and that sample sizes of many millions of unrelated individuals are needed to estimate epistatic variance with sufficient precision.
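
The abstract does not say how genotypes are coded for the additive and dominance components. As a rough illustration only, the sketch below assumes a commonly used orthogonal coding in which, under Hardy-Weinberg equilibrium, the standardized additive and dominance covariates each have mean 0 and variance 1 and are uncorrelated with each other. The function names `encode_genotype` and `hwe_moments` are hypothetical and are not taken from the paper.

```python
import math


def encode_genotype(x, p):
    """Standardized additive and dominance codes for one SNP genotype.

    x is the count (0, 1 or 2) of the reference allele and p its frequency.
    The coding is an assumption for illustration, not the paper's stated model.
    """
    q = 1.0 - p
    additive = (x - 2.0 * p) / math.sqrt(2.0 * p * q)
    # Raw dominance covariate 0, 2p, 4p - 2; its HWE mean is 2p^2, its SD 2pq.
    raw_dominance = {0: 0.0, 1: 2.0 * p, 2: 4.0 * p - 2.0}[x]
    dominance = (raw_dominance - 2.0 * p * p) / (2.0 * p * q)
    return additive, dominance


def hwe_moments(p):
    """Means, variances and covariance of the two codes under HWE."""
    q = 1.0 - p
    probs = {0: q * q, 1: 2.0 * p * q, 2: p * p}
    codes = {x: encode_genotype(x, p) for x in probs}
    mean_a = sum(probs[x] * codes[x][0] for x in probs)
    mean_d = sum(probs[x] * codes[x][1] for x in probs)
    return {
        "mean_a": mean_a,
        "mean_d": mean_d,
        "var_a": sum(probs[x] * codes[x][0] ** 2 for x in probs) - mean_a ** 2,
        "var_d": sum(probs[x] * codes[x][1] ** 2 for x in probs) - mean_d ** 2,
        "cov_ad": sum(probs[x] * codes[x][0] * codes[x][1] for x in probs)
        - mean_a * mean_d,
    }
```

Because the two codes are uncorrelated under this assumption, the additive and dominance shares of variance can be fitted jointly without one absorbing the other, which is the property a joint model of this kind relies on.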


Subjects
Datasets as Topic , Multifactorial Inheritance/genetics , Single Nucleotide Polymorphism/genetics , Biological Specimen Banks , Genetic Epistasis , Female , Genotype , Humans , Male , Genetic Models , Phenotype , Reproducibility of Results , United Kingdom
3.
Bioinformatics ; 35(17): 3055-3062, 2019 09 01.
Article in English | MEDLINE | ID: mdl-30657866

ABSTRACT

MOTIVATION: In the continuously expanding omics era, novel computational and statistical strategies are needed for data integration and identification of biomarkers and molecular signatures. We present Data Integration Analysis for Biomarker discovery using Latent cOmponents (DIABLO), a multi-omics integrative method that seeks for common information across different data types through the selection of a subset of molecular features, while discriminating between multiple phenotypic groups. RESULTS: Using simulations and benchmark multi-omics studies, we show that DIABLO identifies features with superior biological relevance compared with existing unsupervised integrative methods, while achieving predictive performance comparable to state-of-the-art supervised approaches. DIABLO is versatile, allowing for modular-based analyses and cross-over study designs. In two case studies, DIABLO identified both known and novel multi-omics biomarkers consisting of mRNAs, miRNAs, CpGs, proteins and metabolites. AVAILABILITY AND IMPLEMENTATION: DIABLO is implemented in the mixOmics R Bioconductor package with functions for parameters' choice and visualization to assist in the interpretation of the integrative analyses, along with tutorials on http://mixomics.org and in our Bioconductor vignette. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software , Biomarkers , Cross-Over Studies , Genomics , MicroRNAs
4.
PLoS Comput Biol ; 13(11): e1005752, 2017 Nov.
Article in English | MEDLINE | ID: mdl-29099853

ABSTRACT

The advent of high throughput technologies has led to a wealth of publicly available 'omics data coming from different sources, such as transcriptomics, proteomics, metabolomics. Combining such large-scale biological data sets can lead to the discovery of important biological insights, provided that relevant information can be extracted in a holistic manner. Current statistical approaches have been focusing on identifying small subsets of molecules (a 'molecular signature') to explain or predict biological conditions, but mainly for a single type of 'omics. In addition, commonly used methods are univariate and consider each biological feature independently. We introduce mixOmics, an R package dedicated to the multivariate analysis of biological data sets with a specific focus on data exploration, dimension reduction and visualisation. By adopting a systems biology approach, the toolkit provides a wide range of methods that statistically integrate several data sets at once to probe relationships between heterogeneous 'omics data sets. Our recent methods extend Projection to Latent Structure (PLS) models for discriminant analysis, for data integration across multiple 'omics data or across independent studies, and for the identification of molecular signatures. We illustrate our latest mixOmics integrative frameworks for the multivariate analyses of 'omics data available from the package.


Subjects
Computational Biology/methods , Genomics , Metabolomics , Software , Statistical Data Interpretation , Humans , Neoplasms/genetics , Neoplasms/metabolism , Systems Biology
5.
Hepatology ; 66(5): 1502-1518, 2017 11.
Article in English | MEDLINE | ID: mdl-28498607

ABSTRACT

Hepatocellular carcinomas (HCCs) exhibit a diversity of molecular phenotypes, raising major challenges in clinical management. HCCs detected by surveillance programs at an early stage are candidates for potentially curative therapies (local ablation, resection, or transplantation). In the long term, transplantation provides the lowest recurrence rates. Treatment allocation is based on tumor number, size, vascular invasion, performance status, functional liver reserve, and the prediction of early (<2 years) recurrence, which reflects the intrinsic aggressiveness of the tumor. Well-differentiated, potentially low-aggressiveness tumors form the heterogeneous molecular class of nonproliferative HCCs, characterized by an approximate 50% β-catenin mutation rate. To define the clinical, pathological, and molecular features and the outcome of nonproliferative HCCs, we constructed a 1,133-HCC transcriptomic metadata set and validated findings in a publicly available 210-HCC RNA sequencing set. We show that nonproliferative HCCs preserve the zonation program that distributes metabolic functions along the portocentral axis in normal liver. More precisely, we identified two well-differentiated, nonproliferation subclasses, namely periportal-type (wild-type β-catenin) and perivenous-type (mutant β-catenin), which expressed negatively correlated gene networks. The new periportal-type subclass represented 29% of all HCCs; expressed a hepatocyte nuclear factor 4A-driven gene network, which was down-regulated in mouse hepatocyte nuclear factor 4A knockout mice; were early-stage tumors by Barcelona Clinic Liver Cancer, Cancer of the Liver Italian Program, and tumor-node-metastasis staging systems; had no macrovascular invasion; and showed the lowest metastasis-specific gene expression levels and TP53 mutation rates.
Also, we identified an eight-gene periportal-type HCC signature, which was independently associated with the highest 2-year recurrence-free survival by multivariate analyses in two independent cohorts of 247 and 210 patients. CONCLUSION: Well-differentiated HCCs display mutually exclusive periportal or perivenous zonation programs. Among all HCCs, periportal-type tumors have the lowest intrinsic potential for early recurrence after curative resection. (Hepatology 2017;66:1502-1518).


Subjects
Hepatocellular Carcinoma/pathology , Liver Neoplasms/pathology , Liver/pathology , Local Neoplasm Recurrence/pathology , beta Catenin/genetics , Hepatocellular Carcinoma/genetics , Hepatocellular Carcinoma/mortality , Hepatocellular Carcinoma/surgery , France/epidemiology , Hepatocyte Nuclear Factor 4/metabolism , Humans , Liver Neoplasms/genetics , Liver Neoplasms/mortality , Liver Neoplasms/surgery , Mutation , Local Neoplasm Recurrence/genetics , Phenotype , Transcriptome
6.
BMC Bioinformatics ; 18(1): 128, 2017 Feb 27.
Article in English | MEDLINE | ID: mdl-28241739

ABSTRACT

BACKGROUND: Molecular signatures identified from high-throughput transcriptomic studies often have poor reliability and fail to reproduce across studies. One solution is to combine independent studies into a single integrative analysis, additionally increasing sample size. However, the different protocols and technological platforms across transcriptomic studies produce unwanted systematic variation that strongly confounds the integrative analysis results. When studies aim to discriminate an outcome of interest, the common approach is a sequential two-step procedure; unwanted systematic variation removal techniques are applied prior to classification methods. RESULTS: To limit the risk of overfitting and over-optimistic results of a two-step procedure, we developed a novel multivariate integration method, MINT, that simultaneously accounts for unwanted systematic variation and identifies predictive gene signatures with greater reproducibility and accuracy. In two biological examples on the classification of three human cell types and four subtypes of breast cancer, we combined high-dimensional microarray and RNA-seq data sets and MINT identified highly reproducible and relevant gene signatures predictive of a given phenotype. MINT led to superior classification and prediction accuracy compared to the existing sequential two-step procedures. CONCLUSIONS: MINT is a powerful approach and the first of its kind to solve the integrative classification framework in a single step by combining multiple independent studies. MINT is computationally fast as part of the mixOmics R CRAN package, available at http://www.mixOmics.org/mixMINT/ and http://cran.r-project.org/web/packages/mixOmics/ .


Subjects
Multivariate Analysis , Gene Expression Profiling , Humans , Reproducibility of Results , Sample Size
7.
Sci Rep ; 6: 38522, 2016 12 20.
Article in English | MEDLINE | ID: mdl-27994231

ABSTRACT

Effective disease surveillance is critical to the functioning of health systems. Traditional approaches are, however, limited in their ability to deliver timely information. Internet-based surveillance systems are a promising approach that may circumvent many of the limitations of traditional health surveillance systems and provide more intelligence on cases of infection, including cases from those that do not use the healthcare system. Infectious disease surveillance systems built on Internet search metrics have been shown to produce accurate estimates of disease weeks before traditional systems and are an economically attractive approach to surveillance; they are, however, also prone to error under certain circumstances. This study sought to explore previously unmodeled diseases by investigating the link between Google Trends search metrics and Australian weekly notification data. We propose using four alternative disease modelling strategies based on linear models that studied the length of the training period used for model construction, determined the most appropriate lag for search metrics, used wavelet transformation for denoising data and enabled the identification of key search queries for each disease. Out of the twenty-four diseases assessed with Australian data, our nowcasting results highlighted promise for two diseases of international concern, Ross River virus and pneumococcal disease.
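
One of the steps named in the abstract is choosing the most appropriate lag for the search metrics. The sketch below shows one simple way such a lag could be chosen: pick the shift that maximizes the Pearson correlation between a weekly search series and a weekly notification series. This criterion, and the names `pearson` and `best_lag`, are illustrative assumptions; the abstract does not state how the authors selected the lag.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series with non-zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def best_lag(search, notifications, max_lag):
    """Return (lag, scores): the lag in weeks at which search leads best.

    For a lag k, search[t] is paired with notifications[t + k].
    Assumes both series have the same length and cover the same weeks.
    """
    scores = {}
    for lag in range(max_lag + 1):
        leading = search[: len(search) - lag]
        following = notifications[lag:]
        scores[lag] = pearson(leading, following)
    return max(scores, key=scores.get), scores
```

A usage example with made-up numbers: if `notifications` is simply `search` delayed by two weeks, `best_lag(search, notifications, 3)` selects a lag of 2. In practice the lag would be chosen on a training window only, so that the nowcast is evaluated on weeks the selection never saw.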


Subjects
Communicable Diseases/pathology , Internet , Biological Models , Population Surveillance , Australia , Humans , Linear Models , Nonparametric Statistics
8.
Sci Rep ; 6: 34000, 2016 Sep 26.
Article in English | MEDLINE | ID: mdl-27666291

ABSTRACT

DNA repair genes and pathways that are transcriptionally dysregulated in cancer provide the first line of evidence for the altered DNA repair status in tumours, and hence have been explored intensively as a source for biomarker discovery. The molecular mechanisms underlying DNA repair dysregulation, however, have not been systematically investigated in any cancer type. In this study, we performed a statistical analysis to dissect the roles of DNA copy number alteration (CNA), DNA methylation (DM) at gene promoter regions and the expression changes of transcription factors (TFs) in the differential expression of individual DNA repair genes in normal versus tumour breast samples. These gene-level results were summarised at pathway level to assess whether different DNA repair pathways are affected in distinct manners. Our results suggest that CNA and expression changes of TFs are major causes of DNA repair dysregulation in breast cancer, and that a subset of the identified TFs may exert global impacts on the dysregulation of multiple repair pathways. Our work hence provides novel insights into DNA repair dysregulation in breast cancer. These insights improve our understanding of the molecular basis of the DNA repair biomarkers identified thus far, and have potential to inform future biomarker discovery.

9.
PeerJ ; 4: e1845, 2016.
Article in English | MEDLINE | ID: mdl-27042394

ABSTRACT

Mesenchymal stromal cells (MSC) are widely used for the study of mesenchymal tissue repair, and increasingly adopted for cell therapy, despite the lack of consensus on the identity of these cells. In part this is due to the lack of specificity of MSC markers. Distinguishing MSC from other stromal cells such as fibroblasts is particularly difficult using standard analysis of surface proteins, and there is an urgent need for improved classification approaches. Transcriptome profiling is commonly used to describe and compare different cell types; however, efforts to identify specific markers of rare cellular subsets may be confounded by the small sample sizes of most studies. Consequently, it is difficult to derive reproducible, and therefore useful markers. We addressed the question of MSC classification with a large integrative analysis of many public MSC datasets. We derived a sparse classifier (The Rohart MSC test) that accurately distinguished MSC from non-MSC samples with >97% accuracy on an internal training set of 635 samples from 41 studies derived on 10 different microarray platforms. The classifier was validated on an external test set of 1,291 samples from 65 studies derived on 15 different platforms, with >95% accuracy. The genes that contribute to the MSC classifier formed a protein-interaction network that included known MSC markers. Further evidence of the relevance of this new MSC panel came from the high number of Mendelian disorders associated with mutations in more than 65% of the network. These result in mesenchymal defects, particularly impacting on skeletal growth and function. The Rohart MSC test is a simple in silico test that accurately discriminates MSC from fibroblasts, other adult stem/progenitor cell types or differentiated stromal cells. It has been implemented in the www.stemformatics.org resource, to assist researchers wishing to benchmark their own MSC datasets or data from the public domain. 
The code is available from the CRAN repository and all data used to generate the MSC test is available to download via the Gene Expression Omnibus or the Stemformatics resource.

10.
BMC Genomics ; 16: 1055, 2015 Dec 12.
Article in English | MEDLINE | ID: mdl-26651482

ABSTRACT

BACKGROUND: Among transcriptomic studies, those comparing species or populations can increase our understanding of the impact of the evolutionary forces on the differentiation of populations. A particular situation is the one of short evolution time with breeds of a domesticated species that underwent strong selective pressures. In this study, the gene expression diversity across five pig breeds has been explored in muscle. Samples came from: 24 Duroc, 33 Landrace, 41 Large White dam line, 10 Large White sire line and 39 Piétrain. From these animals, 147 muscle samples obtained at slaughter were analyzed using the porcine Agilent 44 K v1 microarray. RESULTS: A total of 12,358 genes were identified as expressed in muscle after normalization and 1,703 genes were declared differential for at least one breed (FDR < 0.001). The functional analysis highlighted that gene expression diversity is mainly linked to cellular signaling pathways such as the PI3K (phosphoinositide 3-kinase) pathway. The PI3K pathway is known to be involved in the control of development of the skeletal muscle mass by affecting extracellular matrix - receptor interactions, regulation of actin cytoskeleton pathways and some metabolic functions. This study also highlighted 228 spots (171 unique genes) that differentiate the breeds from each other. A common subgroup of 15 genes selected by three statistical methods was able to differentiate Duroc, Large White and Piétrain breeds. CONCLUSIONS: This study on transcriptomic differentiation across Western pig breeds highlighted a global picture: mainly signaling pathways were affected. This result is consistent with the selection objective of increasing muscle mass. These transcriptional changes may indicate selection pressure or simply breed differences which may be driven by human selection. 
Further work aiming at comparing genetic and transcriptomic diversities would further increase our understanding of the consequences of human impact on livestock species.
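
The abstract reports genes declared differential at FDR < 0.001 but does not name the correction procedure. As an illustration, the sketch below assumes the widely used Benjamini-Hochberg adjustment; the function name `benjamini_hochberg` is hypothetical and the choice of procedure is an assumption, not a detail from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumed procedure for illustration; the study may have used another
    FDR method.
    """
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Under this assumption, a gene would be declared differential when its adjusted value falls below the chosen threshold, for example `q < 0.001`.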


Subjects
Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Signal Transduction , Sus scrofa/genetics , Animals , Breeding , Female , Gene Expression Profiling/veterinary , Gene Expression Regulation , Male , Skeletal Muscle/metabolism , Oligonucleotide Array Sequence Analysis/veterinary , Sus scrofa/classification , Sus scrofa/metabolism , Swine
11.
Genomics ; 103(4): 239-51, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24667244

ABSTRACT

Gene expression databases contain invaluable information about a range of cell states, but the question "Where is my gene of interest expressed?" remains one of the most difficult to systematically assess when relevant data is derived on different platforms. Barriers to integrating this data include disparities in data formats and scale, a lack of common identifiers, and the disproportionate contribution of a platform to the 'batch effect'. There are few purpose-built cross-platform normalization strategies, and most of these fit data to an idealized data structure, which in turn may compromise gene expression comparisons between different platforms. YuGene addresses this gap by providing a simple transform that assigns a modified cumulative proportion value to each measurement, without losing essential underlying information on data distributions or experimental correlates. The YuGene transform is applied to individual samples and is suitable to apply to data with different distributions. YuGene is robust to combining datasets of different sizes, does not require global renormalization as new data is added, and does not require a common identifier. YuGene was benchmarked against commonly used normalization approaches, performing favorably in comparison to quantile (RMA), Z-score or rank methods. Implementation in the www.stemformatics.org resource provides users with expression queries across stem cell related datasets. Probe performance statistics including poorly performing (never expressed) probes, and examples of probes/genes expressed in a sample-restricted manner are provided. The YuGene software is implemented as an R package available from CRAN.
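
The abstract describes the transform only as assigning "a modified cumulative proportion value to each measurement" within a single sample. The sketch below is one plausible reading of that sentence, not the published definition: within a sample, values are sorted from highest to lowest, and each measurement receives one minus the cumulative share of the sample total reached at its position. The function name `cumulative_proportion_transform` is hypothetical, and the sketch assumes non-negative expression values.

```python
def cumulative_proportion_transform(values):
    """Per-sample cumulative-proportion transform (illustrative assumption).

    values holds non-negative expression measurements for one sample.
    Returns scores in [0, 1) in the input order; higher expression gives
    a higher score. Tied values are ordered by position, so ties are not
    handled specially.
    """
    total = sum(values)
    if total <= 0:
        raise ValueError("sample total must be positive")
    # Indices from the most to the least expressed measurement.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    scores = [0.0] * len(values)
    running = 0.0
    for i in order:
        running += values[i]
        scores[i] = 1.0 - running / total
    return scores
```

Because the computation uses only the values of one sample, adding new samples or datasets leaves earlier scores unchanged, which is consistent with the abstract's statement that no global renormalization is required.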


Subjects
Genetic Databases , Gene Expression Profiling/methods , Software , Computational Biology/methods , Humans , Internet , Oligonucleotide Array Sequence Analysis , Stem Cells