1.
Hum Mol Genet ; 33(1): 38-47, 2023 Dec 12.
Article in English | MEDLINE | ID: mdl-37740403

ABSTRACT

Breast cancer (BC) risk is suspected to be linked to thyroid disorders; however, observational studies exploring the association between BC and thyroid disorders have given conflicting results. We propose an alternative approach: investigating the shared genetic risk factors between BC and several thyroid traits. We report a positive genetic correlation between BC and free thyroxine (FT4) levels (corr = 0.13, p-value = 2.0 × 10⁻⁴) and a negative genetic correlation between BC and thyroid-stimulating hormone (TSH) levels (corr = -0.09, p-value = 0.03). These associations are more striking when restricting the analysis to estrogen receptor-positive BC. Moreover, the polygenic risk scores (PRS) for FT4 and hyperthyroidism are positively associated with BC risk (OR = 1.07, 95% CI: 1.00-1.13, p-value = 2.8 × 10⁻² and OR = 1.04, 95% CI: 1.00-1.08, p-value = 3.8 × 10⁻², respectively), while the PRS for TSH is inversely associated with BC risk (OR = 0.93, 95% CI: 0.89-0.97, p-value = 2.0 × 10⁻³). Using the PLACO method, we detected 49 loci associated with both BC and thyroid traits (p-value < 5 × 10⁻⁸), in the vicinity of 130 genes. Additional colocalization and gene-set enrichment analyses showed a convincing causal role for a known pleiotropic locus at 2q35 and revealed an additional one at 8q22.1 associated with both BC and thyroid cancer. We also found two new pleiotropic loci at 14q32.33 and 17q21.31 that were associated with both TSH levels and BC risk. Enrichment analyses and evidence of regulatory signals also highlighted brain tissues and the immune system as candidates for explaining associations between BC and TSH levels. Overall, our study sheds light on the complex interplay between BC and thyroid traits and provides evidence of shared genetic risk between those conditions.


Subjects
Breast Neoplasms , Thyroid Gland , Humans , Female , Breast Neoplasms/genetics , Thyrotropin/genetics , Thyroxine/genetics , Risk Factors , Genetic Risk Score
2.
Nat Methods ; 18(11): 1304-1316, 2021 11.
Article in English | MEDLINE | ID: mdl-34725484

ABSTRACT

Glycoproteomics is a powerful yet analytically challenging research tool. Software packages aiding the interpretation of complex glycopeptide tandem mass spectra have appeared, but their relative performance remains untested. Conducted through the HUPO Human Glycoproteomics Initiative, this community study, comprising both developers and users of glycoproteomics software, evaluates solutions for system-wide glycopeptide analysis. The same mass spectrometry-based glycoproteomics datasets from human serum were shared with participants and the relative team performance for N- and O-glycopeptide data analysis was comprehensively established by orthogonal performance tests. Although the results were variable, several high-performance glycoproteomics informatics strategies were identified. Deep analysis of the data revealed key performance-associated search parameters and led to recommendations for improved 'high-coverage' and 'high-accuracy' glycoproteomics search solutions. This study concludes that diverse software packages for comprehensive glycopeptide data analysis exist, points to several high-performance search strategies and specifies key variables that will guide future software developments and assist informatics decision-making in glycoproteomics.


Subjects
Glycopeptides/blood , Glycoproteins/blood , Informatics/methods , Proteome/analysis , Proteomics/methods , Research Personnel/statistics & numerical data , Software , Glycosylation , Humans , Proteome/metabolism , Tandem Mass Spectrometry
3.
BMC Med Res Methodol ; 22(1): 9, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34996381

ABSTRACT

BACKGROUND: Genome-wide association studies (GWAS) have identified genetic variants associated with multiple complex diseases. We can leverage this phenomenon, known as pleiotropy, to integrate multiple data sources in a joint analysis. Often, integrating additional information such as gene pathway knowledge can improve statistical efficiency and biological interpretation. In this article, we propose statistical methods which incorporate both gene pathway and pleiotropy knowledge to increase statistical power and identify important risk variants affecting multiple traits. METHODS: We propose novel feature selection methods for group variable selection in the multi-task regression problem. We develop penalised likelihood methods exploiting different penalties to induce structured sparsity at a gene (or pathway) and SNP level across all studies. We implement an alternating direction method of multipliers (ADMM) algorithm for our penalised regression methods. The performance of our approaches is compared to a subset-based meta-analysis approach on simulated data sets. A bootstrap sampling strategy is provided to explore the stability of the penalised methods. RESULTS: Our methods are applied to identify potential pleiotropy in an application considering the joint analysis of thyroid and breast cancers. The methods were able to detect eleven potential pleiotropic SNPs and six pathways. A simulation study found that our method was able to detect more true signals than a popular competing method while retaining a similar false discovery rate. CONCLUSION: We developed feature selection methods for jointly analysing multiple logistic regression tasks where prior grouping knowledge is available. Our method performed well on both simulation studies and when applied to a real data analysis of multiple cancers.


Subjects
Genome-Wide Association Study , Genomics , Algorithms , Genomics/methods , Humans , Phenotype , Polymorphism, Single Nucleotide
4.
BMC Bioinformatics ; 22(1): 86, 2021 Feb 24.
Article in English | MEDLINE | ID: mdl-33627076

ABSTRACT

BACKGROUND: The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help to identify and understand common biological mechanisms underlying some diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level. In this paper, we propose a novel gene- and pathway-level approach for the case where several independent GWAS on independent traits are available. The method is based on a generalization of the sparse group Partial Least Squares (sgPLS) to take into account groups of variables, and a Lasso penalization that links all independent data sets. This method, called joint-sgPLS, is able to convincingly detect signal at the variable level and at the group level. RESULTS: Our method has the advantage of proposing a global, readable model while coping with the architecture of the data. It can outperform traditional methods and provides a wider insight in terms of a priori information. We compared the performance of the proposed method to other benchmark methods on simulated data and gave an example of application on real data with the aim of highlighting susceptibility variants common to breast and thyroid cancers. CONCLUSION: The joint-sgPLS shows interesting properties for detecting a signal. As an extension of the PLS, the method is suited for data with a large number of variables. The choice of Lasso penalization copes with architectures of groups of variables and observation sets. Furthermore, although the method has been applied to a genetic study, its formulation is suited to any data with a high number of variables and an explicit a priori architecture in other application fields.


Subjects
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Least-Squares Analysis , Phenotype
5.
Stat Med ; 40(6): 1498-1518, 2021 03 15.
Article in English | MEDLINE | ID: mdl-33368447

ABSTRACT

An increasing number of genome-wide association studies (GWAS) summary statistics are being made available to the scientific community. Exploiting these results from multiple phenotypes would permit identification of novel pleiotropic associations. In addition, incorporating prior biological information in GWAS such as group structure information (gene or pathway) has shown some success in classical GWAS approaches. However, this has not been widely explored in the context of pleiotropy. We propose a Bayesian meta-analysis approach (termed GCPBayes) that uses summary-level GWAS data across multiple phenotypes to detect pleiotropy both at the group level (gene or pathway) and within groups (eg, at the SNP level). We consider both continuous and Dirac spike and slab priors for group selection. We also use a Bayesian sparse group selection approach with hierarchical spike and slab priors that enables us to select important variables both at the group level and within groups. GCPBayes uses a Bayesian statistical framework based on Markov chain Monte Carlo (MCMC) Gibbs sampling. It can be applied to multiple types of phenotypes for studies with overlapping or nonoverlapping subjects, takes into account heterogeneity in the effect size, and allows for opposite directions of the genetic effects across traits. Simulations show that the proposed methods outperform benchmark approaches such as ASSET and CPBayes in the ability to retrieve pleiotropic associations at both SNP and gene levels. To illustrate the GCPBayes method, we investigate the shared genetic effects between thyroid cancer and breast cancer in candidate pathways.


Subjects
Genome-Wide Association Study , Neoplasms , Bayes Theorem , Genomics , Group Structure , Humans , Models, Genetic , Polymorphism, Single Nucleotide
6.
Crit Care ; 25(1): 199, 2021 06 09.
Article in English | MEDLINE | ID: mdl-34108029

ABSTRACT

BACKGROUND: Heterogeneous respiratory system static compliance (CRS) values and levels of hypoxemia in patients with novel coronavirus disease (COVID-19) requiring mechanical ventilation have been reported in previous small-case series or studies conducted at a national level. METHODS: We designed a retrospective observational cohort study with rapid data gathering from the international COVID-19 Critical Care Consortium study to comprehensively describe CRS (calculated as tidal volume/[airway plateau pressure - positive end-expiratory pressure (PEEP)]) and its association with ventilatory management and outcomes of COVID-19 patients on mechanical ventilation (MV) admitted to intensive care units (ICU) worldwide. RESULTS: We studied 745 patients from 22 countries, who required admission to the ICU and MV from January 14 to December 31, 2020, and presented at least one value of CRS within the first seven days of MV. Median (IQR) age was 62 (52-71), patients were predominantly males (68%) and from Europe/North and South America (88%). CRS, within 48 h from endotracheal intubation, was available in 649 patients and was neither associated with the duration from onset of symptoms to commencement of MV (p = 0.417) nor with PaO2/FiO2 (p = 0.100). Females presented lower CRS than males (95% CI of CRS difference between females and males: -11.8 to -7.4 mL/cmH2O, p < 0.001), and although females presented higher body mass index (BMI), association of BMI with CRS was marginal (p = 0.139). Ventilatory management varied across CRS range, resulting in a significant association between CRS and driving pressure (estimated decrease -0.31 cmH2O/L per mL/cmH2O of CRS, 95% CI -0.48 to -0.14, p < 0.001). Overall, 28-day ICU mortality, accounting for the competing risk of being discharged within the period, was 35.6% (SE 1.7). Cox proportional hazard analysis demonstrated that CRS (+10 mL/cmH2O) was only associated with being discharged from the ICU within 28 days (HR 1.14, 95% CI 1.02-1.28, p = 0.018). CONCLUSIONS: This multicentre report provides a comprehensive account of CRS in COVID-19 patients on MV. CRS measured within 48 h from commencement of MV has marginal predictive value for 28-day mortality, but was associated with being discharged from ICU within the same period. Trial documentation: Available at https://www.covid-critical.com/study . TRIAL REGISTRATION: ACTRN12620000421932.
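
The compliance formula quoted in the abstract above (tidal volume divided by the difference between plateau pressure and PEEP) can be written as a small helper. This is only an illustrative sketch: the function name, argument names and the patient values are hypothetical and are not taken from the study.

```python
def static_compliance(tidal_volume_ml: float, plateau_pressure: float, peep: float) -> float:
    """Respiratory system static compliance (CRS) in mL/cmH2O.

    tidal_volume_ml is in mL; plateau_pressure and peep are in cmH2O.
    """
    driving_pressure = plateau_pressure - peep  # cmH2O
    if driving_pressure <= 0:
        raise ValueError("plateau pressure must exceed PEEP")
    return tidal_volume_ml / driving_pressure


# Hypothetical example values, not data from the study.
crs = static_compliance(tidal_volume_ml=420.0, plateau_pressure=24.0, peep=10.0)
print(f"CRS = {crs:.1f} mL/cmH2O")  # prints "CRS = 30.0 mL/cmH2O"
```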


Subjects
COVID-19/complications , COVID-19/therapy , Lung Compliance/physiology , Respiration, Artificial/methods , Respiratory Distress Syndrome/etiology , Respiratory Distress Syndrome/therapy , Adult , Cohort Studies , Critical Care/methods , Europe , Female , Humans , Intensive Care Units , Male , Middle Aged , Retrospective Studies , Severity of Illness Index
8.
Glob Chang Biol ; 26(5): 2785-2797, 2020 05.
Article in English | MEDLINE | ID: mdl-32115808

ABSTRACT

Anticipating future changes of an ecosystem's dynamics requires knowledge of how its key communities respond to current environmental regimes. The Great Barrier Reef (GBR) is under threat, with rapid changes of its reef-building hard coral (HC) community structure already evident across broad spatial scales. While several underlying relationships between HC and multiple disturbances have been documented, responses of other benthic communities to disturbances are not well understood. Here we used statistical modelling to explore the effects of broad-scale climate-related disturbances on benthic communities to predict their structure under scenarios of increasing disturbance frequency. We parameterized a multivariate model using the composition of benthic communities estimated by 145,000 observations from the northern GBR between 2012 and 2017. During this time, surveyed reefs were variously impacted by two tropical cyclones and two heat stress events that resulted in extensive HC mortality. This unprecedented sequence of disturbances was used to estimate the effects of discrete versus interacting disturbances on the compositional structure of HC, soft corals (SC) and algae. Discrete disturbances increased the prevalence of algae relative to HC while the interaction between cyclones and heat stress was the main driver of the increase in SC relative to algae and HC. Predictions from disturbance scenarios included relative increases in algae versus SC that varied by the frequency and types of disturbance interactions. However, high uncertainty of compositional changes in the presence of several disturbances shows that responses of algae and SC to the decline in HC need further research. Better understanding of the effects of multiple disturbances on benthic communities as a whole is essential for predicting the future status of coral reefs and managing them in the light of new environmental regimes. The approach we develop here opens new opportunities for reaching this goal.


Subjects
Anthozoa , Cyclonic Storms , Animals , Coral Reefs , Ecosystem
9.
Stat Med ; 39(28): 4201-4217, 2020 12 10.
Article in English | MEDLINE | ID: mdl-32844489

ABSTRACT

Identification of biomarkers is an emerging area in oncology. In this article, we develop an efficient statistical procedure for the classification of protein markers according to their effect on cancer progression. A high-dimensional time-course dataset of protein markers for 80 patients motivated the development of the model. The threshold value is formulated as the level of a marker having maximum impact on cancer progression. A classification algorithm for high-dimensional time-course data is developed and validated by comparing random components using both proportional hazard and accelerated failure time frailty models. The study elucidates the application of two separate joint modeling techniques, using an autoregressive-type model and a mixed-effect model for time-course data and a proportional hazard model for survival data, with proper utilization of Bayesian methodology. In addition, a prognostic score is developed on the basis of a few selected genes and applied to patients. This study facilitates the identification of relevant biomarkers from a set of markers.


Subjects
Algorithms , Medical Oncology , Bayes Theorem , Biomarkers , Humans , Proportional Hazards Models
10.
Environ Sci Technol ; 54(21): 13719-13730, 2020 11 03.
Article in English | MEDLINE | ID: mdl-32856893

ABSTRACT

Anomaly detection (AD) in high-volume environmental data requires one to tackle a series of challenges associated with the typical low frequency of anomalous events, the broad range of possible anomaly types, and local nonstationary environmental conditions, suggesting the need for flexible statistical methods that are able to cope with unbalanced high-volume data problems. Here, we aimed to detect anomalies caused by technical errors in water-quality (turbidity and conductivity) data collected by automated in situ sensors deployed in contrasting riverine and estuarine environments. We first applied a range of artificial neural networks that differed in both learning method and hyperparameter values, then calibrated models using a Bayesian multiobjective optimization procedure, and selected and evaluated the "best" model for each water-quality variable, environment, and anomaly type. We found that semi-supervised classification was better able to detect sudden spikes, sudden shifts, and small sudden spikes, whereas supervised classification had higher accuracy for predicting long-term anomalies associated with drifts and periods of otherwise unexplained high variability.


Subjects
Neural Networks, Computer , Water , Bayes Theorem , Water Quality
11.
BMC Med Res Methodol ; 19(1): 79, 2019 04 16.
Article in English | MEDLINE | ID: mdl-30991962

ABSTRACT

BACKGROUND: In medical research, explanatory continuous variables are frequently transformed or converted into categorical variables. If the coding is unknown, many tests can be used to identify the "optimal" transformation. This common process, involving the problems of multiple testing, requires a correction of the significance level. Liquet and Commenges proposed an asymptotic correction of significance level in the context of generalized linear models (GLM) (Liquet and Commenges, Stat Probab Lett 71:33-38, 2005). This procedure has been developed for dichotomous and Box-Cox transformations. Furthermore, Liquet and Riou suggested the use of resampling methods to estimate the significance level for transformations into categorical variables with more than two levels (Liquet and Riou, BMC Med Res Methodol 13:75, 2013). RESULTS: CPMCGLM provides users with both methods of p-value adjustment. Furthermore, they are available for a large set of transformations. This paper aims to give the user an overview of the methodological context and to explain in detail the use of the CPMCGLM R package through its application to a real epidemiological dataset. CONCLUSION: We present here the CPMCGLM R package, which provides efficient methods for the correction of type-I error rate in the context of generalized linear models. This is the first and the only available package in R providing such methods applied to this context. This package is designed to help researchers, who work principally in the field of biostatistics and epidemiology, to analyze their data in the context of optimal cutoff point determination.


Subjects
Algorithms , Biometry/methods , Computational Biology/methods , Linear Models , Cholesterol, HDL/blood , Dementia/blood , Female , Humans , Male , Reproducibility of Results
12.
Int J Cancer ; 143(6): 1335-1347, 2018 09 15.
Article in English | MEDLINE | ID: mdl-29667176

ABSTRACT

Recent prospective studies have shown that dysregulation of the immune system may precede the development of B-cell lymphomas (BCL) in immunocompetent individuals. However, to date, the studies were restricted to a few immune markers, which were considered separately. Using a nested case-control study within two European prospective cohorts, we measured plasma levels of 28 immune markers in samples collected a median of 6 years before diagnosis (range 2.01-15.97) in 268 incident cases of BCL (including multiple myeloma [MM]) and matched controls. Linear mixed models and partial least squares analyses were used to analyze the association between immune marker levels and the incidence of BCL and its main histological subtypes and to investigate potential biomarkers predictive of the time to diagnosis. Linear mixed model analyses identified associations linking lower levels of fibroblast growth factor-2 (FGF-2, p = 7.2 × 10⁻⁴) and transforming growth factor alpha (TGF-α, p = 6.5 × 10⁻⁵) to BCL incidence. Analyses stratified by histological subtypes identified inverse associations for the MM subtype including FGF-2 (p = 7.8 × 10⁻⁷), TGF-α (p = 4.08 × 10⁻⁵), fractalkine (p = 1.12 × 10⁻³), monocyte chemotactic protein-3 (p = 1.36 × 10⁻⁴), macrophage inflammatory protein 1-alpha (p = 4.6 × 10⁻⁴) and vascular endothelial growth factor (p = 4.23 × 10⁻⁵). Our results also provided marginal support for already reported associations between chemokines and diffuse large BCL (DLBCL) and cytokines and chronic lymphocytic leukemia (CLL). Case-only analyses showed that granulocyte-macrophage colony stimulating factor levels were consistently higher closer to diagnosis, which provides further evidence of its role in tumor progression. In conclusion, our study suggests a role of growth factors in the incidence of MM and of chemokine and cytokine regulation in DLBCL and CLL.


Subjects
Biomarkers/blood , Lymphoma, Large B-Cell, Diffuse/blood , Multiple Myeloma/blood , Adult , Aged , Case-Control Studies , Chemokine CCL7/blood , Chemokine CX3CL1/blood , Europe , Female , Fibroblast Growth Factor 2/blood , Follow-Up Studies , Humans , Incidence , Lymphoma, Large B-Cell, Diffuse/diagnosis , Lymphoma, Large B-Cell, Diffuse/epidemiology , Lymphoma, Large B-Cell, Diffuse/immunology , Male , Middle Aged , Multiple Myeloma/diagnosis , Multiple Myeloma/epidemiology , Multiple Myeloma/immunology , Multivariate Analysis , Prognosis , Prospective Studies , Transforming Growth Factor alpha/blood , Vascular Endothelial Growth Factor A/blood
13.
Stat Med ; 37(23): 3338-3356, 2018 10 15.
Article in English | MEDLINE | ID: mdl-29888397

ABSTRACT

Integrative analysis of high dimensional omics datasets has been studied by many authors in recent years. By incorporating prior known relationships among the variables, these analyses have been successful in elucidating the relationships between different sets of omics data. In this article, our goal is to identify important relationships between genomic expression and cytokine data from a human immunodeficiency virus vaccine trial. We proposed a flexible partial least squares technique, which incorporates group and subgroup structure in the modelling process. Our new method accounts for both grouping of genetic markers (eg, gene sets) and temporal effects. The method generalises existing sparse modelling techniques in the partial least squares methodology and establishes theoretical connections to variable selection methods for supervised and unsupervised problems. Simulation studies are performed to investigate the performance of our methods over alternative sparse approaches. Our R package sgspls is available at https://github.com/matt-sutton/sgspls.


Subjects
Least-Squares Analysis , Models, Statistical , AIDS Vaccines/therapeutic use , Algorithms , Biostatistics , Clinical Trials as Topic/statistics & numerical data , Computer Simulation , Genomics/methods , Humans , Likelihood Functions , Multivariate Analysis , Regression Analysis
14.
Bioinformatics ; 32(1): 35-42, 2016 Jan 01.
Article in English | MEDLINE | ID: mdl-26358727

ABSTRACT

MOTIVATION: The association between two blocks of 'omics' data brings challenging issues in computational biology due to their size and complexity. Here, we focus on a class of multivariate statistical methods called partial least squares (PLS). The sparse version of PLS (sPLS) integrates two datasets while simultaneously selecting the contributing variables. However, these methods do not take into account the important structural or group effects due to the relationship between markers among biological pathways. Hence, considering predefined groups of markers (e.g. gene sets) could improve the relevance and the efficacy of the PLS approach. RESULTS: We propose two PLS extensions called group PLS (gPLS) and sparse gPLS (sgPLS). Our algorithm enables the study of the relationship between two different types of omics data (e.g. SNP and gene expression) or between an omics dataset and multivariate phenotypes (e.g. cytokine secretion). We demonstrate the good performance of gPLS and sgPLS compared with the sPLS in the context of grouped data. Then, these methods are compared through an HIV therapeutic vaccine trial. Our approaches provide parsimonious models to reveal the relationship between gene abundance and the immunological response to the vaccine. AVAILABILITY AND IMPLEMENTATION: The approach is implemented in a comprehensive R package called sgPLS available on the CRAN. CONTACT: b.liquet@uq.edu.au SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
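
Group and sparse-group penalties of the kind mentioned in the abstract above are commonly handled with thresholding operators that can zero out whole groups and, within surviving groups, individual variables. The following is a generic sketch of those operators under that assumption. It is not the sgPLS package code, and all function names, the grouping and the tuning values are hypothetical.

```python
import math


def soft_threshold(v, lam):
    """Element-wise soft thresholding: shrink each entry toward zero by lam."""
    return [math.copysign(max(abs(a) - lam, 0.0), a) for a in v]


def group_soft_threshold(v, lam):
    """Shrink the whole block by its Euclidean norm; drop it if the norm is <= lam."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm <= lam:
        return [0.0] * len(v)
    scale = 1.0 - lam / norm
    return [scale * a for a in v]


def sparse_group_threshold(v, groups, lam_within, lam_group):
    """Sparse-group operator: soft-threshold inside each group, then shrink the group."""
    out = [0.0] * len(v)
    for idx in groups:
        block = soft_threshold([v[i] for i in idx], lam_within)
        block = group_soft_threshold(block, lam_group)
        for i, val in zip(idx, block):
            out[i] = val
    return out


# Hypothetical loading vector with two groups of two variables each.
weights = [3.0, -0.5, 0.2, 0.1]
groups = [[0, 1], [2, 3]]
print(sparse_group_threshold(weights, groups, lam_within=0.3, lam_group=0.5))
```

In this toy example the second group is removed entirely, while the first group is kept with both of its entries shrunk toward zero.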


Subjects
Algorithms , Genomics/methods , AIDS Vaccines/immunology , Computer Simulation , Humans , Least-Squares Analysis , Sample Size
15.
Stat Med ; 35(16): 2687-714, 2016 07 20.
Article in English | MEDLINE | ID: mdl-26914402

ABSTRACT

Multiple endpoints are increasingly used in clinical trials. The significance of some of these clinical trials is established if at least r null hypotheses are rejected among m that are simultaneously tested. The usual approach in multiple hypothesis testing is to control the family-wise error rate, which is defined as the probability that at least one type-I error is made. More recently, the q-generalized family-wise error rate has been introduced to control the probability of making at least q false rejections. For procedures controlling this global type-I error rate, we define a type-II r-generalized family-wise error rate, which is directly related to the r-power defined as the probability of rejecting at least r false null hypotheses. We obtain very general power formulas that can be used to compute the sample size for single-step and step-wise procedures. These are implemented in our R package rPowerSampleSize available on the CRAN, making them directly available to end users. Complexities of the formulas are presented to gain insight into computation time issues. Comparison with Monte Carlo strategy is also presented. We compute sample sizes for two clinical trials involving multiple endpoints: one designed to investigate the effectiveness of a drug against acute heart failure and the other for the immunogenicity of a vaccine strategy against pneumococcus. Copyright © 2016 John Wiley & Sons, Ltd.
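
The r-power defined in the abstract above (the probability of rejecting at least r false null hypotheses) can be approximated by simulation. The sketch below is a minimal Monte Carlo illustration and not the rPowerSampleSize implementation. It assumes independent endpoints, one-sided one-sample z-tests with unit variance and a single-step Bonferroni procedure; the function name, parameters and effect sizes are hypothetical.

```python
import random
from statistics import NormalDist


def r_power_monte_carlo(effects, n, r, alpha=0.05, n_sim=20000, seed=1):
    """Estimate P(at least r false null hypotheses are rejected).

    effects: standardized effect size per endpoint (0.0 means the null is true).
    n: sample size; each test statistic is drawn from N(effect * sqrt(n), 1).
    Assumes independent endpoints and a Bonferroni critical value.
    """
    rng = random.Random(seed)
    m = len(effects)
    critical = NormalDist().inv_cdf(1.0 - alpha / m)
    hits = 0
    for _ in range(n_sim):
        rejected_false = 0
        for delta in effects:
            z = rng.gauss(delta * n ** 0.5, 1.0)
            # Only rejections of false nulls (delta != 0) count toward the r-power.
            if delta != 0 and z > critical:
                rejected_false += 1
        if rejected_false >= r:
            hits += 1
    return hits / n_sim


# Hypothetical design: three endpoints, success requires at least two rejections.
for n in (50, 100, 150):
    print(n, r_power_monte_carlo([0.25, 0.30, 0.20], n=n, r=2))
```

Scanning n in this way and keeping the smallest value whose estimate reaches the target power gives a crude simulation-based sample size, the kind of Monte Carlo strategy the abstract contrasts with its analytical power formulas.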


Subjects
Research Design , Sample Size , Humans , Monte Carlo Method , Probability
16.
PLoS Genet ; 9(8): e1003657, 2013.
Article in English | MEDLINE | ID: mdl-23950726

ABSTRACT

Genome-wide association studies (GWAS) yielded significant advances in defining the genetic architecture of complex traits and disease. Still, a major hurdle of GWAS is narrowing down multiple genetic associations to a few causal variants for functional studies. This becomes critical in multi-phenotype GWAS where detection and interpretability of complex SNP(s)-trait(s) associations are complicated by complex Linkage Disequilibrium patterns between SNPs and correlation between traits. Here we propose a computationally efficient algorithm (GUESS) to explore complex genetic-association models and maximize genetic variant detection. We integrated our algorithm with a new Bayesian strategy for multi-phenotype analysis to identify the specific contribution of each SNP to different trait combinations and study genetic regulation of lipid metabolism in the Gutenberg Health Study (GHS). Despite the relatively small size of GHS (n = 3,175), when compared with the largest published meta-GWAS (n > 100,000), GUESS recovered most of the major associations and was better at refining multi-trait associations than alternative methods. Amongst the new findings provided by GUESS, we revealed a strong association of SORT1 with TG-APOB and LIPC with TG-HDL phenotypic groups, which were overlooked in the larger meta-GWAS and not revealed by competing approaches, associations that we replicated in two independent cohorts. Moreover, we demonstrated the increased power of GUESS over alternative multi-phenotype approaches, both Bayesian and non-Bayesian, in a simulation study that mimics real-case scenarios. We showed that our parallel implementation based on Graphics Processing Units outperforms alternative multi-phenotype methods. Beyond multivariate modelling of multi-phenotypes, our Bayesian model employs a flexible hierarchical prior structure for genetic effects that adapts to any correlation structure of the predictors and increases the power to identify associated variants. This provides a powerful tool for the analysis of diverse genomic features, for instance including gene expression and exome sequencing data, where complex dependencies are present in the predictor space.


Subjects
Algorithms , Biological Evolution , Genome-Wide Association Study , Quantitative Trait Loci/genetics , Bayes Theorem , Exome/genetics , Gene Expression , Humans , Linkage Disequilibrium , Phenotype , Polymorphism, Single Nucleotide/genetics
17.
J Stat Softw ; 69(2)2016 Jan 29.
Article in English | MEDLINE | ID: mdl-29568242

ABSTRACT

Technological advances in molecular biology over the past decade have given rise to high dimensional and complex datasets offering the possibility to investigate biological associations between a range of genomic features and complex phenotypes. The analysis of this novel type of data generated unprecedented computational challenges which ultimately led to the definition and implementation of computationally efficient statistical models that were able to scale to genome-wide data, including Bayesian variable selection approaches. While extensive methodological work has been carried out in this area, only a few methods capable of handling hundreds of thousands of predictors were implemented and distributed. Among these we recently proposed GUESS, a computationally optimised algorithm making use of graphics processing unit capabilities, which can accommodate multiple outcomes. In this paper we propose R2GUESS, an R package wrapping the original C++ source code. In addition to providing a user-friendly interface to the original code that automates its parametrisation and data handling, R2GUESS also incorporates many features to explore the data, to extend statistical inferences from the native algorithm (e.g., effect size estimation, significance assessment), and to visualize outputs from the algorithm. We first detail the model and its parametrisation, and describe in detail its optimised implementation. Based on two examples we finally illustrate its statistical performance and flexibility.

18.
J Biopharm Stat ; 24(2): 378-97, 2014.
Article in English | MEDLINE | ID: mdl-24605975

ABSTRACT

The use of two or more primary correlated endpoints is becoming increasingly common. A mandatory approach when analyzing data from such clinical trials is to control the family-wise error rate (FWER). In this context, we provide formulas for computation of sample size and for data analysis. Two approaches are discussed: an individual method based on a union-intersection procedure and a global procedure, based on a multivariate model that can take into account adjustment variables. These methods are illustrated with simulation studies and applications. An R package known as rPowerSampleSize is also available.


Subjects
Clinical Trials as Topic , Computer Simulation , Endpoint Determination/methods , Clinical Trials as Topic/statistics & numerical data , Computer Simulation/statistics & numerical data , Endpoint Determination/statistics & numerical data , Humans , Sample Size
19.
BMC Med Res Methodol ; 13: 75, 2013 Jun 08.
Article in English | MEDLINE | ID: mdl-23758852

ABSTRACT

BACKGROUND: In statistical modeling, finding the most favorable coding for an explanatory quantitative variable involves many tests. This process raises a multiple testing problem and requires a correction of the significance level. METHODS: For each coding, a test of the nullity of the coefficient associated with the new coded variable is computed. The selected coding is the one associated with the largest test statistic (or, equivalently, the smallest p-value). In the context of the Generalized Linear Model, Liquet and Commenges (Stat Probability Lett, 71:33-38, 2005) proposed an asymptotic correction of the significance level. This procedure, based on the score test, has been developed for dichotomous and Box-Cox transformations. In this paper, we suggest the use of resampling methods to estimate the significance level for categorical transformations with more than two levels, which by definition involve more than one parameter in the model. The categorical transformation is a more flexible way to explore the unknown shape of the effect of an explanatory variable on a dependent variable. RESULTS: The simulations we ran in this study showed good performance of the proposed methods. These methods were illustrated using data from a study of the relationship between cholesterol and dementia. CONCLUSION: The algorithms were implemented in R, and the associated CPMCGLM R package is available on CRAN.
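
The idea of correcting the significance level after picking the best of several codings can be sketched with a generic permutation max-statistic procedure. The example below is an illustration under simplifying assumptions, not the CPMCGLM implementation: it uses only dichotomous codings, a continuous outcome, and the absolute correlation as the test statistic rather than a GLM score test. The function names are hypothetical.

```python
import numpy as np


def max_abs_corr(codings, y):
    """Largest absolute Pearson correlation between y and any coding column."""
    yc = y - y.mean()
    xc = codings - codings.mean(axis=0)
    num = xc.T @ yc
    den = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.max(np.abs(num / den))


def permutation_adjusted_pvalue(x, y, cutpoints, n_perm=999, seed=0):
    """p-value for the best coding, adjusted for having tried all of them.

    Each cutpoint c defines a dichotomous coding 1[x > c]. The observed
    maximum statistic is compared with its distribution when y is permuted,
    which breaks any association with x while keeping the codings fixed.
    Cutpoints must lie strictly inside the range of x so no coding is constant.
    """
    rng = np.random.default_rng(seed)
    codings = np.column_stack([(x > c).astype(float) for c in cutpoints])
    observed = max_abs_corr(codings, y)
    null = np.array([
        max_abs_corr(codings, rng.permutation(y)) for _ in range(n_perm)
    ])
    return (1 + np.sum(null >= observed)) / (n_perm + 1)


# Simulated data with a true threshold effect at x = 0.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 * (x > 0) + rng.normal(size=200)
cuts = np.quantile(x, [0.25, 0.5, 0.75])
print(permutation_adjusted_pvalue(x, y, cuts))
```

Taking the maximum inside each permutation is what accounts for the selection step: the null distribution reflects the best of all candidate codings, not a single pre-specified one.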


Subjects
Computer Simulation , Epidemiologic Research Design , Linear Models , Aged , Algorithms , HDL Cholesterol/blood , Statistical Data Interpretation , Dementia/blood , Epidemiologic Factors , Humans , Multivariate Analysis , Reproducibility of Results , Risk Factors , Sample Size
20.
PLoS One ; 18(6): e0287705, 2023.
Article in English | MEDLINE | ID: mdl-37384667

ABSTRACT

Compositional data are a special kind of data, represented as proportions carrying relative information. Although this type of data is widespread, no solution exists to deal with cases where the classes are not well balanced. After describing compositional data imbalance, this paper proposes an adaptation of the original Synthetic Minority Oversampling TEchnique (SMOTE) to deal with it. The new approach, called SMOTE for Compositional Data (SMOTE-CD), generates synthetic examples by computing a linear combination of selected existing data points, using compositional data operations. The performance of SMOTE-CD is tested with three different regressors (Gradient Boosting tree, Neural Networks, Dirichlet regressor) applied to two real datasets and to synthetically generated data, and is evaluated using accuracy, cross-entropy, F1-score, R² score and RMSE. The results show improvements across all metrics, but the impact of oversampling on performance varies depending on the model and the data. In some cases, oversampling may lead to a decrease in performance for the majority class. However, for the real data, the best performance across all models is achieved when oversampling is used. Notably, the F1-score is consistently increased with oversampling. Unlike with the original technique, performance is not improved when combining oversampling of the minority classes with undersampling of the majority class. The Python package smote-cd implements the method and is available online.
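
A linear combination "using compositional data operations" can be read as interpolation in the Aitchison geometry of the simplex, where perturbation plays the role of addition and powering the role of scalar multiplication. The sketch below illustrates that reading only. It is an assumption about the general technique, not the smote-cd package API, and all function names are hypothetical. Every part of a composition must be strictly positive.

```python
import numpy as np


def closure(x):
    """Rescale a vector of positive parts so that it sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def perturb(x, y):
    """Aitchison perturbation: the simplex analogue of vector addition."""
    return closure(np.asarray(x, dtype=float) * np.asarray(y, dtype=float))


def power(x, a):
    """Aitchison powering: the simplex analogue of scalar multiplication."""
    return closure(np.asarray(x, dtype=float) ** a)


def compositional_interpolate(x1, x2, lam):
    """Point at fraction `lam` along the Aitchison line from x1 to x2."""
    return perturb(power(x1, 1.0 - lam), power(x2, lam))


# Synthetic composition between a sample and one of its neighbours.
rng = np.random.default_rng(0)
sample = np.array([0.70, 0.20, 0.10])
neighbour = np.array([0.50, 0.30, 0.20])
synthetic = compositional_interpolate(sample, neighbour, rng.uniform())
print(synthetic, synthetic.sum())
```

Unlike a plain arithmetic mean followed by renormalisation, this interpolation is linear in log-ratio coordinates, so the synthetic point always stays inside the simplex.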


Subjects
Acclimatization , Benchmarking , Entropy , Minority Groups , Computer Neural Networks