ABSTRACT
Genetic interaction is considered one of the main heritable components of complex traits. With the emergence of genome-wide association studies (GWAS), a collection of statistical methods dedicated to identifying interactions at the SNP level has been proposed. More recently, gene-based gene-gene interaction testing has emerged as an attractive alternative, as it confers advantages in both statistical power and biological interpretation. Most gene-based interaction methods rely on multidimensional modeling of the interaction and therefore lack robustness against the huge space of interaction patterns. In this paper, we study global testing approaches to address gene-based gene-gene interaction. Within a logistic regression modeling framework, all SNP-SNP interaction tests are combined to produce a gene-level test for interaction. We propose an omnibus test that takes advantage of (1) the heterogeneity between existing global tests and (2) the complementarity between allele-based and genotype-based coding of SNPs. An extensive simulation study demonstrates that the proposed omnibus test detects with high power the most common genetic interaction models with one causal pair, as well as more complex genetic models involving more than one causal pair. In addition, the proposed approach is shown to be robust and to improve power compared with single global tests in replication studies. Furthermore, applying our procedure to real datasets confirms its ability to replicate various gene-gene interactions.
Subjects
Epistasis, Genetic; Genome-Wide Association Study; Computer Simulation; Genome-Wide Association Study/methods; Genotype; Humans; Models, Genetic; Polymorphism, Single Nucleotide
ABSTRACT
Among the large number of statistical methods that have been proposed to identify gene-gene interactions in case-control genome-wide association studies (GWAS), gene-based methods have recently grown in popularity, as they confer advantages in both statistical power and biological interpretation. Most gene-based methods jointly model the distribution of single nucleotide polymorphism (SNP) sets prior to the statistical test, which limits their power to detect sums of SNP-SNP signals. In this paper, we instead propose a gene-based method that first performs SNP-SNP interaction tests and then aggregates the resulting p-values into a test at the gene level. Our method, called AGGrEGATOr, is based on a minP procedure that tests the significance of the minimum of a set of p-values. We use simulations to assess the capacity of AGGrEGATOr to correctly control the type-I error. The benefits of our approach in terms of statistical power and robustness to SNP set characteristics are evaluated in a wide range of disease models by comparing it with previous methods. We also apply our method to detect gene pairs associated with rheumatoid arthritis (RA) in the GSE39428 dataset. We identify 13 potential gene-gene interactions and replicate one gene pair in the Wellcome Trust Case Control Consortium dataset at the 5% level. We further test 15 gene pairs, previously reported as being statistically associated with RA, Crohn's disease (CD), or coronary artery disease (CAD), for replication in the Wellcome Trust Case Control Consortium dataset. We show that AGGrEGATOr is the only method able to successfully replicate seven gene pairs.
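The idea of testing the minimum of a set of SNP-SNP p-values can be illustrated with a minimal sketch. The abstract does not say how dependence between SNP-pair tests is handled, so the version below assumes independent tests (the Šidák form), which is a simplification and not the AGGrEGATOr procedure itself. The function name `min_p_gene_level` is hypothetical.

```python
def min_p_gene_level(pair_p_values):
    """Gene-level p-value for the smallest SNP-SNP interaction p-value.

    Assumption (not from the abstract): the m SNP-SNP tests are independent,
    so under the null P(min p <= p_min) = 1 - (1 - p_min)^m.
    """
    if not pair_p_values:
        raise ValueError("need at least one SNP-SNP p-value")
    m = len(pair_p_values)
    p_min = min(pair_p_values)
    return 1.0 - (1.0 - p_min) ** m


# Three SNP pairs between two genes; only the smallest p-value drives the test.
gene_p = min_p_gene_level([0.01, 0.5, 0.8])
print(round(gene_p, 6))
```

With correlated SNPs (linkage disequilibrium), the independence assumption makes this correction conservative, which is why a dependence-aware or permutation-based null distribution would normally be preferred.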
Subjects
Epistasis, Genetic/genetics; Genome-Wide Association Study/statistics & numerical data; Models, Statistical; Algorithms; Arthritis, Rheumatoid/genetics; Arthritis, Rheumatoid/pathology; Case-Control Studies; Coronary Artery Disease/genetics; Coronary Artery Disease/pathology; Crohn Disease/genetics; Crohn Disease/pathology; Genome-Wide Association Study/methods; Humans; Polymorphism, Single Nucleotide/genetics
ABSTRACT
Holder pasteurization (62.5 °C, 30 min) ensures the sanitary quality of donor human milk but also denatures beneficial proteins. The aim of the present paper was to understand whether this further affects the kinetics of peptide release during gastrointestinal digestion of human milk. Mature raw (RHM) or pasteurized (PHM) human milk was digested (RHM, n = 2; PHM, n = 3) in an in vitro dynamic system (term stage). Label-free quantitative peptidomics was performed on milk and digesta (ten time points). Ascending hierarchical clustering was conducted on "Pasteurization × Digestion time" interaction coefficients. Preproteolysis occurred in human milk (159 unique peptides; RHM: 91, PHM: 151), mostly on β-casein (88% of the endogenous peptides). The predicted number of cleavages increased with pasteurization, potentially through plasmin activation (plasmin cleavages: RHM, 53; PHM, 76). During digestion, eight clusters summarized 1054 peptides from RHM and PHM, 49% of which originated from β-casein. For seven clusters (57% of peptides), the kinetics of peptide release differed between RHM and PHM. The parent protein was significantly linked to the clustering (p-value = 1.4E-09), with β-casein and lactoferrin associated with clusters in opposite ways. Pasteurization selectively affected the gastric and intestinal kinetics of peptide release in term newborns, which may have further nutritional consequences.
Subjects
Digestion; Milk Proteins/pharmacokinetics; Milk, Human; Pasteurization; Peptides/pharmacokinetics; Chromatography, Liquid; Humans; Infant, Newborn; Proteolysis; Tandem Mass Spectrometry
ABSTRACT
Staphylococcus aureus is a major human and animal pathogen, colonizing diverse ecological niches within its hosts. Whether an isolate will infect a specific host, and what its subsequent clinical fate will be, cannot yet be predicted. In this study, we investigated the S. aureus pangenome using a curated set of 356 strains, spanning a wide range of hosts, origins, clinical presentations, and antibiotic resistance profiles. We used genome-wide association study (GWAS) and random forest (RF) algorithms to discriminate strains based on their origins and clinical sources. Here, we show that the presence of sak and scn can discriminate strains based on their host specificity, while other genes such as mecA are often associated with virulent outcomes. Both GWAS and RF indicated the importance of intergenic regions (IGRs) and coding DNA sequences (CDS), but not sRNAs, in forecasting an outcome. Additional transcriptomic analyses performed on the most prevalent clonal complex 8 (CC8) clonal types, in media mimicking nasal colonization or bacteremia, indicated three RNAs as potential RNA markers to forecast infection, followed by 30 others that could serve as infection severity predictors. Our report shows that genetic association and transcriptomics are complementary approaches that can be combined in a single analytical framework to improve our understanding of bacterial pathogenesis and ultimately identify potential predictive molecular markers. IMPORTANCE: Predicting the outcome of bacterial colonization and infections, based on extensive genomic and transcriptomic data from a given pathogen, would be of substantial help for clinicians in treating and curing patients. In this report, genome-wide association studies and random forest algorithms defined gene combinations that differentiate human from animal strains, colonization from disease, and nonsevere from severe disease, and revealed the importance of IGRs and CDS, but not small RNAs (sRNAs), in anticipating an outcome. In addition, transcriptomic analyses performed on the most prevalent clonal types, in media mimicking either nasal colonization or bacteremia, revealed significant differences and therefore potential RNA markers. Overall, the use of both genomic and transcriptomic data in a single analytical framework can enhance our understanding of bacterial pathogenesis.
Subjects
Bacteremia; Staphylococcal Infections; Animals; Humans; Staphylococcus aureus/genetics; Genome-Wide Association Study; Transcriptome; Staphylococcal Infections/diagnosis; RNA; Bacteremia/microbiology; Machine Learning
ABSTRACT
Small regulatory RNAs (sRNAs) are key players in bacterial regulatory networks. Monitoring their expression inside living colonized or infected organisms is essential for identifying sRNA functions, but few studies have looked at sRNA expression during host infection with bacterial pathogens. The scarcity of in vivo studies monitoring sRNA expression attests to the difficulty of collecting such data. We therefore developed a non-mammalian infection model using larval Galleria mellonella to analyze the roles of Staphylococcus aureus sRNAs during larval infection and to quickly determine possible sRNA involvement in staphylococcal virulence before proceeding to more complicated animal testing. We began by using the model to test infected larvae for immunohistochemical evidence of infection as well as host inflammatory responses over time. To monitor sRNA expression during infection, total RNAs were extracted from the larvae and invading bacteria at different time points. The expression profiles of the tested sRNAs were distinct and fluctuated over time: expression of both sprD and sprC increased during infection and was associated with mortality, while rnaIII expression remained barely detectable over time. A strong correlation was observed between sprD expression and mortality. To confirm these results, we used sRNA-knockout mutants to investigate sRNA involvement in Staphylococcus aureus pathogenesis, finding that the decrease in death rates was delayed when either sprD or sprC was lacking. These results demonstrate the relevance of this G. mellonella model for investigating the role of sRNAs as transcriptional regulators involved in staphylococcal virulence. This insect model provides a fast and easy method for monitoring sRNA (and mRNA) participation in S. aureus pathogenesis, and can also be used for other human bacterial pathogens.
Subjects
RNA, Small Untranslated; Staphylococcal Infections; Animals; Gene Expression Regulation, Bacterial; Humans; Larva; RNA, Bacterial; Staphylococcus aureus/genetics
ABSTRACT
The objective of the study was to investigate the effect of recovery time on walking capacity (WC) throughout repeated maximal walking bouts in symptomatic lower-extremity peripheral artery disease (PAD). The effect of recovery time on WC (maximal walking time) was determined in 21 participants with PAD in three experimental conditions [recovery time from 0.5 to 9.5 min + a self-selected recovery time (SSRT)]: 1) 11 repeated sequences of two treadmill walking bouts (TW-ISO); 2) a single sequence of seven treadmill walking bouts (TW-CONS); 3) a single sequence of seven outdoor walking bouts (OW-CONS). Exercise transcutaneous oxygen pressure changes were continuously recorded as an indirect measure of ischemia. An individual recovery time (IRT) beyond which WC did not substantially increase was determined for each participant with a logarithmic fit. At the group level, mixed models showed a significant effect (P < 0.001) of recovery time on WC restoration. At the participant level, strong logarithmic relationships were found (median significant R² ≥ 0.78). The median SSRT corresponded to a median work-to-rest ratio >1:1 (i.e., a recovery time shorter than the preceding walking time) and was related to unrecovered ischemia and a WC restoration level of <80%. A median work-to-rest ratio of ≤1:2 allowed full recovery from ischemia and full restoration of WC. The IRT ratio was between 1:1 and 1:2 and corresponded to the start of recovery from ischemia. Recovery time affects the restoration level of WC during repeated maximal walking bouts in symptomatic PAD. Meaningful variations in WC restoration were related to specific levels of work-to-rest ratios. NEW & NOTEWORTHY: This study demonstrated that there is a significant and mostly logarithmic effect of recovery time on walking capacity in people with symptomatic PAD. This study revealed that a median work-to-rest ratio >1:1 leads to the resumption of walking with unrecovered ischemia and precludes the restoration of full walking capacity, whereas a work-to-rest ratio ≤1:2 allows walking capacity to be fully restored.
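The participant-level logarithmic relationship between recovery time and walking capacity can be sketched as an ordinary least-squares fit of WC = a + b·ln(recovery time). This is only an illustration under assumptions: the abstract does not give the exact model form, the data below are synthetic, and the function name `fit_log_curve` is hypothetical.

```python
import math


def fit_log_curve(recovery_min, walking_capacity):
    """Least-squares fit of y = a + b * ln(x); returns (a, b, r_squared)."""
    xs = [math.log(t) for t in recovery_min]
    ys = list(walking_capacity)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


# Synthetic example: recovery times (min) within the 0.5 to 9.5 min range
# of the study, with made-up walking times (min).
times = [0.5, 1.0, 2.0, 4.0, 9.5]
capacity = [1.1, 2.2, 3.0, 3.9, 4.6]
a, b, r2 = fit_log_curve(times, capacity)
print(f"a={a:.2f}, b={b:.2f}, R2={r2:.2f}")
```

Because the fitted curve flattens as recovery time grows, a threshold on its slope is one possible way to define a recovery time beyond which capacity no longer increases substantially; the criterion actually used in the study is not stated in the abstract.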
Subjects
Peripheral Arterial Disease; Walking; Exercise Test; Exercise Tolerance; Humans; Intermittent Claudication; Lower Extremity
ABSTRACT
PURPOSE: This study aimed to determine and compare the accuracy of different activity monitors in assessing intermittent outdoor walking in both healthy and clinical populations through the development and validation of processing methodologies. METHODS: In study 1, an automated algorithm was implemented and tested for the detection of short (≤1 min) walking and stopping bouts during prescribed walking protocols performed by healthy subjects in environments with low and high levels of obstruction. The following parameters obtained from activity monitors were tested, with different recording epochs (0.1 s/0.033 s/1 s/3 s/10 s) and wearing locations (scapula/hip/wrist/ankle): GlobalSat DG100 (GS) and Qstarz BT-Q1000XT/-Q1000eX (QS) speed; ActiGraph wGT3X+ (AG) vector magnitude (VM) raw data, VM counts, and steps; and StepWatch3 (SW) steps. Furthermore, linear mixed models were developed to estimate walking speeds and distances from the monitors' parameters. Study 2 validated the performance of the activity monitors and processing methodologies during outdoor walking sessions in a clinical population showing a profile of intermittent walking due to functional limitations. RESULTS: In study 1, GS (1 s, scapula) speed, QS (1 s, scapula/wrist) speed, and AG (0.033 s, hip) VM raw data provided the highest bout detection rates (>96.7%) and the lowest root mean square errors in speed (≤0.4 km·h⁻¹) and distance (<18 m) estimation. Using SW (3 s, ankle) steps, the root mean square error for walking/stopping duration estimation reached 13.6 min using proprietary software and 0.98 min using our algorithm (total recording duration, 282 min). In study 2, using AG (0.033 s, hip) VM raw data, the bout detection rate (95% confidence interval) reached 100% (99%-100%), and the mean (SD) absolute percentage errors in speed and distance estimation were 9% (6.6%) and 12.5% (7.9%), respectively. CONCLUSIONS: GPS receivers and AG demonstrated high performance in assessing intermittent outdoor walking in both healthy and clinical populations.
Subjects
Accelerometry/instrumentation; Fitness Trackers; Geographic Information Systems/instrumentation; Walking/physiology; Aged; Algorithms; Humans; Middle Aged; Peripheral Arterial Disease/physiopathology; Walking Speed/physiology; Young Adult
ABSTRACT
We introduce a new approach to hospital-acquired disease risk assessment from public health databases. In a spirit similar to actuarial risk theory, we define an adjustment coefficient that can quantify the risk associated with a hospital department, allowing comparisons of similar departments. The adjustment coefficient characterizes the tail of the distribution of the total patient length of stay in a department before the first disease event occurs. We show that this coefficient is the solution of a Lundberg-like equation, and we provide a nonparametric estimation procedure for this measure, based on a Cramér-Lundberg approximation for the tail of the distribution. Using simulations, we provide evidence of the robustness of the approximation to various individual risk models. In addition, we illustrate the relevance of this approach by evaluating the risk associated with a standard patient safety indicator in 20 hospitals of southeastern France.
Subjects
Cross Infection/epidemiology; Models, Theoretical; Humans; Risk Assessment
ABSTRACT
Holder pasteurization (62.5 °C, 30 min) of human milk denatures beneficial proteins. The present paper aimed to assess whether this can affect the kinetics of peptide release during digestion at the preterm stage. Raw (RHM) or pasteurized (PHM) human milk was digested in triplicate using an in vitro dynamic system. Mass spectrometry and multivariate statistics were conducted. Pre-proteolysis occurred mostly on β-casein, for which cumulative peptide abundance was significantly greater in PHM over 28% of the hydrolysed sequence. Eight clusters summarized the kinetics of peptide release during digestion, which differed for seven clusters (69% of the 1134 peptides). Clusters associated with the heat-denatured proteins, lactoferrin and bile salt-stimulated lipase, presented different kinetics of release during digestion, unlike those for β-casein. Some bioactive peptides from β-casein presented significantly different abundances between PHM and RHM before digestion (1-18, 185-211) or during intestinal digestion (154-160, 161-166). Further physiological consequences should be investigated.
Subjects
Milk, Human/chemistry; Pasteurization; Bile Acids and Salts/analysis; Caseins/analysis; Cluster Analysis; Digestion; Hot Temperature; Humans; Hydrogen-Ion Concentration; Infant, Premature/growth & development; Lactoferrin/analysis; Milk Proteins/analysis; Peptides/analysis; Proteolysis
ABSTRACT
The Cochran-Armitage trend test (CA) has become a standard procedure for association testing in large-scale genome-wide association studies (GWAS). However, when the disease model is unknown, there is no consensus on which of the CA, allelic, and genotypic tests is the most powerful. In this article, we tackle the question of whether CA is best suited to single-locus scanning in GWAS and propose a power comparison of CA against allelic and genotypic tests. Our approach relies on the evaluation of the Taylor decompositions of non-centrality parameters, thus allowing an analytical comparison of the power functions of the tests. Compared to simulation-based comparison, our approach offers the advantage of simultaneously accounting for the multidimensionality of the set of features involved in power functions. Although power for CA depends on the sample size, the case-to-control ratio, and the minor allele frequency (MAF), our results first show that it is largely influenced by the mode of inheritance and by deviation from Hardy-Weinberg equilibrium (HWE). Furthermore, when compared to other tests, CA is shown to be the most powerful test under a multiplicative disease model or when the single-nucleotide polymorphism largely deviates from HWE. In all other situations, CA lacks power and differences can be substantial, especially for the recessive mode of inheritance. Finally, our results are illustrated by comparing the performance of the statistics in two genome scans.
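For readers unfamiliar with the test under discussion, the following is a minimal sketch of the textbook Cochran-Armitage trend statistic on a 2 × 3 genotype table. It assumes additive scores (0, 1, 2) for the three genotypes and is not taken from the paper. The function name and the counts are hypothetical.

```python
import math


def cochran_armitage(cases, controls, scores=(0, 1, 2)):
    """Return (chi2, p_value) of the trend test, 1 degree of freedom.

    cases/controls: genotype counts (aa, Aa, AA); scores: assumed weights.
    """
    n = [r + s for r, s in zip(cases, controls)]   # genotype column totals
    R, S = sum(cases), sum(controls)               # row totals
    N = R + S
    sum_tr = sum(t * r for t, r in zip(scores, cases))
    sum_tn = sum(t * k for t, k in zip(scores, n))
    sum_t2n = sum(t * t * k for t, k in zip(scores, n))
    numerator = N * (N * sum_tr - R * sum_tn) ** 2
    denominator = R * S * (N * sum_t2n - sum_tn ** 2)
    chi2 = numerator / denominator
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Made-up counts with a strong allele-dose trend among cases.
chi2, p = cochran_armitage(cases=[10, 20, 70], controls=[70, 20, 10])
print(chi2, p)
```

Choosing scores (0, 1, 2) targets an additive effect of the risk allele; other score vectors such as (0, 0, 1) or (0, 1, 1) would instead target recessive or dominant models.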
Subjects
Alleles; Genome-Wide Association Study; Genotype; Algorithms; Biomedical Research/statistics & numerical data; Genetic Predisposition to Disease; Genome-Wide Association Study/statistics & numerical data; Humans; Likelihood Functions; Linear Models
ABSTRACT
This paper studies the possibility of conveying information using tactile stimulation of the fingertips. We designed and evaluated three tactile alphabets that are rendered by stretching the skin of the index fingertip: (1) a Morse-like alphabet, (2) a symbolic alphabet using two successive dashes, and (3) a display of Roman letters based on the Unistrokes alphabet. All three alphabets (26 letters each) were evaluated through a user study in terms of recognition rate, intuitiveness, and learnability. Participants were able to perceive and recognize the letters with very good results (80-97 percent recognition rates). Taken together, our results pave the way to novel kinds of communication using the tactile modality.
Subjects
Fingers/physiology; Nonverbal Communication/physiology; Skin Physiological Phenomena; Touch Perception/physiology; User-Computer Interface; Adult; Female; Humans; Male; Young Adult
ABSTRACT
Humans have invested several genes in DNA repair and replication fidelity. To account for the disparity between the rarity of mutations in normal cells and the large number of mutations present in cancer, one hypothesis is that cancer cells must exhibit a mutator phenotype (genomic instability) during tumor progression, with the initiation of abnormal mutation rates caused by the loss of mismatch repair. In this study, we introduce a stochastic model of mutation in tumor cells with the aim of estimating the amount of genomic instability due to the alteration of DNA repair genes. Our approach takes into account the difficulties generated by sampling within tumoral clones and the fact that these clones may be difficult to isolate. We provide corrections to two classical statistics to obtain unbiased estimators of the raised mutation rate, and we show that large statistical errors may be associated with such estimators. The power of these new statistics to reject genomic instability is assessed and shown to increase with the intensity of mutation rates. In addition, we show that genomic instability cannot be detected unless the raised mutation rates exceed the normal rates by a factor of at least 1000.
Subjects
Genomic Instability; Mutation; Computational Biology; DNA Repair/genetics; Humans; Models, Genetic; Nucleotides/genetics; Polymorphism, Genetic
ABSTRACT
BACKGROUND: The Differential Adhesion Hypothesis (DAH) is a theory of the organization of cells within a tissue that has been validated by several biological experiments and tested against several alternative computational models. RESULTS: In this study, a statistical approach was developed for estimating the strength of adhesion, incorporating earlier discrete lattice models into a continuous marked point process framework. This framework makes it possible to describe an ergodic Markov Chain Monte Carlo algorithm that can simulate the model and reproduce empirical biological patterns. The estimation procedure, based on a pseudo-likelihood approximation, is validated with simulations, and a brief application to medulloblastoma stained by beta-catenin markers is given. CONCLUSION: Our model includes the strength of cell-cell adhesion as a statistical parameter. The estimation procedure for this parameter is consistent with experimental data and would be useful for high-throughput cancer studies.
Subjects
Cell Adhesion; Models, Statistical; Humans; Medulloblastoma/pathology; Models, Biological
ABSTRACT
Asymptotic tests are commonly used for comparing two binomial proportions when the sample size is sufficiently large. However, there is no consensus on the most powerful test. In this paper, we clarify this issue by comparing the power functions of three popular asymptotic tests: Pearson's χ² test, the likelihood-ratio test, and the odds-ratio-based test. Considering Taylor decompositions under local alternatives, the comparisons lead to recommendations on which test to use in view of both the experimental design and the nature of the investigated signal. We show that when the design is balanced between the two binomials, the three tests are equivalent in terms of power. However, when the design is unbalanced, differences in power can be substantial, and the choice of the most powerful test also depends on the values of the parameters of the two compared binomials. We further investigate situations where the two binomials are not compared directly but through tag binomials. In these cases of indirect association, we show that the differences in power between the three tests are enhanced with decreasing values of the parameters of the tag binomials. Our results are illustrated in the context of genetic epidemiology, where the analysis of genome-wide association studies provides insights regarding the low power for detecting rare variants.
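To make the three statistics concrete, here is a minimal sketch that computes each of them on a single 2 × 2 table using their standard textbook forms; each is referred to a chi-square distribution with 1 degree of freedom. The odds-ratio test is taken here to be the Wald test on the log odds ratio, which is an assumption about what the abstract means. The function name and the counts are hypothetical.

```python
import math


def two_binomial_tests(x1, n1, x2, n2):
    """Return (pearson, likelihood_ratio, odds_ratio_wald) statistics.

    x1/n1 and x2/n2 are successes/trials in the two groups; all four
    cells of the 2x2 table must be non-zero.
    """
    a, b, c, d = x1, n1 - x1, x2, n2 - x2
    N = n1 + n2
    # Pearson chi-square for a 2x2 table.
    pearson = N * (a * d - b * c) ** 2 / (n1 * n2 * (a + c) * (b + d))
    # Likelihood ratio: 2 * sum(observed * ln(observed / expected)).
    observed = [a, b, c, d]
    expected = [n1 * (a + c) / N, n1 * (b + d) / N,
                n2 * (a + c) / N, n2 * (b + d) / N]
    lrt = 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected))
    # Wald test on the log odds ratio with its usual variance estimate.
    log_or = math.log((a * d) / (b * c))
    wald = log_or ** 2 / (1 / a + 1 / b + 1 / c + 1 / d)
    return pearson, lrt, wald


# Made-up balanced design: 30/100 successes versus 50/100.
print(two_binomial_tests(30, 100, 50, 100))
```

On a single finite sample the three values differ slightly even in a balanced design; the equivalence stated in the abstract concerns their power functions, not the realized statistics.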
Subjects
Biostatistics/methods; Models, Statistical; Case-Control Studies; Computer Simulation; Gene Frequency; Genome-Wide Association Study/statistics & numerical data; Humans; Likelihood Functions; Logistic Models; Molecular Epidemiology/statistics & numerical data; Odds Ratio; Polymorphism, Single Nucleotide; Sample Size
ABSTRACT
The aggregation of proteins or peptides in amyloid fibrils is associated with a number of clinical disorders, including Alzheimer's, Huntington's, and prion diseases, medullary thyroid cancer, and renal and cardiac amyloidosis. Despite extensive studies, the molecular mechanisms underlying the initiation of fibril formation remain largely unknown. Several lines of evidence have revealed that short amino-acid segments (hot spots) located in amyloid precursor proteins act as seeds for fibril elongation. Therefore, hot spots are potential targets for diagnostic/therapeutic applications, and a current challenge in bioinformatics is the development of methods to accurately predict hot spots from protein sequences. In this paper, we combined existing methods into a meta-predictor for hot-spot prediction, called MetAmyl for METapredictor for AMYLoid proteins. MetAmyl is based on a logistic regression model that weights predictions from a set of popular algorithms, statistically selected as being the most informative and complementary predictors. We evaluated the performance of MetAmyl through a large-scale comparative study based on three independent datasets and thus demonstrated its ability to differentiate between amyloidogenic and non-amyloidogenic polypeptides. Compared to 9 other methods, MetAmyl provides a significant improvement in prediction on the studied datasets. We further show that MetAmyl is effective at highlighting the effect of point mutations involved in human amyloidosis, so we suggest this program should be a useful complementary tool for the diagnosis of these diseases.
Subjects
Amyloidogenic Proteins/metabolism; Algorithms; Amino Acids/genetics; Amino Acids/metabolism; Amyloidogenic Proteins/genetics; Amyloidosis/diagnosis; Amyloidosis/genetics; Humans; Models, Molecular; Peptides/genetics; Peptides/metabolism; Point Mutation/genetics
ABSTRACT
Genome-wide association studies have identified a large number of single-nucleotide polymorphisms (SNPs) that individually predispose to diseases. However, many genetic risk factors remain unaccounted for. Proteins coded by genes interact in the cell, and it is most likely that certain variants mainly affect the phenotype in combination with other variants, termed epistasis. An exhaustive search for epistatic effects is computationally demanding, as several billions of SNP pairs exist for typical genotyping chips. In this study, the experimental knowledge on biological networks is used to narrow the search for two-locus epistasis. We provide evidence that this approach is computationally feasible and statistically powerful. By applying this method to the Wellcome Trust Case-Control Consortium data sets, we report four significant cases of epistasis between unlinked loci, in susceptibility to Crohn's disease, bipolar disorder, hypertension and rheumatoid arthritis.