Search | Secretaria de Estado da Saúde

1.

Finite mixtures of matrix variate Poisson-log normal distributions for three-way count data.

Silva, Anjali; Qin, Xiaoke; Rothstein, Steven J; McNicholas, Paul D; Subedi, Sanjeena.

Bioinformatics ; 39(5), 2023 May 04.

Article in English | MEDLINE | ID: mdl-37018147

ABSTRACT

MOTIVATION: Three-way data structures, characterized by three entities, the units, the variables and the occasions, are frequent in biological studies. In RNA sequencing, three-way data structures are obtained when high-throughput transcriptome sequencing data are collected for n genes across p conditions at r occasions. Matrix variate distributions offer a natural way to model three-way data and mixtures of matrix variate distributions can be used to cluster three-way data. Clustering of gene expression data is carried out as a means of discovering gene co-expression networks. RESULTS: In this work, a mixture of matrix variate Poisson-log normal distributions is proposed for clustering read counts from RNA sequencing. By considering the matrix variate structure, full information on the conditions and occasions of the RNA sequencing dataset is simultaneously considered, and the number of covariance parameters to be estimated is reduced. We propose three different frameworks for parameter estimation: a Markov chain Monte Carlo-based approach, a variational Gaussian approximation-based approach, and a hybrid approach. Various information criteria are used for model selection. The models are applied to both real and simulated data, and we demonstrate that the proposed approaches can recover the underlying cluster structure in both cases. In simulation studies where the true model parameters are known, our proposed approach shows good parameter recovery. AVAILABILITY AND IMPLEMENTATION: The GitHub R package for this work is available at https://github.com/anjalisilva/mixMVPLN and is released under the open source MIT license.

Subjects

Transcriptome , Normal Distribution , Computer Simulation , Statistical Distributions , RNA Sequence Analysis

2.

A multivariate Poisson-log normal mixture model for clustering transcriptome sequencing data.

Silva, Anjali; Rothstein, Steven J; McNicholas, Paul D; Subedi, Sanjeena.

BMC Bioinformatics ; 20(1): 394, 2019 Jul 16.

Article in English | MEDLINE | ID: mdl-31311497

ABSTRACT

BACKGROUND: High-dimensional data of discrete and skewed nature is commonly encountered in high-throughput sequencing studies. Analyzing the network itself or the interplay between genes in this type of data continues to present many challenges. As data visualization techniques become cumbersome for higher dimensions and unconvincing when there is no clear separation between homogeneous subgroups within the data, cluster analysis provides an intuitive alternative. The aim of applying mixture model-based clustering in this context is to discover groups of co-expressed genes, which can shed light on biological functions and pathways of gene products. RESULTS: A mixture of multivariate Poisson-log normal (MPLN) model is developed for clustering of high-throughput transcriptome sequencing data. Parameter estimation is carried out using a Markov chain Monte Carlo expectation-maximization (MCMC-EM) algorithm, and information criteria are used for model selection. CONCLUSIONS: The mixture of MPLN model is able to fit a wide range of correlation and overdispersion situations, and is suited for modeling multivariate count data from RNA sequencing studies. All scripts used for implementing the method can be found at https://github.com/anjalisilva/MPLNClust .

Subjects

Algorithms , RNA/chemistry , Cluster Analysis , High-Throughput Nucleotide Sequencing , Markov Chains , Theoretical Models , Monte Carlo Method , RNA/genetics , RNA/metabolism , RNA Sequence Analysis , User-Computer Interface

3.

Two-way learning with one-way supervision for gene expression data.

Wong, Monica H T; Mutch, David M; McNicholas, Paul D.

BMC Bioinformatics ; 18(1): 150, 2017 Mar 04.

Article in English | MEDLINE | ID: mdl-28257645

ABSTRACT

BACKGROUND: A family of parsimonious Gaussian mixture models for the biclustering of gene expression data is introduced. Biclustering is accommodated by adopting a mixture of factor analyzers model with a binary, row-stochastic factor loadings matrix. This particular form of factor loadings matrix results in a block-diagonal covariance matrix, which is a useful property in gene expression analyses, specifically in biomarker discovery scenarios where blood can potentially act as a surrogate tissue for other less accessible tissues. Prior knowledge of the factor loadings matrix is useful in this application and is reflected in the one-way supervised nature of the algorithm. Additionally, the factor loadings matrix can be assumed to be constant across all components because of the relationship desired between the various types of tissue samples. Parameter estimates are obtained through a variant of the expectation-maximization algorithm and the best-fitting model is selected using the Bayesian information criterion. The family of models is demonstrated using simulated data and two real microarray data sets. The first real data set is from a rat study that investigated the influence of diabetes on gene expression in different tissues. The second real data set is from a human transcriptomics study that focused on blood and immune tissues. The microarray data sets illustrate the biclustering family's performance in biomarker discovery involving peripheral blood as surrogate biopsy material. RESULTS: The simulation studies indicate that the algorithm identifies the correct biclusters, performing best when the number of observation clusters is known. Moreover, the biclustering algorithm identified biclusters comprising biologically meaningful data related to insulin resistance and immune function in the rat and human real data sets, respectively. CONCLUSIONS: Initial results using real data show that this biclustering technique provides a novel approach for biomarker discovery by enabling blood to be used as a surrogate for hard-to-obtain tissues.

Subjects

Genetic Databases , Gene Expression , Supervised Machine Learning , Transcriptome , Animals , Bayes Theorem , Biomarkers/blood , Cluster Analysis , Experimental Diabetes Mellitus/genetics , Humans , Male , Theoretical Models , Rats , Zucker Rats

4.

Parsimonious mixtures of multivariate contaminated normal distributions.

Punzo, Antonio; McNicholas, Paul D.

Biom J ; 58(6): 1506-1537, 2016 Nov.

Article in English | MEDLINE | ID: mdl-27510372

ABSTRACT

A mixture of multivariate contaminated normal distributions is developed for model-based clustering. In addition to the parameters of the classical normal mixture, our contaminated mixture has, for each cluster, a parameter controlling the proportion of mild outliers and one specifying the degree of contamination. Crucially, these parameters do not have to be specified a priori, adding a flexibility to our approach. Parsimony is introduced via eigen-decomposition of the component covariance matrices, and sufficient conditions for the identifiability of all the members of the resulting family are provided. An expectation-conditional maximization algorithm is outlined for parameter estimation and various implementation issues are discussed. Using a large-scale simulation study, the behavior of the proposed approach is investigated and comparison with well-established finite mixtures is provided. The performance of this novel family of models is also illustrated on artificial and real data.

Subjects

Algorithms , Statistical Models , Cluster Analysis , Normal Distribution

5.

Mixtures of multivariate power exponential distributions.

Dang, Utkarsh J; Browne, Ryan P; McNicholas, Paul D.

Biometrics ; 71(4): 1081-9, 2015 Dec.

Article in English | MEDLINE | ID: mdl-26134429

ABSTRACT

An expanded family of mixtures of multivariate power exponential distributions is introduced. While fitting heavy-tails and skewness have received much attention in the model-based clustering literature recently, we investigate the use of a distribution that can deal with both varying tail-weight and peakedness of data. A family of parsimonious models is proposed using an eigen-decomposition of the scale matrix. A generalized expectation-maximization algorithm is presented that combines convex optimization via a minorization-maximization approach and optimization based on accelerated line search algorithms on the Stiefel manifold. Lastly, the utility of this family of models is illustrated using both toy and benchmark data.

Subjects

Biometry/methods , Statistical Models , Multivariate Analysis , Algorithms , Animals , Bayes Theorem , Cluster Analysis , Computer Simulation , Factual Databases/statistics & numerical data , Female , Humans , Likelihood Functions , Male , Normal Distribution , Statistical Distributions

6.

Nitrogen limitation and high density responses in rice suggest a role for ethylene under high density stress.

Misyura, Maksym; Guevara, David; Subedi, Sanjeena; Hudson, Darryl; McNicholas, Paul D; Colasanti, Joseph; Rothstein, Steven J.

BMC Genomics ; 15: 681, 2014 Aug 13.

Article in English | MEDLINE | ID: mdl-25128291

ABSTRACT

BACKGROUND: High density stress, also known as intraspecies competition, causes significant yield losses in a wide variety of crop plants. At the same time, increases in density tolerance through selective breeding and the concomitant ability to plant crops at a higher population density have been one of the most important factors in the development of high yielding modern cultivars. RESULTS: Physiological changes underlying high density stress were examined in Oryza sativa plants over the course of a life cycle by assessing differences in gene expression and metabolism. Moreover, nitrogen limitation was examined in parallel with high density stress to gain a better understanding of physiological responses specific to high density stress. While both nitrogen limitation and high density resulted in decreased shoot fresh weight, tiller number, plant height and chlorophyll content, high density stress alone had a greater impact on physiological factors. Decreases in aspartate and glutamate concentration were found in plants grown under both stress conditions; however, high density stress had a more significant effect on the concentration of these amino acids. Global transcriptome analysis revealed a large proportion of genes with altered expression in response to both stresses. The presence of ethylene-associated genes in a majority of density responsive genes was investigated further. Expression of ethylene biosynthesis genes ACC synthase 1, ACC synthase 2 and ACC oxidase 7 was found to be upregulated in plants under high density stress. Plants at high density were also found to upregulate ethylene-associated genes and senescence genes, while cytokinin response and biosynthesis genes were downregulated, consistent with higher ethylene production. CONCLUSIONS: High density stress has a similar but greater impact on plant growth and development compared to nitrogen limitation. Global transcriptome changes implicate ethylene as a volatile signal used to communicate proximity under dense population growth conditions and suggest a role for phytohormones in high density stress response in rice plants.

Subjects

Ethylenes/metabolism , Gene Expression Profiling , Metabolomics , Nitrogen/metabolism , Oryza/genetics , Oryza/metabolism , Physiological Stress , Aspartic Acid/metabolism , Plant Genes/genetics , Glutamic Acid/metabolism , Oryza/growth & development , Oryza/physiology

7.

Metabolic and co-expression network-based analyses associated with nitrate response in rice.

Coneva, Viktoriya; Simopoulos, Caitlin; Casaretto, José A; El-Kereamy, Ashraf; Guevara, David R; Cohn, Jonathan; Zhu, Tong; Guo, Lining; Alexander, Danny C; Bi, Yong-Mei; McNicholas, Paul D; Rothstein, Steven J.

BMC Genomics ; 15: 1056, 2014 Dec 03.

Article in English | MEDLINE | ID: mdl-25471115

ABSTRACT

BACKGROUND: Understanding gene expression and metabolic re-programming that occur in response to limiting nitrogen (N) conditions in crop plants is crucial for the ongoing progress towards the development of varieties with improved nitrogen use efficiency (NUE). To unravel new details on the molecular and metabolic responses to N availability in a major food crop, we conducted analyses on a weighted gene co-expression network and metabolic profile data obtained from leaves and roots of rice plants adapted to sufficient and limiting N as well as after shifting them to limiting (reduction) and sufficient (induction) N conditions. RESULTS: A gene co-expression network representing clusters of rice genes with similar expression patterns across four nitrogen conditions and two tissue types was generated. The resulting 18 clusters were analyzed for enrichment of significant gene ontology (GO) terms. Four clusters exhibited significant correlation with limiting and reducing nitrate treatments. The enriched GO terms identified include those related to nucleoside/nucleotide, purine and ATP binding, defense response, sugar/carbohydrate binding, protein kinase activities, cell-death and cell wall enzymatic activity. Although a subset of functional categories are more broadly associated with the response of rice organs to limiting N and N reduction, our analyses suggest that N reduction elicits a response distinguishable from that to adaptation to limiting N, particularly in leaves. This observation is further supported by metabolic profiling which shows that several compounds in leaves change proportionally to the nitrate level (i.e. higher in sufficient N vs. limiting N) and respond with even higher levels when the nitrate level is reduced. Notably, these compounds are directly involved in N assimilation, transport, and storage (glutamine, asparagine, glutamate and allantoin) and extend to most amino acids. Based on these data, we hypothesize that plants respond by rapidly mobilizing stored vacuolar nitrate when N deficit is perceived, and that the response likely involves phosphorylation signal cascades and transcriptional regulation. CONCLUSIONS: The co-expression network analysis and metabolic profiling performed in rice pinpoint the relevance of signal transduction components and regulation of N mobilization in response to limiting N conditions and deepen our understanding of N responses and N use in crops.

Subjects

Plant Gene Expression Regulation , Gene Regulatory Networks , Metabolic Networks and Pathways , Nitrates/metabolism , Oryza/genetics , Oryza/metabolism , Cluster Analysis , Computational Biology , Genetic Epigenesis , Gene Expression Profiling , Metabolome , Metabolomics , Molecular Sequence Annotation , Multigene Family , Organ Specificity , Plant Leaves/genetics , Plant Leaves/metabolism , Plant Roots/genetics , Plant Roots/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism

8.

The association of physical activity duration and intensity on emotional intelligence in 10-13 year-old Children.

Gabour, Marie C; You, Tongjian; Fleming, Richard; McNicholas, Paul D; Gona, Philimon N.

Sports Med Health Sci ; 6(4): 331-337, 2024 Dec.

Article in English | MEDLINE | ID: mdl-39309461

ABSTRACT

Previous studies have shown that Physical Activity (PA) has a positive association with emotional health and intelligence in adolescents, but none have focused on the relationship of PA duration and intensity with Emotional Intelligence (EI). The purpose of this study was to cross-sectionally assess the association of PA measures with overall EI and its domains in a cohort of 2,029 adolescents aged 10-13 years in the National Longitudinal Survey for Children and Youth (NLSCY) from Canada. Multivariable linear regression analysis of EI was adjusted for age, sex, annual household income, and health status. One-way analysis of variance (ANOVA) was used to relate PA duration measured in minutes, frequency, and intensity categories with continuous GEI scores and also the corresponding scores for domains of GEI. The mean GEI scores were (28.3 ± 6.6) for 0-30 minute (min) PA duration, (30.0 ± 6.5) for 30 to < 60 min, (30.8 ± 6.7) for 60-120 min, and (30.1 ± 6.5) for ≥ 121 min. There was a statistically significant linear trend across PA duration categories, p = 0.0004. Post-hoc pairwise comparisons revealed that the referent category (< 30 min of PA) had statistically significantly lower GEI scores than each of two other PA categories (30-59 min and 60-120 min), both p-values < 0.01. Meeting World Health Organization (WHO) guidelines for duration and vigorous intensity was positively associated with higher overall EI and its domains, except for Stress Management.

9.

Genome-wide expression profiling of maize in response to individual and combined water and nitrogen stresses.

Humbert, Sabrina; Subedi, Sanjeena; Cohn, Jonathan; Zeng, Bin; Bi, Yong-Mei; Chen, Xi; Zhu, Tong; McNicholas, Paul D; Rothstein, Steven J.

BMC Genomics ; 14: 3, 2013 Jan 16.

Article in English | MEDLINE | ID: mdl-23324127

ABSTRACT

BACKGROUND: Water and nitrogen are two of the most critical inputs required to achieve the high yield potential of modern corn varieties. Under most agricultural settings, however, they are often scarce and costly. Fortunately, tremendous progress has been made in the past decades in terms of modeling to assist growers in the decision making process and many tools are now available to achieve more sustainable practices both environmentally and economically. Nevertheless, large gaps remain between our empirical knowledge of the physiological changes observed in the field in response to nitrogen and water stresses, and our limited understanding of the molecular processes leading to those changes. RESULTS: This work examines in particular the impact of simultaneous stresses on the transcriptome. In a greenhouse setting, corn plants were grown under tightly controlled nitrogen and water conditions, allowing sampling of various tissues and stress combinations. A microarray profiling experiment was performed using this material and showed that the concomitant presence of nitrogen and water limitation affects gene expression to an extent much larger than anticipated. A clustering analysis also revealed how the interaction between the two stresses shapes the patterns of gene expression over various levels of water stresses and recovery. CONCLUSIONS: Overall, this study suggests that the molecular signature of a specific combination of stresses on the transcriptome might be as unique as the impact of individual stresses, and hence underlines the difficulty of extrapolating conclusions obtained from the study of individual stress responses to more complex settings.

Subjects

Gene Expression Profiling , Genomics , Nitrogen/pharmacology , Physiological Stress/drug effects , Water/pharmacology , Zea mays/genetics , Zea mays/physiology , Biotechnology , Drug Interactions , Controlled Environment , Oligonucleotide Array Sequence Analysis , Physiological Stress/genetics , Time Factors , Genetic Transcription/drug effects , Genetic Transcription/genetics , Zea mays/drug effects

10.

Promzea: a pipeline for discovery of co-regulatory motifs in maize and other plant species and its application to the anthocyanin and phlobaphene biosynthetic pathways and the Maize Development Atlas.

Liseron-Monfils, Christophe; Lewis, Tim; Ashlock, Daniel; McNicholas, Paul D; Fauteux, François; Strömvik, Martina; Raizada, Manish N.

BMC Plant Biol ; 13: 42, 2013 Mar 15.

Article in English | MEDLINE | ID: mdl-23497159

ABSTRACT

BACKGROUND: The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize. RESULTS: A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at http://www.promzea.org and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program tool, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter motifs in their promoters, perhaps uncovering a broader co-regulated gene network. Promzea was also tested against tissue-specific microarray data from maize. CONCLUSIONS: An online tool customized for promoter motif discovery in plants, called Promzea, has been generated. Promzea was validated in silico by its ability to retrieve benchmark motifs and experimentally defined motifs and was tested using tissue-specific microarray data. Promzea predicted broader networks of gene regulation associated with the historic anthocyanin and phlobaphene biosynthetic pathways. Promzea is a new bioinformatics tool for understanding transcriptional gene regulation in maize and has been expanded to include rice and Arabidopsis.

Subjects

Anthocyanins/biosynthesis , Biosynthetic Pathways , Computational Biology/methods , Flavonoids/biosynthesis , Plant Proteins/genetics , Genetic Promoter Regions , Software , Zea mays/genetics , Algorithms , Arabidopsis/genetics , Arabidopsis/growth & development , Arabidopsis/metabolism , Base Sequence , Computational Biology/instrumentation , Molecular Sequence Data , Plant Proteins/metabolism , Zea mays/growth & development , Zea mays/metabolism

11.

VLF: An R package for the analysis of very low frequency variants in DNA sequences.

Phillips, Jarrett D; Athey, Taryn B T; McNicholas, Paul D; Hanner, Robert H.

Biodivers Data J ; 11: e96480, 2023.

Article in English | MEDLINE | ID: mdl-38327328

ABSTRACT

Here, we introduce VLF, an R package to determine the distribution of very low frequency variants (VLFs) in nucleotide and amino acid sequences for the analysis of errors in DNA sequence records. The package allows users to assess VLFs in aligned and trimmed protein-coding sequences by automatically calculating the frequency of nucleotides or amino acids in each sequence position and outputting those that occur under a user-specified frequency (default of p = 0.001). These results can then be used to explore fundamental population genetic and phylogeographic patterns, mechanisms and processes at the microevolutionary level, such as nucleotide and amino acid sequence conservation. Our package extends earlier work pertaining to an implementation of VLF analysis in Microsoft Excel, which was found to be both computationally slow and error prone. We compare those results to our own herein. Results between the two implementations are found to be highly consistent for a large DNA barcode dataset of bird species. Differences in results are readily explained by both manual human error and inadequate Linnean taxonomy (specifically, species synonymy). Here, VLF is also applied to a subset of avian barcodes to assess the extent of biological artifacts at the species level for Canada goose (Branta canadensis), as well as within a large dataset of DNA barcodes for fishes of forensic and regulatory importance. The novelty of VLF and its benefit over the previous implementation include its high level of automation, speed, scalability and ease-of-use, all desirable characteristics that will be extremely valuable as more sequence data are rapidly accumulated in popular reference databases, such as BOLD and GenBank.
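The per-position frequency screen described in this abstract can be illustrated with a minimal Python sketch. This is not the VLF package's API (the package is written in R); the function name `low_frequency_variants`, its return format, and the toy sequences are hypothetical and chosen only for illustration. The sketch assumes the sequences are already aligned and trimmed to equal length, and it reports every character whose frequency at its position falls below the threshold.

```python
from collections import Counter


def low_frequency_variants(aligned_seqs, threshold=0.001):
    """Return (sequence_index, position, character, frequency) tuples for
    characters whose per-position frequency is below `threshold`.

    Hypothetical helper for illustration; not part of the VLF R package.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    n = len(aligned_seqs)
    hits = []
    for pos in range(length):
        # Count how often each nucleotide or amino acid occurs at this position.
        counts = Counter(seq[pos] for seq in aligned_seqs)
        for idx, seq in enumerate(aligned_seqs):
            freq = counts[seq[pos]] / n
            if freq < threshold:
                hits.append((idx, pos, seq[pos], freq))
    return hits


# Toy alignment: nine identical sequences and one with a rare final base.
toy = ["ACGT"] * 9 + ["ACGA"]
rare = low_frequency_variants(toy, threshold=0.2)
```

With the abstract's default of 0.001, a variant would need to appear in fewer than one in a thousand sequences to be flagged, so the toy example uses a much larger threshold.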

12.

Trajectories of Symptom Severity in Children with Autism: Variability and Turning Points through the Transition to School.

Georgiades, Stelios; Tait, Peter A; McNicholas, Paul D; Duku, Eric; Zwaigenbaum, Lonnie; Smith, Isabel M; Bennett, Teresa; Elsabbagh, Mayada; Kerns, Connor M; Mirenda, Pat; Ungar, Wendy J; Vaillancourt, Tracy; Volden, Joanne; Waddell, Charlotte; Zaidman-Zait, Anat; Gentles, Stephen; Szatmari, Peter.

J Autism Dev Disord ; 52(1): 392-401, 2022 Jan.

Article in English | MEDLINE | ID: mdl-33704613

ABSTRACT

This study examined the trajectories of autistic symptom severity in an inception cohort of 187 children with ASD assessed across four time points from diagnosis to age 10. Trajectory groups were derived using multivariate cluster analysis. A two trajectory/cluster solution was selected. Change in trajectory slopes revealed a turning point marked by plateauing in symptom reduction during the period of transition to school (age 6) for one of the two trajectories. Trajectories were labelled: Continuously Improving (27%) and Improving then Plateauing (73% of sample). Children in the two trajectories differed in levels of symptom severity, language, cognitive, and adaptive functioning skills. Study findings can inform the development of more personalized services for children with ASD transitioning into the school system.

Subjects

Autism Spectrum Disorder , Autistic Disorder , Autism Spectrum Disorder/diagnosis , Autistic Disorder/diagnosis , Child , Humans , Language , Multivariate Analysis , Schools

13.

Model-based clustering of microarray expression data via latent Gaussian mixture models.

McNicholas, Paul D; Murphy, Thomas Brendan.

Bioinformatics ; 26(21): 2705-12, 2010 Nov 01.

Article in English | MEDLINE | ID: mdl-20802251

ABSTRACT

MOTIVATION: In recent years, work has been carried out on clustering gene expression microarray data. Some approaches are developed from an algorithmic viewpoint whereas others are developed via the application of mixture models. In this article, a family of eight mixture models which utilizes the factor analysis covariance structure is extended to 12 models and applied to gene expression microarray data. This modelling approach builds on previous work by introducing a modified factor analysis covariance structure, leading to a family of 12 mixture models, including parsimonious models. This family of models allows for the modelling of the correlation between gene expression levels even when the number of samples is small. Parameter estimation is carried out using a variant of the expectation-maximization algorithm and model selection is achieved using the Bayesian information criterion. This expanded family of Gaussian mixture models, known as the expanded parsimonious Gaussian mixture model (EPGMM) family, is then applied to two well-known gene expression data sets. RESULTS: The performance of the EPGMM family of models is quantified using the adjusted Rand index. This family of models gives very good performance, relative to existing popular clustering techniques, when applied to real gene expression microarray data. AVAILABILITY: The reduced, preprocessed data that were analysed are available at www.paulmcnicholas.info

Subjects

Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Cluster Analysis , Computer Simulation , Normal Distribution , Automated Pattern Recognition/methods

14.

Identification of five important genes to predict glioblastoma subtypes.

Tang, Yang; Qazi, Maleeha A; Brown, Kevin R; Mikolajewicz, Nicholas; Moffat, Jason; Singh, Sheila K; McNicholas, Paul D.

Neurooncol Adv ; 3(1): vdab144, 2021.

Article in English | MEDLINE | ID: mdl-34765972

ABSTRACT

BACKGROUND: Glioblastoma (GBM), the most common and aggressive primary brain tumour in adults, has been classified into three subtypes: classical, mesenchymal, and proneural. While the original classification relied on an 840-gene set, further clarification on true GBM subtypes uses a 150-gene signature to accurately classify GBM into the three subtypes. We asked whether a machine learning approach could be used to identify a smaller gene set to accurately predict GBM subtype. METHODS: Using a supervised machine learning approach, extreme gradient boosting (XGBoost), we developed a classifier to predict the three subtypes of glioblastoma (GBM): classical, mesenchymal, and proneural. We tested the classifier on in-house GBM tissue, cell lines, and xenograft samples to predict their subtype. RESULTS: We identified the five most important genes for characterizing the three subtypes based on genes that often exhibited high Importance Scores in our XGBoost analyses. On average, this approach achieved 80.12% accuracy in predicting these three subtypes of GBM. Furthermore, we applied our five-gene classifier to successfully predict the subtype of GBM samples at our centre. CONCLUSION: Our five-gene classifier is the smallest classifier to date that can predict GBM subtypes with high accuracy, which could facilitate the future development of a five-gene subtype diagnostic biomarker for routine assays in GBM samples.

15.

Do Different Ascertainment Techniques Identify the Same Individuals as Sarcopenic in the Canadian Longitudinal Study on Aging?

Mayhew, Alexandra J; Phillips, Stuart M; Sohel, Nazmul; Thabane, Lehana; McNicholas, Paul D; de Souza, Russell J; Parise, Gianni; Raina, Parminder.

J Am Geriatr Soc ; 69(1): 164-172, 2021 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-32936468

RESUMO

BACKGROUND/OBJECTIVES: Sarcopenia is associated with poor health outcomes such as disability, institutionalization, and mortality. Efforts to manage sarcopenia clinically have been hindered by challenges in determining how to ascertain sarcopenia status correctly. The objective of this project was to assess the agreement between the different methods of ascertaining sarcopenia recommended by expert groups. DESIGN: Cross-sectional study of baseline data (2011-2015) from the Canadian Longitudinal Study on Aging. SETTING: Population-based multicenter study of community-dwelling participants. PARTICIPANTS: Eligible participants (n = 12,646) aged 65 to 85 living within 25 to 50 km of 11 data collection sites in Canada. The analyses included 10,820 participants with the data required to diagnose sarcopenia. MEASUREMENTS: Sarcopenia was operationalized as appendicular lean mass (ALM), ALM and grip strength, ALM and gait speed, and grip strength and gait speed. Within each combination, ALM was adjusted for height squared, weight, body mass index, and the residual of regressing lean mass on height and fat mass. The lowest 20th sex-specific percentile values were used as the cutoffs for low ALM. Low grip strength cutoffs of 35.5 kg for men and 20 kg for women and a gait speed cutoff of .8 m/s were used. RESULTS: The mean age was 73.0 ± 5.6 years, and 51.9% of the sample was male. The agreement (Cohen's κ) between the different combinations of variables used to ascertain sarcopenia status was below .50. Agreement for the different lean mass adjustment techniques ranged from .04 to .76. CONCLUSION: The combination of variables used to ascertain sarcopenia and many of the ALM adjustment techniques have insufficient agreement to be considered equivalent. This has important clinical implications for the management of sarcopenia because treatments may differ based on how sarcopenia is identified. To improve the clinical utility of sarcopenia, a unified definition of sarcopenia is required.
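
The agreement statistic reported in this abstract, Cohen's κ, compares observed agreement between two classifications with the agreement expected by chance. The following is a minimal sketch, not the authors' code; the function name `cohen_kappa` and the toy labels are illustrative assumptions.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of participants classified identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal proportions, summed over labels.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in set(count_a) | set(count_b)) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical example: two sarcopenia definitions applied to four participants
# (1 = sarcopenic, 0 = not sarcopenic).
definition_1 = [1, 1, 0, 0]
definition_2 = [1, 0, 0, 0]
print(cohen_kappa(definition_1, definition_2))  # 0.5
```

In this toy case the two definitions agree on three of four participants (0.75), chance agreement is 0.5, and κ is therefore 0.5.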

Subjects

Aging , Patients/statistics & numerical data , Sarcopenia/diagnosis , Photon Absorptiometry , Age Factors , Aged , Aged 80 and over , Body Mass Index , Canada , Cross-Sectional Studies , Female , Hand Strength/physiology , Humans , Independent Living , Longitudinal Studies , Male , Muscle Weakness/physiopathology , Walking Speed/physiology

16.

Flexible High-Dimensional Unsupervised Learning with Missing Data.

Wei, Yuhong; Tang, Yang; McNicholas, Paul D.

IEEE Trans Pattern Anal Mach Intell ; 42(3): 610-621, 2020 Mar.

Article in English | MEDLINE | ID: mdl-30530313

ABSTRACT

The mixture of factor analyzers (MFA) model is a famous mixture model-based approach for unsupervised learning with high-dimensional data. It can be useful, inter alia, in situations where the data dimensionality far exceeds the number of observations. In recent years, the MFA model has been extended to non-Gaussian mixtures to account for clusters with heavier tail weight and/or asymmetry. The generalized hyperbolic factor analyzers (MGHFA) model is one such extension, which leads to a flexible modelling paradigm that accounts for both heavier tail weight and cluster asymmetry. In many practical applications, the occurrence of missing values often complicates data analyses. A generalization of the MGHFA is presented to accommodate missing values. Under a missing-at-random mechanism, we develop a computationally efficient alternating expectation conditional maximization algorithm for parameter estimation of the MGHFA model with different patterns of missing values. The imputation of missing values under an incomplete-data structure of MGHFA is also investigated. The performance of our proposed methodology is illustrated through the analysis of simulated and real data.

17.

The impact of different diagnostic criteria on the association of sarcopenia with injurious falls in the CLSA.

Mayhew, Alexandra J; Phillips, Stuart M; Sohel, Nazmul; Thabane, Lehana; McNicholas, Paul D; de Souza, Russell J; Parise, Gianni; Raina, Parminder.

J Cachexia Sarcopenia Muscle ; 11(6): 1603-1613, 2020 12.

Article in English | MEDLINE | ID: mdl-32940016

ABSTRACT

BACKGROUND: Sarcopenia definitions recommend different combinations of variables (lean mass, strength, and physical function) and different methods of adjusting lean mass. The purpose of this paper was to address the gaps in the literature regarding how differences in the operationalization of sarcopenia impact the association between sarcopenia and injurious falls. METHODS: Participants included 9936 individuals from the Canadian Longitudinal Study on Aging aged ≥65 years at baseline (2012-2015), with complete data for sarcopenia-related variables, injurious falls, and covariates. Sarcopenia was defined using all combinations of muscle variables (lean mass, grip strength, chair rise test, and gait speed) and methods of adjusting lean mass (height², weight, body mass index (BMI), and regressing on height and fat mass) recommended by the expert group sarcopenia definitions. Multiple cut off values for the measures were explored. The association between sarcopenia and injurious falls (0, 1, or 2+ falls) measured 18 months after baseline data collection was assessed using proportional odds regression models. RESULTS: In men (n = 5162, 72.9 ± 5.6 years), the odds of having a higher level of injurious falls was between 1.43 and 2.14 greater when sarcopenia was defined as (i) lean mass adjusted for weight only; (ii) grip strength (<30 or <26 kg) only; (iii) lean mass adjusted for weight and grip strength (<30 or <26 kg); (iv) lean mass adjusted for BMI and grip strength (<26 kg); and (v) lean mass adjusted using the regression technique and grip strength (<30 or <26 kg). In women (n = 4774, 72.8 ± 5.6 years), only the combination of lean mass adjusted using regression with gait speed (<0.8 m/s) was associated with a significantly higher odds (1.46, 95% confidence interval: 1.01-2.10, P = 0.04) of having a higher level of injurious falls. CONCLUSIONS: Sarcopenia definitions based on different combinations of muscle variables and methods of adjusting lean mass are not equally associated with injurious falls. In men, definitions including grip strength but not gait speed or the chair rise test, and adjusting lean mass for weight, BMI, or using the residual technique but not height², tended to be associated with injurious falls. In women, sarcopenia was generally not associated with injurious falls regardless of the definition used.

Subjects

Sarcopenia , Accidental Falls , Aged , Aging , Canada , Female , Hand Strength , Humans , Longitudinal Studies , Male , Sarcopenia/diagnosis , Sarcopenia/epidemiology

18.

Review and implementation of cure models based on first hitting times for Wiener processes.

Balka, Jeremy; Desmond, Anthony F; McNicholas, Paul D.

Lifetime Data Anal ; 15(2): 147-76, 2009 Jun.

Article in English | MEDLINE | ID: mdl-19123058

ABSTRACT

The development of models and methods for cure rate estimation has recently burgeoned into an important subfield of survival analysis. Much of the literature focuses on the standard mixture model. Recently, process-based models have been suggested. We focus on several models based on first passage times for Wiener processes. Whitmore and others have studied these models in a variety of contexts. Lee and Whitmore (Stat Sci 21(4):501-513, 2006) give a comprehensive review of a variety of first hitting time models and briefly discuss their potential as cure rate models. In this paper, we study the Wiener process with negative drift as a possible cure rate model but the resulting defective inverse Gaussian model is found to provide a poor fit in some cases. Several possible modifications are then suggested, which improve the defective inverse Gaussian. These modifications include: the inverse Gaussian cure rate mixture model; a mixture of two inverse Gaussian models; incorporation of heterogeneity in the drift parameter; and the addition of a second absorbing barrier to the Wiener process, representing an immunity threshold. This class of process-based models is a useful alternative to the standard model and provides an improved fit compared to the standard model when applied to many of the datasets that we have studied. Implementation of this class of models is facilitated using expectation-maximization (EM) algorithms and variants thereof, including the gradient EM algorithm. Parameter estimates for each of these EM algorithms are given and the proposed models are applied to both real and simulated data, where they perform well.
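
The "defective" inverse Gaussian mentioned in this abstract arises because a Wiener process drifting away from an absorbing barrier may never reach it, and the probability of never reaching it plays the role of the cure fraction. The sketch below uses the standard first-passage result for Brownian motion with drift. The parameterization is an assumption made here (process starts at 0, barrier at a positive level, drift and volatility constant) and may differ from the one used in the paper; the function name `cure_fraction` is hypothetical.

```python
import math


def cure_fraction(barrier, drift, sigma):
    """Probability that a Wiener process started at 0 never reaches `barrier`.

    Assumed setup: barrier > 0, constant drift and volatility sigma > 0.
    With non-negative drift the barrier is reached with probability 1, so the
    cure fraction is 0. With negative drift the hitting probability is
    exp(2 * barrier * drift / sigma**2), which is less than 1.
    """
    if barrier <= 0 or sigma <= 0:
        raise ValueError("barrier and sigma must be positive")
    if drift >= 0:
        return 0.0
    p_ever_hit = math.exp(2.0 * barrier * drift / sigma**2)
    return 1.0 - p_ever_hit


# Illustrative values only: a stronger negative drift gives a larger cure fraction.
print(cure_fraction(barrier=1.0, drift=-0.5, sigma=1.0))
print(cure_fraction(barrier=1.0, drift=-2.0, sigma=1.0))
```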

Subjects

Statistical Models , Survival Analysis , Algorithms , Biometry , Burns/complications , Burns/therapy , Factual Databases , Disinfection , Humans , Kaplan-Meier Estimate , Proportional Hazards Models , Staphylococcal Infections/etiology

19.

Reliability of transcranial magnetic stimulation measures of afferent inhibition.

Turco, Claudia V; Pesevski, Angelina; McNicholas, Paul D; Beaulieu, Louis-David; Nelson, Aimee J.

Brain Res ; 1723: 146394, 2019 11 15.

Article in English | MEDLINE | ID: mdl-31425680

ABSTRACT

Short-latency afferent inhibition (SAI) and long-latency afferent inhibition (LAI) are well-known transcranial magnetic stimulation (TMS) paradigms used to probe the sensorimotor system. To date, there is a paucity of research examining the reliability of these neurophysiological measures. This information is required to validate the utility of afferent inhibition as a biomarker of neural function. The goal of this study was to quantify the absolute reliability, relative reliability, and smallest detectable change (SDC) of SAI and LAI using a test-retest paradigm. 30 healthy individuals (20.9 ± 2.5 years) participated in two sessions (intersession interval of ~7 days). Reliability was assessed with intraclass correlation coefficients (ICC), standard error of measurement (SEMeas), and SDC. The results show that LAI and SAI had poor-to-moderate relative reliability as determined by the ICC, with digital nerve LAI displaying the highest relative reliability (highest ICC with smallest confidence interval). The %SEMeas indicated a large amount of measurement error in all measures of afferent inhibition, with LAI exhibiting more measurement error than SAI. The SDC was large at the individual level (SDCindiv), but analyses showed that the SDC is significantly reduced at the group-level (SDCgroup). Our results indicate that digital nerve LAI is the most reliable outcome to differentiate between individuals within a sample. Further, results suggest that SAI and LAI are not appropriate indicators of individual neurophysiological change across time but can reliably detect changes in group-averaged data providing sample sizes are sufficient.
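
The abstract does not state how SEMeas and SDC were computed. The sketch below uses definitions that are common in the test-retest reliability literature (SEMeas = SD × √(1 − ICC), SDCindiv = 1.96 × √2 × SEMeas, SDCgroup = SDCindiv / √n); treating these as the formulas behind the reported results is an assumption, and the function names and input values are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the pooled SD and the ICC."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("this sketch assumes an ICC between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def sdc_individual(sem, z=1.96):
    """Smallest detectable change for one person (95% level by default)."""
    # sqrt(2) accounts for measurement error in both the test and the retest.
    return z * math.sqrt(2.0) * sem


def sdc_group(sdc_indiv, n):
    """Smallest detectable change for the mean of a group of n people."""
    return sdc_indiv / math.sqrt(n)


# Illustrative values only: pooled SD of 10 units, ICC of 0.75, 30 participants.
sem = sem_from_icc(sd=10.0, icc=0.75)
indiv = sdc_individual(sem)
group = sdc_group(indiv, n=30)
print(sem, indiv, group)
```

Under these definitions the group-level threshold shrinks with √n, which is consistent with the abstract's point that changes can be detected in group-averaged data when samples are large enough.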

Subjects

Afferent Pathways/physiology , Neural Inhibition/physiology , Transcranial Magnetic Stimulation/methods , Electric Stimulation , Motor Evoked Potentials/physiology , Female , Humans , Male , Motor Cortex/physiology , Reaction Time/physiology , Reproducibility of Results , Young Adult

20.

Predicting hospital and emergency department utilization among community-dwelling older adults: Statistical and machine learning approaches.

Jones, Aaron; Costa, Andrew P; Pesevski, Angelina; McNicholas, Paul D.

PLoS One ; 13(11): e0206662, 2018.

Article in English | MEDLINE | ID: mdl-30383850

ABSTRACT

OBJECTIVE: The objective of this study was to compare the performance of several commonly used machine learning methods to traditional statistical methods for predicting emergency department and hospital utilization among patients receiving publicly-funded home care services. STUDY DESIGN AND SETTING: We conducted a population-based retrospective cohort study of publicly-funded home care recipients in the Hamilton-Niagara-Haldimand-Brant region of southern Ontario, Canada between 2014 and 2016. Gradient boosted trees, neural networks, and random forests were tested against two variations of logistic regression for predicting three outcomes related to emergency department and hospital utilization within six months of a comprehensive home care clinical assessment. Models were trained on data from years 2014 and 2015 and tested on data from 2016. Performance was compared using logarithmic score, Brier score, AUC, and diagnostic accuracy measures. RESULTS: Gradient boosted trees achieved the best performance on all three outcomes. Gradient boosted trees provided small but statistically significant performance gains over both traditional methods on all three outcomes, while neural networks significantly outperformed logistic regression on two of three outcomes. However, sensitivity and specificity gains from using gradient boosted trees over logistic regression were only in the range of 1%-2% at several classification thresholds. CONCLUSION: Gradient boosted trees and simple neural networks yielded small performance benefits over logistic regression for predicting emergency department and hospital utilization among patients receiving publicly-funded home care. However, the performance benefits were of negligible clinical importance.
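
Two of the performance measures named in this abstract, the Brier score and the logarithmic score, evaluate predicted probabilities directly rather than thresholded classifications. The following is a minimal sketch for a binary outcome, not the study's code. The sign convention for the logarithmic score (negative mean log-likelihood, so lower is better) is an assumption, and the function names and data are hypothetical.

```python
import math


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def log_score(y_true, p_pred, eps=1e-15):
    """Negative mean log-likelihood of the observed binary outcomes."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip probabilities away from 0 and 1 so the logarithm stays finite.
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# Hypothetical outcomes (1 = emergency department visit within six months)
# and predicted probabilities from some model.
outcomes = [1, 0, 0, 1]
predicted = [0.8, 0.2, 0.4, 0.6]
print(brier_score(outcomes, predicted))
print(log_score(outcomes, predicted))
```

Both scores reward well-calibrated probabilities, which is one reason they can rank models differently from threshold-based measures such as sensitivity and specificity.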

Subjects

Hospital Emergency Service , Forecasting/methods , Patient Acceptance of Health Care , Accidental Falls , Aged 80 and over , Female , Home Care Services , Hospitalization , Humans , Independent Living , Logistic Models , Machine Learning , Male , Ontario , Retrospective Studies
