Results 1 - 20 of 28
1.
Bioinformatics ; 39(5)2023 05 04.
Article in English | MEDLINE | ID: mdl-37018147

ABSTRACT

MOTIVATION: Three-way data structures, characterized by three entities, the units, the variables and the occasions, are frequent in biological studies. In RNA sequencing, three-way data structures are obtained when high-throughput transcriptome sequencing data are collected for n genes across p conditions at r occasions. Matrix variate distributions offer a natural way to model three-way data and mixtures of matrix variate distributions can be used to cluster three-way data. Clustering of gene expression data is carried out as a means of discovering gene co-expression networks. RESULTS: In this work, a mixture of matrix variate Poisson-log normal distributions is proposed for clustering read counts from RNA sequencing. By considering the matrix variate structure, full information on the conditions and occasions of the RNA sequencing dataset is simultaneously considered, and the number of covariance parameters to be estimated is reduced. We propose three different frameworks for parameter estimation: a Markov chain Monte Carlo-based approach, a variational Gaussian approximation-based approach, and a hybrid approach. Various information criteria are used for model selection. The models are applied to both real and simulated data, and we demonstrate that the proposed approaches can recover the underlying cluster structure in both cases. In simulation studies where the true model parameters are known, our proposed approach shows good parameter recovery. AVAILABILITY AND IMPLEMENTATION: The GitHub R package for this work is available at https://github.com/anjalisilva/mixMVPLN and is released under the open source MIT license.


Subject(s)
Transcriptome; Normal Distribution; Computer Simulation; Statistical Distributions; Sequence Analysis, RNA
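
The abstract above notes that the matrix variate structure reduces the number of covariance parameters. A minimal Python sketch of that count follows, assuming the usual Kronecker-separable form in which one covariance matrix describes the p conditions and another the r occasions. The function names are illustrative and are not part of the mixMVPLN package.

```python
def unstructured_cov_params(p, r):
    """Free parameters in an unrestricted covariance of the vectorized
    p x r observation, i.e. a symmetric (p*r) x (p*r) matrix."""
    d = p * r
    return d * (d + 1) // 2


def matrix_variate_cov_params(p, r):
    """Free parameters when the covariance is assumed to separate into a
    symmetric p x p matrix and a symmetric r x r matrix.
    Assumption: the count ignores the single scale constraint usually
    imposed for identifiability, which would remove one more parameter."""
    return p * (p + 1) // 2 + r * (r + 1) // 2


if __name__ == "__main__":
    for p, r in [(3, 2), (5, 4), (10, 6)]:
        print(p, r, unstructured_cov_params(p, r), matrix_variate_cov_params(p, r))
```

With 3 conditions and 2 occasions the unrestricted count is 21 and the separable count is 9, and the gap widens quickly as p and r grow.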
2.
Biodivers Data J ; 11: e96480, 2023.
Article in English | MEDLINE | ID: mdl-38327328

ABSTRACT

Here, we introduce VLF, an R package to determine the distribution of very low frequency variants (VLFs) in nucleotide and amino acid sequences for the analysis of errors in DNA sequence records. The package allows users to assess VLFs in aligned and trimmed protein-coding sequences by automatically calculating the frequency of nucleotides or amino acids in each sequence position and outputting those that occur under a user-specified frequency (default of p = 0.001). These results can then be used to explore fundamental population genetic and phylogeographic patterns, mechanisms and processes at the microevolutionary level, such as nucleotide and amino acid sequence conservation. Our package extends earlier work pertaining to an implementation of VLF analysis in Microsoft Excel, which was found to be both computationally slow and error prone. We compare those results to our own herein. Results between the two implementations are found to be highly consistent for a large DNA barcode dataset of bird species. Differences in results are readily explained by both manual human error and inadequate Linnean taxonomy (specifically, species synonymy). Here, VLF is also applied to a subset of avian barcodes to assess the extent of biological artifacts at the species level for Canada goose (Branta canadensis), as well as within a large dataset of DNA barcodes for fishes of forensic and regulatory importance. The novelty of VLF and its benefit over the previous implementation include its high level of automation, speed, scalability and ease of use, all desirable characteristics that will be extremely valuable as more sequence data are rapidly accumulated in popular reference databases, such as BOLD and GenBank.
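
The per-position frequency screen described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the API of the VLF R package; the function name and the returned tuple layout are assumptions.

```python
from collections import Counter


def very_low_frequency_variants(aligned_seqs, threshold=0.001):
    """Return (position, residue, frequency) for every residue whose
    frequency at an alignment position falls below `threshold`.
    Assumes all sequences are already aligned and trimmed to equal length."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    n = len(aligned_seqs)
    hits = []
    for pos in range(length):
        # Count each nucleotide or amino acid observed at this position.
        counts = Counter(seq[pos] for seq in aligned_seqs)
        for residue, count in counts.items():
            freq = count / n
            if freq < threshold:
                hits.append((pos, residue, freq))
    return hits


if __name__ == "__main__":
    toy = ["ACGT"] * 9 + ["ACGA"]
    # A high threshold is used only because the toy alignment is tiny.
    print(very_low_frequency_variants(toy, threshold=0.2))
```

On real barcode datasets the default of 0.001 only becomes meaningful with at least a thousand sequences, since a single occurrence among fewer sequences already exceeds that frequency.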

3.
J Autism Dev Disord ; 52(1): 392-401, 2022 Jan.
Article in English | MEDLINE | ID: mdl-33704613

ABSTRACT

This study examined the trajectories of autistic symptom severity in an inception cohort of 187 children with ASD assessed across four time points from diagnosis to age 10. Trajectory groups were derived using multivariate cluster analysis. A two trajectory/cluster solution was selected. Change in trajectory slopes revealed a turning point marked by plateauing in symptom reduction during the period of transition to school (age 6) for one of the two trajectories. Trajectories were labelled: Continuously Improving (27%) and Improving then Plateauing (73% of sample). Children in the two trajectories differed in levels of symptom severity, language, cognitive, and adaptive functioning skills. Study findings can inform the development of more personalized services for children with ASD transitioning into the school system.


Subject(s)
Autism Spectrum Disorder; Autistic Disorder; Autism Spectrum Disorder/diagnosis; Autistic Disorder/diagnosis; Child; Humans; Language; Multivariate Analysis; Schools
4.
J Classif ; 38(3): 423-424, 2021.
Article in English | MEDLINE | ID: mdl-34924653
5.
Neurooncol Adv ; 3(1): vdab144, 2021.
Article in English | MEDLINE | ID: mdl-34765972

ABSTRACT

BACKGROUND: Glioblastoma (GBM), the most common and aggressive primary brain tumour in adults, has been classified into three subtypes: classical, mesenchymal, and proneural. While the original classification relied on an 840-gene set, further clarification of true GBM subtypes uses a 150-gene signature to accurately classify GBM into the three subtypes. We asked whether a machine learning approach could be used to identify a smaller gene set to accurately predict GBM subtype. METHODS: Using a supervised machine learning approach, extreme gradient boosting (XGBoost), we developed a classifier to predict the three subtypes of GBM: classical, mesenchymal, and proneural. We tested the classifier on in-house GBM tissue, cell lines, and xenograft samples to predict their subtype. RESULTS: We identified the five most important genes for characterizing the three subtypes based on genes that often exhibited high Importance Scores in our XGBoost analyses. On average, this approach achieved 80.12% accuracy in predicting these three subtypes of GBM. Furthermore, we applied our five-gene classifier to successfully predict the subtype of GBM samples at our centre. CONCLUSION: Our five-gene classifier is the smallest classifier to date that can predict GBM subtypes with high accuracy, which could facilitate the future development of a five-gene subtype diagnostic biomarker for routine assays in GBM samples.

6.
J Am Geriatr Soc ; 69(1): 164-172, 2021 01.
Article in English | MEDLINE | ID: mdl-32936468

ABSTRACT

BACKGROUND/OBJECTIVES: Sarcopenia is associated with poor health outcomes such as disability, institutionalization, and mortality. Efforts to manage sarcopenia clinically have been hindered by challenges in determining how to ascertain sarcopenia status correctly. The objective of this project was to assess the agreement between the different methods of ascertaining sarcopenia recommended by expert groups. DESIGN: Cross-sectional study of baseline data (2011-2015) from the Canadian Longitudinal Study on Aging. SETTING: Population-based multicenter study of community-dwelling participants. PARTICIPANTS: Eligible participants (n = 12,646) aged 65 to 85 living within 25 to 50 km of 11 data collection sites in Canada. The analyses included 10,820 participants with the data required to diagnose sarcopenia. MEASUREMENTS: Sarcopenia was operationalized as appendicular lean mass (ALM), ALM and grip strength, ALM and gait speed, and grip strength and gait speed. Within each combination, ALM was adjusted for height squared, weight, body mass index, and the residual of regressing lean mass on height and fat mass. The lowest 20th sex-specific percentile values were used as the cutoffs for low ALM. Low grip strength cutoffs of 35.5 kg for men and 20 kg for women and a gait speed cutoff of .8 m/s were used. RESULTS: The mean age was 73.0 ± 5.6 years, and 51.9% of the sample was male. The agreement (Cohen's κ) between the different combinations of variables used to ascertain sarcopenia status was below .50. Agreement for the different lean mass adjustment techniques ranged from .04 to .76. CONCLUSION: The combination of variables used to ascertain sarcopenia and many of the ALM adjustment techniques have insufficient agreement to be considered equivalent. This has important clinical implications for the management of sarcopenia because treatments may differ based on how sarcopenia is identified. 
To improve the clinical utility of sarcopenia, a unified definition of sarcopenia is required.


Subject(s)
Aging; Patients/statistics & numerical data; Sarcopenia/diagnosis; Absorptiometry, Photon; Age Factors; Aged; Aged, 80 and over; Body Mass Index; Canada; Cross-Sectional Studies; Female; Hand Strength/physiology; Humans; Independent Living; Longitudinal Studies; Male; Muscle Weakness/physiopathology; Walking Speed/physiology
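
The agreement statistic reported in the abstract above, Cohen's κ, compares observed agreement between two classifications with the agreement expected by chance. A minimal Python sketch follows, assuming two equal-length lists of labels (for example, sarcopenia status under two definitions). The function name and the toy data are illustrative and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is the observed
    proportion of agreement and p_e the agreement expected by chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from the marginal label frequencies of each method.
    p_e = sum(count_a[k] * count_b[k] for k in set(count_a) | set(count_b)) / n ** 2
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    definition_1 = [1, 1, 0, 0]  # hypothetical sarcopenia status, method 1
    definition_2 = [1, 0, 0, 0]  # hypothetical sarcopenia status, method 2
    print(cohens_kappa(definition_1, definition_2))
```

In the toy data the two definitions agree on three of four people, yet κ is only 0.5 because half of that agreement is expected by chance.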
7.
J Cachexia Sarcopenia Muscle ; 11(6): 1603-1613, 2020 12.
Article in English | MEDLINE | ID: mdl-32940016

ABSTRACT

BACKGROUND: Sarcopenia definitions recommend different combinations of variables (lean mass, strength, and physical function) and different methods of adjusting lean mass. The purpose of this paper was to address the gaps in the literature regarding how differences in the operationalization of sarcopenia impact the association between sarcopenia and injurious falls. METHODS: Participants included 9936 individuals from the Canadian Longitudinal Study on Aging aged ≥65 years at baseline (2012-2015), with complete data for sarcopenia-related variables, injurious falls, and covariates. Sarcopenia was defined using all combinations of muscle variables (lean mass, grip strength, chair rise test, and gait speed) and methods of adjusting lean mass (height², weight, body mass index (BMI), and regressing on height and fat mass) recommended by the expert group sarcopenia definitions. Multiple cutoff values for the measures were explored. The association between sarcopenia and injurious falls (0, 1, or 2+ falls) measured 18 months after baseline data collection was assessed using proportional odds regression models. RESULTS: In men (n = 5162, 72.9 ± 5.6 years), the odds of having a higher level of injurious falls were between 1.43 and 2.14 times greater when sarcopenia was defined as (i) lean mass adjusted for weight only; (ii) grip strength (<30 or <26 kg) only; (iii) lean mass adjusted for weight and grip strength (<30 or <26 kg); (iv) lean mass adjusted for BMI and grip strength (<26 kg); and (v) lean mass adjusted using the regression technique and grip strength (<30 or <26 kg). In women (n = 4774, 72.8 ± 5.6 years), only the combination of lean mass adjusted using regression with gait speed (<0.8 m/s) was associated with significantly higher odds (1.46, 95% confidence interval: 1.01-2.10, P = 0.04) of having a higher level of injurious falls. CONCLUSIONS: Sarcopenia definitions based on different combinations of muscle variables and methods of adjusting lean mass are not equally associated with injurious falls. In men, definitions including grip strength but not gait speed or the chair rise test, and adjusting lean mass for weight, BMI, or using the residual technique but not height², tended to be associated with injurious falls. In women, sarcopenia was generally not associated with injurious falls regardless of the definition used.


Subject(s)
Sarcopenia; Accidental Falls; Aged; Aging; Canada; Female; Hand Strength; Humans; Longitudinal Studies; Male; Sarcopenia/diagnosis; Sarcopenia/epidemiology
8.
IEEE Trans Pattern Anal Mach Intell ; 42(3): 610-621, 2020 Mar.
Article in English | MEDLINE | ID: mdl-30530313

ABSTRACT

The mixture of factor analyzers (MFA) model is a famous mixture model-based approach for unsupervised learning with high-dimensional data. It can be useful, inter alia, in situations where the data dimensionality far exceeds the number of observations. In recent years, the MFA model has been extended to non-Gaussian mixtures to account for clusters with heavier tail weight and/or asymmetry. The generalized hyperbolic factor analyzers (MGHFA) model is one such extension, which leads to a flexible modelling paradigm that accounts for both heavier tail weight and cluster asymmetry. In many practical applications, the occurrence of missing values often complicates data analyses. A generalization of the MGHFA is presented to accommodate missing values. Under a missing-at-random mechanism, we develop a computationally efficient alternating expectation conditional maximization algorithm for parameter estimation of the MGHFA model with different patterns of missing values. The imputation of missing values under an incomplete-data structure of MGHFA is also investigated. The performance of our proposed methodology is illustrated through the analysis of simulated and real data.

9.
Brain Res ; 1723: 146394, 2019 11 15.
Article in English | MEDLINE | ID: mdl-31425680

ABSTRACT

Short-latency afferent inhibition (SAI) and long-latency afferent inhibition (LAI) are well-known transcranial magnetic stimulation (TMS) paradigms used to probe the sensorimotor system. To date, there is a paucity of research examining the reliability of these neurophysiological measures. This information is required to validate the utility of afferent inhibition as a biomarker of neural function. The goal of this study was to quantify the absolute reliability, relative reliability, and smallest detectable change (SDC) of SAI and LAI using a test-retest paradigm. Thirty healthy individuals (20.9 ± 2.5 years) participated in two sessions (intersession interval of ~7 days). Reliability was assessed with intraclass correlation coefficients (ICC), standard error of measurement (SEMeas), and SDC. The results show that LAI and SAI had poor-to-moderate relative reliability as determined by the ICC, with digital nerve LAI displaying the highest relative reliability (highest ICC with smallest confidence interval). The %SEMeas indicated a large amount of measurement error in all measures of afferent inhibition, with LAI exhibiting more measurement error than SAI. The SDC was large at the individual level (SDCindiv), but analyses showed that the SDC is significantly reduced at the group level (SDCgroup). Our results indicate that digital nerve LAI is the most reliable outcome to differentiate between individuals within a sample. Further, results suggest that SAI and LAI are not appropriate indicators of individual neurophysiological change across time but can reliably detect changes in group-averaged data provided sample sizes are sufficient.


Subject(s)
Afferent Pathways/physiology; Neural Inhibition/physiology; Transcranial Magnetic Stimulation/methods; Electric Stimulation; Evoked Potentials, Motor/physiology; Female; Humans; Male; Motor Cortex/physiology; Reaction Time/physiology; Reproducibility of Results; Young Adult
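
The abstract above does not state which formulas were used for SEMeas and SDC. The Python sketch below uses the commonly cited textbook definitions, which are an assumption here: SEMeas = SD × sqrt(1 − ICC), SDCindiv = 1.96 × sqrt(2) × SEMeas, and SDCgroup = SDCindiv / sqrt(n). The input values are hypothetical and are not taken from the study.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the between-subject SD and the ICC
    (assumed formula: SD * sqrt(1 - ICC))."""
    return sd * math.sqrt(1.0 - icc)


def sdc_individual(sem, z=1.96):
    """Smallest detectable change for one person at the 95% level.
    The sqrt(2) reflects error in both the test and the retest session."""
    return z * math.sqrt(2.0) * sem


def sdc_group(sdc_indiv, n):
    """Smallest detectable change in a group mean of n participants."""
    return sdc_indiv / math.sqrt(n)


if __name__ == "__main__":
    sem = sem_from_icc(sd=10.0, icc=0.75)  # hypothetical SD and ICC
    indiv = sdc_individual(sem)
    print(sem, indiv, sdc_group(indiv, n=25))
```

Under these formulas the group-level threshold shrinks with the square root of the sample size, which is consistent with the abstract's observation that SDC is much smaller at the group level than at the individual level.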
10.
BMC Bioinformatics ; 20(1): 394, 2019 Jul 16.
Article in English | MEDLINE | ID: mdl-31311497

ABSTRACT

BACKGROUND: High-dimensional data of discrete and skewed nature is commonly encountered in high-throughput sequencing studies. Analyzing the network itself or the interplay between genes in this type of data continues to present many challenges. As data visualization techniques become cumbersome for higher dimensions and unconvincing when there is no clear separation between homogeneous subgroups within the data, cluster analysis provides an intuitive alternative. The aim of applying mixture model-based clustering in this context is to discover groups of co-expressed genes, which can shed light on biological functions and pathways of gene products. RESULTS: A mixture of multivariate Poisson-log normal (MPLN) model is developed for clustering of high-throughput transcriptome sequencing data. Parameter estimation is carried out using a Markov chain Monte Carlo expectation-maximization (MCMC-EM) algorithm, and information criteria are used for model selection. CONCLUSIONS: The mixture of MPLN model is able to fit a wide range of correlation and overdispersion situations, and is suited for modeling multivariate count data from RNA sequencing studies. All scripts used for implementing the method can be found at https://github.com/anjalisilva/MPLNClust .


Subject(s)
Algorithms; RNA/chemistry; Cluster Analysis; High-Throughput Nucleotide Sequencing; Markov Chains; Models, Theoretical; Monte Carlo Method; RNA/genetics; RNA/metabolism; Sequence Analysis, RNA; User-Computer Interface
11.
PLoS One ; 13(11): e0206662, 2018.
Article in English | MEDLINE | ID: mdl-30383850

ABSTRACT

OBJECTIVE: The objective of this study was to compare the performance of several commonly used machine learning methods to traditional statistical methods for predicting emergency department and hospital utilization among patients receiving publicly-funded home care services. STUDY DESIGN AND SETTING: We conducted a population-based retrospective cohort study of publicly-funded home care recipients in the Hamilton-Niagara-Haldimand-Brant region of southern Ontario, Canada between 2014 and 2016. Gradient boosted trees, neural networks, and random forests were tested against two variations of logistic regression for predicting three outcomes related to emergency department and hospital utilization within six months of a comprehensive home care clinical assessment. Models were trained on data from years 2014 and 2015 and tested on data from 2016. Performance was compared using logarithmic score, Brier score, AUC, and diagnostic accuracy measures. RESULTS: Gradient boosted trees achieved the best performance on all three outcomes. Gradient boosted trees provided small but statistically significant performance gains over both traditional methods on all three outcomes, while neural networks significantly outperformed logistic regression on two of three outcomes. However, sensitivity and specificity gains from using gradient boosted trees over logistic regression were only in the range of 1%-2% at several classification thresholds. CONCLUSION: Gradient boosted trees and simple neural networks yielded small performance benefits over logistic regression for predicting emergency department and hospital utilization among patients receiving publicly-funded home care. However, the performance benefits were of negligible clinical importance.


Subject(s)
Emergency Service, Hospital; Forecasting/methods; Patient Acceptance of Health Care; Accidental Falls; Aged, 80 and over; Female; Home Care Services; Hospitalization; Humans; Independent Living; Logistic Models; Machine Learning; Male; Ontario; Retrospective Studies
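
Two of the performance measures named in the abstract above, the Brier score and the logarithmic score, can be computed directly from predicted probabilities and binary outcomes. The Python sketch below uses the standard definitions, with the logarithmic score written as a mean negative log-likelihood so that lower is better for both. The sign convention and the clipping constant are assumptions, and the toy inputs are not from the study.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_score(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood of the observed binary outcomes.
    Probabilities are clipped to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(probs)


if __name__ == "__main__":
    predicted = [0.8, 0.3]  # hypothetical predicted risks
    observed = [1, 0]       # hypothetical outcomes
    print(brier_score(predicted, observed), log_score(predicted, observed))
```

Both are proper scoring rules, so they reward well-calibrated probabilities rather than only correct rankings, which is one reason to report them alongside AUC.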
12.
Front Physiol ; 9: 1373, 2018.
Article in English | MEDLINE | ID: mdl-30356739

ABSTRACT

The factors that underpin heterogeneity in muscle hypertrophy following resistance exercise training (RET) remain largely unknown. We examined circulating hormones, intramuscular hormones, and intramuscular hormone-related variables in resistance-trained men before and after 12 weeks of RET. Backward elimination and principal component regression evaluated the statistical significance of proposed circulating anabolic hormones (e.g., testosterone, free testosterone, dehydroepiandrosterone, dihydrotestosterone, insulin-like growth factor-1, free insulin-like growth factor-1, luteinizing hormone, and growth hormone) and RET-induced changes in muscle mass (n = 49). Immunoblots and immunoassays were used to evaluate intramuscular free testosterone levels, dihydrotestosterone levels, 5α-reductase expression, and androgen receptor content in the highest- (HIR; n = 10) and lowest- (LOR; n = 10) responders to the 12 weeks of RET. No hormone measured before exercise, after exercise, pre-intervention, or post-intervention was consistently significant or consistently selected in the final model for the change in: type 1 cross sectional area (CSA), type 2 CSA, or fat- and bone-free mass (LBM). Principal component analysis did not result in large dimension reduction and principal component regression was no more effective than unadjusted regression analyses. No hormone measured in the blood or muscle was different between HIR and LOR. The steroidogenic enzyme 5α-reductase increased following RET in the HIR (P < 0.01) but not the LOR (P = 0.32). Androgen receptor content was unchanged with RET but was higher at all times in HIR. Unlike intramuscular free testosterone, dihydrotestosterone, or 5α-reductase, there was a linear relationship between androgen receptor content and change in LBM (P < 0.01), type 1 CSA (P < 0.05), and type 2 CSA (P < 0.01) both pre- and post-intervention. 
These results indicate that intramuscular androgen receptor content, but neither circulating nor intramuscular hormones (or the enzymes regulating their intramuscular production), influences skeletal muscle hypertrophy following RET in previously trained young men.

13.
J Cheminform ; 9(1): 46, 2017 Aug 16.
Article in English | MEDLINE | ID: mdl-29086195

ABSTRACT

Natural products represent a prominent source of pharmaceutically and industrially important agents. Calculating the chemical similarity of two molecules is a central task in cheminformatics, with applications at multiple stages of the drug discovery pipeline. Quantifying the similarity of natural products is a particularly important problem, as the biological activities of these molecules have been extensively optimized by natural selection. The large and structurally complex scaffolds of natural products distinguish their physical and chemical properties from those of synthetic compounds. However, no analysis of the performance of existing methods for molecular similarity calculation specific to natural products has been reported to date. Here, we present LEMONS, an algorithm for the enumeration of hypothetical modular natural product structures. We leverage this algorithm to conduct a comparative analysis of molecular similarity methods within the unique chemical space occupied by modular natural products using controlled synthetic data, and comprehensively investigate the impact of diverse biosynthetic parameters on similarity search. We additionally investigate a recently described algorithm for natural product retrobiosynthesis and alignment, and find that when rule-based retrobiosynthesis can be applied, this approach outperforms conventional two-dimensional fingerprints, suggesting it may represent a valuable approach for the targeted exploration of natural product chemical space and microbial genome mining. Our open-source algorithm is an extensible method of enumerating hypothetical natural product structures with diverse potential applications in bioinformatics.

14.
BMC Bioinformatics ; 18(1): 150, 2017 Mar 04.
Article in English | MEDLINE | ID: mdl-28257645

ABSTRACT

BACKGROUND: A family of parsimonious Gaussian mixture models for the biclustering of gene expression data is introduced. Biclustering is accommodated by adopting a mixture of factor analyzers model with a binary, row-stochastic factor loadings matrix. This particular form of factor loadings matrix results in a block-diagonal covariance matrix, which is a useful property in gene expression analyses, specifically in biomarker discovery scenarios where blood can potentially act as a surrogate tissue for other less accessible tissues. Prior knowledge of the factor loadings matrix is useful in this application and is reflected in the one-way supervised nature of the algorithm. Additionally, the factor loadings matrix can be assumed to be constant across all components because of the relationship desired between the various types of tissue samples. Parameter estimates are obtained through a variant of the expectation-maximization algorithm and the best-fitting model is selected using the Bayesian information criterion. The family of models is demonstrated using simulated data and two real microarray data sets. The first real data set is from a rat study that investigated the influence of diabetes on gene expression in different tissues. The second real data set is from a human transcriptomics study that focused on blood and immune tissues. The microarray data sets illustrate the biclustering family's performance in biomarker discovery involving peripheral blood as surrogate biopsy material. RESULTS: The simulation studies indicate that the algorithm identifies the correct biclusters, most optimally when the number of observation clusters is known. Moreover, the biclustering algorithm identified biclusters comprised of biologically meaningful data related to insulin resistance and immune function in the rat and human real data sets, respectively. 
CONCLUSIONS: Initial results using real data show that this biclustering technique provides a novel approach for biomarker discovery by enabling blood to be used as a surrogate for hard-to-obtain tissues.


Subject(s)
Databases, Genetic; Gene Expression; Supervised Machine Learning; Transcriptome; Animals; Bayes Theorem; Biomarkers/blood; Cluster Analysis; Diabetes Mellitus, Experimental/genetics; Humans; Male; Models, Theoretical; Rats; Rats, Zucker
15.
Biom J ; 58(6): 1506-1537, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27510372

ABSTRACT

A mixture of multivariate contaminated normal distributions is developed for model-based clustering. In addition to the parameters of the classical normal mixture, our contaminated mixture has, for each cluster, a parameter controlling the proportion of mild outliers and one specifying the degree of contamination. Crucially, these parameters do not have to be specified a priori, adding a flexibility to our approach. Parsimony is introduced via eigen-decomposition of the component covariance matrices, and sufficient conditions for the identifiability of all the members of the resulting family are provided. An expectation-conditional maximization algorithm is outlined for parameter estimation and various implementation issues are discussed. Using a large-scale simulation study, the behavior of the proposed approach is investigated and comparison with well-established finite mixtures is provided. The performance of this novel family of models is also illustrated on artificial and real data.


Subject(s)
Algorithms; Models, Statistical; Cluster Analysis; Normal Distribution
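
The two extra per-cluster parameters described in the abstract above can be illustrated with a univariate contaminated normal density: a mixture of a "good" normal and a second normal with the same mean and an inflated variance. This Python sketch is a one-dimensional simplification of the multivariate model, and the parameter names are assumptions chosen for readability rather than the paper's notation.

```python
import math


def normal_pdf(x, mu, sigma2):
    """Density of a normal distribution with mean mu and variance sigma2."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def contaminated_normal_pdf(x, mu, sigma2, outlier_prop, inflation):
    """Univariate contaminated normal density.
    outlier_prop: proportion of mild outliers, in [0, 1].
    inflation: degree of contamination, a variance multiplier greater than 1."""
    if not 0.0 <= outlier_prop <= 1.0:
        raise ValueError("outlier_prop must lie in [0, 1]")
    if inflation <= 1.0:
        raise ValueError("inflation must be greater than 1")
    good = (1.0 - outlier_prop) * normal_pdf(x, mu, sigma2)
    bad = outlier_prop * normal_pdf(x, mu, inflation * sigma2)
    return good + bad


if __name__ == "__main__":
    # The contaminated density puts more mass in the tails than the plain normal.
    print(normal_pdf(4.0, 0.0, 1.0))
    print(contaminated_normal_pdf(4.0, 0.0, 1.0, outlier_prop=0.1, inflation=9.0))
```

In a mixture model each cluster would carry its own pair of these parameters, estimated from the data rather than fixed in advance as they are in this sketch.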
16.
Biometrics ; 71(4): 1081-9, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26134429

ABSTRACT

An expanded family of mixtures of multivariate power exponential distributions is introduced. While fitting heavy-tails and skewness have received much attention in the model-based clustering literature recently, we investigate the use of a distribution that can deal with both varying tail-weight and peakedness of data. A family of parsimonious models is proposed using an eigen-decomposition of the scale matrix. A generalized expectation-maximization algorithm is presented that combines convex optimization via a minorization-maximization approach and optimization based on accelerated line search algorithms on the Stiefel manifold. Lastly, the utility of this family of models is illustrated using both toy and benchmark data.


Subject(s)
Biometry/methods; Models, Statistical; Multivariate Analysis; Algorithms; Animals; Bayes Theorem; Cluster Analysis; Computer Simulation; Databases, Factual/statistics & numerical data; Female; Humans; Likelihood Functions; Male; Normal Distribution; Statistical Distributions
17.
BMC Genomics ; 15: 1056, 2014 Dec 03.
Article in English | MEDLINE | ID: mdl-25471115

ABSTRACT

BACKGROUND: Understanding gene expression and metabolic re-programming that occur in response to limiting nitrogen (N) conditions in crop plants is crucial for the ongoing progress towards the development of varieties with improved nitrogen use efficiency (NUE). To unravel new details on the molecular and metabolic responses to N availability in a major food crop, we conducted analyses on a weighted gene co-expression network and metabolic profile data obtained from leaves and roots of rice plants adapted to sufficient and limiting N as well as after shifting them to limiting (reduction) and sufficient (induction) N conditions. RESULTS: A gene co-expression network representing clusters of rice genes with similar expression patterns across four nitrogen conditions and two tissue types was generated. The resulting 18 clusters were analyzed for enrichment of significant gene ontology (GO) terms. Four clusters exhibited significant correlation with limiting and reducing nitrate treatments. Among the identified enriched GO terms, those related to nucleoside/nucleotide, purine and ATP binding, defense response, sugar/carbohydrate binding, protein kinase activities, cell-death and cell wall enzymatic activity are enriched. Although a subset of functional categories are more broadly associated with the response of rice organs to limiting N and N reduction, our analyses suggest that N reduction elicits a response distinguishable from that to adaptation to limiting N, particularly in leaves. This observation is further supported by metabolic profiling which shows that several compounds in leaves change proportionally to the nitrate level (i.e. higher in sufficient N vs. limiting N) and respond with even higher levels when the nitrate level is reduced. Notably, these compounds are directly involved in N assimilation, transport, and storage (glutamine, asparagine, glutamate and allantoin) and extend to most amino acids. 
Based on these data, we hypothesize that plants respond by rapidly mobilizing stored vacuolar nitrate when N deficit is perceived, and that the response likely involves phosphorylation signal cascades and transcriptional regulation. CONCLUSIONS: The co-expression network analysis and metabolic profiling performed in rice pinpoint the relevance of signal transduction components and regulation of N mobilization in response to limiting N conditions and deepen our understanding of N responses and N use in crops.


Subject(s)
Gene Expression Regulation, Plant; Gene Regulatory Networks; Metabolic Networks and Pathways; Nitrates/metabolism; Oryza/genetics; Oryza/metabolism; Cluster Analysis; Computational Biology; Epigenesis, Genetic; Gene Expression Profiling; Metabolome; Metabolomics; Molecular Sequence Annotation; Multigene Family; Organ Specificity; Plant Leaves/genetics; Plant Leaves/metabolism; Plant Roots/genetics; Plant Roots/metabolism; Transcription Factors/genetics; Transcription Factors/metabolism
18.
BMC Genomics ; 15: 681, 2014 Aug 13.
Article in English | MEDLINE | ID: mdl-25128291

ABSTRACT

BACKGROUND: High density stress, also known as intraspecies competition, causes significant yield losses in a wide variety of crop plants. At the same time, increases in density tolerance through selective breeding and the concomitant ability to plant crops at a higher population density have been among the most important factors in the development of high yielding modern cultivars. RESULTS: Physiological changes underlying high density stress were examined in Oryza sativa plants over the course of a life cycle by assessing differences in gene expression and metabolism. Moreover, nitrogen limitation was examined in parallel with high density stress to gain a better understanding of physiological responses specific to high density stress. While both nitrogen limitation and high density resulted in decreased shoot fresh weight, tiller number, plant height and chlorophyll content, high density stress alone had a greater impact on physiological factors. Decreases in aspartate and glutamate concentration were found in plants grown under both stress conditions; however, high density stress had a more significant effect on the concentration of these amino acids. Global transcriptome analysis revealed a large proportion of genes with altered expression in response to both stresses. The presence of ethylene-associated genes in a majority of density responsive genes was investigated further. Expression of the ethylene biosynthesis genes ACC synthase 1, ACC synthase 2 and ACC oxidase 7 was found to be upregulated in plants under high density stress. Plants at high density were also found to upregulate ethylene-associated genes and senescence genes, while cytokinin response and biosynthesis genes were downregulated, consistent with higher ethylene production. CONCLUSIONS: High density stress has a similar but greater impact on plant growth and development compared to nitrogen limitation. Global transcriptome changes implicate ethylene as a volatile signal used to communicate proximity under dense population growth conditions and suggest a role for phytohormones in the high density stress response in rice plants.


Subject(s)
Ethylenes/metabolism , Gene Expression Profiling , Metabolomics , Nitrogen/metabolism , Oryza/genetics , Oryza/metabolism , Stress, Physiological , Aspartic Acid/metabolism , Genes, Plant/genetics , Glutamic Acid/metabolism , Oryza/growth & development , Oryza/physiology
19.
IEEE Trans Pattern Anal Mach Intell ; 36(6): 1149-57, 2014 Jun.
Article in English | MEDLINE | ID: mdl-26353277

ABSTRACT

A mixture of shifted asymmetric Laplace distributions is introduced and used for clustering and classification. A variant of the EM algorithm is developed for parameter estimation by exploiting the relationship with the generalized inverse Gaussian distribution. This approach is mathematically elegant and relatively computationally straightforward. Our novel mixture modelling approach is demonstrated on both simulated and real data to illustrate clustering and classification applications. In these analyses, our mixture of shifted asymmetric Laplace distributions performs favourably when compared to the popular Gaussian approach. This work, which marks an important step in the non-Gaussian model-based clustering and classification direction, concludes with discussion as well as suggestions for future work.

20.
BMC Plant Biol ; 13: 42, 2013 Mar 15.
Article in English | MEDLINE | ID: mdl-23497159

ABSTRACT

BACKGROUND: The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize. RESULTS: A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then the remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at http://www.promzea.org and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter motifs in their promoters, perhaps uncovering a broader co-regulated gene network. Promzea was also tested against tissue-specific microarray data from maize. CONCLUSIONS: An online tool customized for promoter motif discovery in plants, called Promzea, has been generated. Promzea was validated in silico by its ability to retrieve benchmark motifs and experimentally defined motifs and was tested using tissue-specific microarray data. Promzea predicted broader networks of gene regulation associated with the historic anthocyanin and phlobaphene biosynthetic pathways. Promzea is a new bioinformatics tool for understanding transcriptional gene regulation in maize and has been expanded to include rice and Arabidopsis.


Subject(s)
Anthocyanins/biosynthesis , Biosynthetic Pathways , Computational Biology/methods , Flavonoids/biosynthesis , Plant Proteins/genetics , Promoter Regions, Genetic , Software , Zea mays/genetics , Algorithms , Arabidopsis/genetics , Arabidopsis/growth & development , Arabidopsis/metabolism , Base Sequence , Computational Biology/instrumentation , Molecular Sequence Data , Plant Proteins/metabolism , Zea mays/growth & development , Zea mays/metabolism