Results 1 - 20 of 32
1.
Proc Natl Acad Sci U S A ; 121(8): e2307430121, 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38359289

ABSTRACT

Blood metabolite levels are affected by numerous factors, including preanalytical factors such as collection methods and geographical sites. These perturbations have caused deleterious consequences for many metabolomics studies and represent a major challenge in the metabolomics field. It is important to understand these factors and develop models to reduce their perturbations. However, to date, the lack of suitable mathematical models for blood metabolite levels under homeostasis has hindered progress. In this study, we develop quantitative models of blood metabolite levels in healthy adults based on multisite sample cohorts that mimic the current challenge. Five cohorts of samples obtained across four geographically distinct sites were investigated, focusing on approximately 50 metabolites that were quantified using 1H NMR spectroscopy. More than one-third of the variation in these metabolite profiles is due to cross-cohort variation. A dramatic reduction in the variation of metabolite levels (90%), especially their site-to-site variation (95%), was achieved by modeling each metabolite using demographic and clinical factors and especially other metabolites, as observed in the top principal components. The results also reveal that several metabolites contribute disproportionately to such variation, which could be explained by their association with biological pathways including biosynthesis and degradation. The study demonstrates an intriguing network effect of metabolites that can be utilized to better define homeostatic metabolite levels, which may have implications for improved health monitoring. As an example of the potential utility of the approach, we show that modeling gender-related metabolic differences retains the interesting variance while reducing unwanted (site-related) variance.


Subject(s)
Metabolome , Metabolomics , Adult , Humans , Metabolomics/methods , Magnetic Resonance Spectroscopy , Homeostasis
2.
Theor Appl Genet ; 134(8): 2613-2637, 2021 Aug.
Article in English | MEDLINE | ID: mdl-34018019

ABSTRACT

KEY MESSAGE: Association analysis for ionomic concentrations of 20 elements identified independent genetic factors underlying the root and shoot ionomes of rice, providing a platform for selecting and dissecting causal genetic variants. Understanding the genetic basis of mineral nutrient acquisition is key to fully describing how terrestrial organisms interact with the non-living environment. Rice (Oryza sativa L.) serves both as a model organism for genetic studies and as an important component of the global food system. Studies in rice ionomics have primarily focused on above ground tissues evaluated from field-grown plants. Here, we describe a comprehensive study of the genetic basis of the rice ionome in both roots and shoots of 6-week-old rice plants for 20 elements using a controlled hydroponics growth system. Building on the wealth of publicly available rice genomic resources, including a panel of 373 diverse rice lines, 4.8 M genome-wide single-nucleotide polymorphisms, single- and multi-marker analysis pipelines, an extensive tome of 321 candidate genes and legacy QTLs from across 15 years of rice genetics literature, we used genome-wide association analysis and biparental QTL analysis to identify 114 genomic regions associated with ionomic variation. The genetic basis for root and shoot ionomes was highly distinct; 78 loci were associated with roots and 36 loci with shoots, with no overlapping genomic regions for the same element across tissues. We further describe the distribution of phenotypic variation across haplotypes and identify candidate genes within highly significant regions associated with sulfur, manganese, cadmium, and molybdenum. Our analysis provides critical insight into the genetic basis of natural phenotypic variation for both root and shoot ionomes in rice and provides a comprehensive resource for dissecting and testing causal genetic variants.


Subject(s)
Chromosome Mapping/methods , Chromosomes, Plant/genetics , Gene Expression Regulation, Plant , Oryza/genetics , Plant Proteins/metabolism , Plant Roots/genetics , Plant Shoots/genetics , Genome-Wide Association Study , Oryza/growth & development , Phenotype , Plant Proteins/genetics , Plant Roots/growth & development , Plant Shoots/growth & development , Quantitative Trait Loci
3.
J Integr Plant Biol ; 62(10): 1469-1484, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32246811

ABSTRACT

The extensive phenotypic diversity within natural populations of Arabidopsis is associated with differences in gene expression. Transcript levels can be considered as inheritable quantitative traits, and used to map expression quantitative trait loci (eQTL) in genome-wide association studies (GWASs). In order to identify putative genetic determinants for variations in gene expression, we used publicly available genomic and transcript variation data from 665 Arabidopsis accessions and applied the single nucleotide polymorphism-set (Sequence) Kernel Association Test (SKAT) method for the identification of eQTL. Moreover, we used the penalized orthogonal-components regression (POCRE) method to increase the power of statistical tests. Then, gene annotations were used as test units to identify genes that are associated with natural variations in transcript accumulation, which correspond to candidate regulators, some of which may have a broad impact on gene expression. Besides increasing the chances to identify real associations, the analysis using POCRE and SKAT significantly reduced the computational cost required to analyze large datasets. As a proof of concept, we used this approach to identify eQTL that represent novel candidate regulators of immune responses. The versatility of this approach allows its application to any process that is subjected to natural variation among Arabidopsis accessions.


Subject(s)
Arabidopsis/genetics , Quantitative Trait Loci/genetics , Arabidopsis/immunology , Arabidopsis/physiology , Arabidopsis Proteins/genetics , Chloroplasts/genetics , Genome-Wide Association Study , Mitochondria/genetics
4.
Metabolomics ; 13(11), 2017 11.
Article in English | MEDLINE | ID: mdl-30814918

ABSTRACT

Introduction: Metabolomics technologies enable the identification of putative biomarkers for numerous diseases; however, the influence of confounding factors on metabolite levels poses a major challenge in moving forward with such metabolites for pre-clinical or clinical applications. Objectives: To address this challenge, we analyzed metabolomics data from a colorectal cancer (CRC) study and used seemingly unrelated regression (SUR) to account for the effects of confounding factors including gender, BMI, age, alcohol use, and smoking. Methods: A SUR model based on 113 serum metabolites quantified using targeted mass spectrometry identified 20 metabolites that differentiated CRC patients (n = 36), patients with polyps (n = 39), and healthy subjects (n = 83). Models built using different groups of biologically related metabolites achieved improved differentiation and were significant for 26 out of 29 groups. Furthermore, the networks of correlated metabolites constructed for all groups of metabolites using the ParCorA algorithm, before or after application of the SUR model, showed significant alterations for CRC and polyp patients relative to healthy controls. Results: The results showed that demographic covariates, such as gender, BMI, BMI², and smoking status, exhibit significant confounding effects on metabolite levels, which can be modeled effectively. Conclusion: These results not only provide new insights into addressing the major issue of confounding effects in metabolomics analysis, but also shed light on issues related to establishing reliable biomarkers and the biological connections between them in a complex disease.

5.
J Proteome Res ; 14(6): 2492-9, 2015 Jun 05.
Article in English | MEDLINE | ID: mdl-25919433

ABSTRACT

Although colorectal cancer (CRC) is one of the most prevalent and deadly cancers in the world, the development of improved and robust biomarkers to enable screening, surveillance, and therapy monitoring of CRC continues to be elusive. In particular, patients with colon polyps are at higher risk of developing colon cancer; however, noninvasive methods to identify these patients suffer from poor performance. In consideration of the challenges involved in identifying metabolite biomarkers in individuals at high risk for colon cancer, we used NMR-based metabolite profiling in combination with numerous demographic parameters to assess the ability of serum metabolites to differentiate polyp patients from healthy subjects. We also investigated the effect of disease risk on different groups of biologically related metabolites. A powerful statistical approach, seemingly unrelated regression (SUR), was used to model the correlated levels of metabolites in the same biological group. The metabolites were found to be significantly affected by demographic covariates such as gender, BMI, BMI², and smoking status. After accounting for the effects of the confounding factors, we then investigated the potential of serum metabolites to differentiate patients with polyps from age-matched healthy controls. Our results showed that while only valine was slightly associated, individually, with polyp patients, a number of biologically related groups of metabolites were significantly associated with polyps. These results may explain some of the challenges and promise a novel avenue for future metabolite profiling methodologies.


Subject(s)
Colonic Polyps/metabolism , Rectal Diseases/metabolism , Case-Control Studies , Colonic Polyps/pathology , Female , Humans , Male , Middle Aged , Rectal Diseases/pathology
6.
Biochim Biophys Acta ; 1839(11): 1330-40, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25281873

ABSTRACT

Protein arginine methyltransferase 5 (PRMT5) symmetrically methylates arginine residues of histones and non-histone protein substrates and regulates a variety of cellular processes through epigenetic control of target gene expression or post-translational modification of signaling molecules. Recent evidence suggests that PRMT5 may function as an oncogene and its overexpression contributes to the development and progression of several human cancers. However, the mechanism underlying the regulation of PRMT5 expression in cancer cells remains largely unknown. In the present study, we have mapped the proximal promoter of PRMT5 to the -240bp region and identified nuclear transcription factor Y (NF-Y) as a critical transcription factor that binds to the two inverted CCAAT boxes and regulates PRMT5 expression in multiple cancer cell lines. Further, we present evidence that loss of PRMT5 is responsible for cell growth inhibition induced by knockdown of NF-YA, a subunit of NF-Y that forms a heterotrimeric complex with NF-YB and NF-YC for function. Significantly, we have found that activation of protein kinase C (PKC) by phorbol 12-myristate 13-acetate (PMA) in LNCaP prostate cancer cells down-regulates the expression of NF-YA and PRMT5 at the transcription level in a c-Fos-dependent manner. Given that down-regulation of several PKC isozymes is implicated in the development and progression of several human cancers, our findings suggest that the PKC-c-Fos-NF-Y signaling pathway may be responsible for PRMT5 overexpression in a subset of human cancer patients.


Subject(s)
CCAAT-Binding Factor/physiology , Cell Proliferation/genetics , Prostatic Neoplasms/genetics , Protein Kinase C/physiology , Protein-Arginine N-Methyltransferases/genetics , Proto-Oncogene Proteins c-fos/physiology , Transcriptional Activation , Cell Line, Tumor , Down-Regulation , Gene Expression Regulation, Enzymologic , Gene Expression Regulation, Neoplastic , Humans , Male , Prostatic Neoplasms/metabolism , Prostatic Neoplasms/pathology , Protein-Arginine N-Methyltransferases/metabolism , Signal Transduction
7.
Sci Rep ; 13(1): 19371, 2023 11 08.
Article in English | MEDLINE | ID: mdl-37938594

ABSTRACT

Gene regulation plays an important role in understanding the mechanisms of human biology and diseases. However, inferring causal relationships between all genes is challenging due to the large number of genes in the transcriptome. Here, we present SIGNET (Statistical Inference on Gene Regulatory Networks), a flexible software package that reveals networks of causal regulation between genes built upon large-scale transcriptomic and genotypic data at the population level. Like Mendelian randomization, SIGNET uses genotypic variants as natural instrumental variables to establish such causal relationships but constructs a transcriptome-wide gene regulatory network with high confidence. SIGNET makes such a computationally heavy task feasible by deploying a well-designed statistical algorithm over a parallel computing environment. It also provides a user-friendly interface allowing for parameter tuning, efficient parallel computing scheduling, interactive network visualization, and confirmatory results retrieval. The open-source SIGNET software is freely available (https://www.zstats.org/signet/).


Subject(s)
Gene Regulatory Networks , Transcriptome , Humans , Gene Expression Profiling , Algorithms , Causality
8.
Res Sq ; 2023 Jul 26.
Article in English | MEDLINE | ID: mdl-37546848

ABSTRACT

Gene regulation plays an important role in understanding the mechanisms of human biology and diseases. However, inferring causal relationships between all genes is challenging due to the large number of genes in the transcriptome. Here, we present SIGNET (Statistical Inference on Gene Regulatory Networks), a flexible software package that reveals networks of causal regulation between genes built upon large-scale transcriptomic and genotypic data at the population level. Like Mendelian randomization, SIGNET uses genotypic variants as natural instrumental variables to establish such causal relationships but constructs a transcriptome-wide gene regulatory network with high confidence. SIGNET makes such a computationally heavy task feasible by deploying a well-designed statistical algorithm over a parallel computing environment. It also provides a user-friendly interface allowing for parameter tuning, efficient parallel computing scheduling, interactive network visualization, and confirmatory results retrieval. The open-source SIGNET software is freely available (https://www.zstats.org/signet/).

9.
Mol Biol Evol ; 28(6): 1901-11, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21239390

ABSTRACT

Understanding genome and chromosome evolution is important for understanding genetic inheritance and evolution. Universal events comprising DNA replication, transcription, repair, mobile genetic element transposition, chromosome rearrangements, mitosis, and meiosis underlie inheritance and variation of living organisms. Although the genome of a species as a whole is important, chromosomes are the basic units subjected to genetic events that shape evolution to a large extent. Now that many complete genome sequences are available, we can address evolution and variation of individual chromosomes across species. For example, "How are the repeat and nonrepeat proportions of genetic codes distributed among different chromosomes in a multichromosome species?" "Is there a general rule behind the intuitive observation that chromosome lengths tend to be similar in a species, and if so, can we generalize any findings in chromosome content and size across different taxonomic groups?" Here, we show that chromosomes within a species do not show dramatic fluctuation in their content of mobile genetic elements as the proliferation of these elements increases from unicellular eukaryotes to vertebrates. Furthermore, we demonstrate that, notwithstanding the remarkable plasticity, there is an upper limit to chromosome-size variation in diploid eukaryotes with linear chromosomes. Strikingly, variation in chromosome size for 886 chromosomes in 68 eukaryotic genomes (including 22 human autosomes) can be viably captured by a single model, which predicts that the vast majority of the chromosomes in a species are expected to have a base pair length between 0.4035 and 1.8626 times the average chromosome length. This conserved boundary of chromosome-size variation, which prevails across a wide taxonomic range with few exceptions, indicates that cellular, molecular, and evolutionary mechanisms, possibly together, confine the chromosome lengths around a species-specific average chromosome length.


Subject(s)
Chromosomes/genetics , Diploidy , Eukaryota/genetics , Algorithms , Animals , Computer Simulation , Evolution, Molecular , Genome/genetics , Humans , Models, Genetic , Models, Statistical , Repetitive Sequences, Nucleic Acid/genetics , Translocation, Genetic/genetics
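
The size bound quoted in the abstract of entry 9 can be checked against any list of chromosome lengths. The following is a minimal sketch, assuming only the two multipliers stated in the abstract (0.4035 and 1.8626); the function name and the toy lengths are hypothetical and are not taken from the paper.

```python
# Multipliers quoted in the abstract: most chromosomes are expected to have a
# length between 0.4035 and 1.8626 times the species' mean chromosome length.
LOWER, UPPER = 0.4035, 1.8626


def outside_bounds(lengths):
    """Return (index, ratio) pairs for chromosomes whose length divided by
    the mean length falls outside the [LOWER, UPPER] interval."""
    mean = sum(lengths) / len(lengths)
    flagged = []
    for i, length in enumerate(lengths):
        ratio = length / mean
        if not LOWER <= ratio <= UPPER:
            flagged.append((i, ratio))
    return flagged


# Hypothetical chromosome lengths in Mb (illustrative values, not real data).
toy_lengths = [100.0, 120.0, 80.0, 30.0, 170.0]
for index, ratio in outside_bounds(toy_lengths):
    print(f"chromosome {index}: {ratio:.2f} x mean length")
```

With these toy values the mean is 100 Mb, so only the 30 Mb chromosome (ratio 0.30) falls outside the interval.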
10.
J Bus Econ Stat ; 40(2): 852-867, 2022.
Article in English | MEDLINE | ID: mdl-35756092

ABSTRACT

We compute the value-at-risk of financial losses by fitting a generalized Pareto distribution to exceedances over a threshold. Following the common practice of setting the threshold as high sample quantiles, we show that, for both independent observations and time-series data, the asymptotic variance for the maximum likelihood estimation depends on the choice of threshold, unlike the existing study of using a divergent threshold. We also propose a random weighted bootstrap method for the interval estimation of VaR, with critical values computed by the empirical distribution of the absolute differences between the bootstrapped estimators and the maximum likelihood estimator. While our asymptotic results unify the inference with non-divergent and divergent thresholds, the finite sample studies via simulation and application to real data show that the derived confidence intervals well cover the true VaR in insurance and finance.
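
For readers unfamiliar with the peaks-over-threshold approach mentioned in entry 10, the sketch below evaluates the textbook value-at-risk estimator once a generalized Pareto distribution has been fitted to the exceedances. It is an illustration under stated assumptions, not the authors' procedure: the scale `sigma` and shape `xi` are assumed to be already estimated (for example by maximum likelihood), and the function name and argument names are hypothetical.

```python
import math


def pot_var(u, sigma, xi, n, n_exceed, p):
    """Textbook peaks-over-threshold VaR at confidence level p.

    u        : threshold
    sigma    : fitted GPD scale (> 0), assumed already estimated
    xi       : fitted GPD shape, assumed already estimated
    n        : total number of observations
    n_exceed : number of observations above u
    p        : confidence level, e.g. 0.99

    Formula: VaR_p = u + (sigma / xi) * (((n / n_exceed) * (1 - p)) ** (-xi) - 1)
    """
    tail_ratio = (n / n_exceed) * (1.0 - p)
    if abs(xi) < 1e-12:
        # Limit of the formula as xi -> 0 (exponential tail).
        return u - sigma * math.log(tail_ratio)
    return u + (sigma / xi) * (tail_ratio ** (-xi) - 1.0)


# Hypothetical inputs: 1000 losses, 50 of them above the threshold u = 2.0.
print(pot_var(u=2.0, sigma=1.0, xi=0.2, n=1000, n_exceed=50, p=0.99))
```

A quick sanity check on the formula: when `1 - p` equals the empirical exceedance fraction `n_exceed / n`, the estimator returns the threshold `u` itself.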

11.
Transl Res ; 240: 87-98, 2022 02.
Article in English | MEDLINE | ID: mdl-34743014

ABSTRACT

An appropriate screening tool for excessive alcohol use (EAU) is clinically important, as it may help providers encourage early intervention and prevent adverse outcomes. We hypothesized that patients with excessive alcohol use would have distinct serum metabolites when compared to healthy controls. Serum metabolic profiling of 22 healthy controls and 147 patients with a history of EAU was performed. We employed seemingly unrelated regression to identify the unique metabolites and found 67 metabolites (out of 556) that were differentially expressed in patients with EAU. Sixteen metabolites belong to sphingolipid metabolism, 13 belong to phospholipid metabolism, and the remaining 38 were metabolites of 25 different pathways. We also found 93 serum metabolites that were significantly associated with the total quantity of alcohol consumed in the last 30 days. A total of 15 metabolites belong to sphingolipid metabolism, 11 belong to phospholipid metabolism, and 7 are lysolipids. Using a Venn diagram approach, we found the top 10 metabolites that were both differentially expressed in EAU and significantly associated with the quantity of alcohol consumption: sphingomyelin (d18:2/18:1), sphingomyelin (d18:2/21:0, d16:2/23:0), guanosine, S-methylmethionine, 10-undecenoate (11:1n1), sphingomyelin (d18:1/20:1, d18:2/20:0), sphingomyelin (d18:1/17:0, d17:1/18:0, d19:1/16:0), N-acetylasparagine, sphingomyelin (d18:1/19:0, d19:1/18:0), and 1-palmitoyl-2-palmitoleoyl-GPC (16:0/16:1). The diagnostic performance of the top 10 metabolites, using the area under the ROC curve, was significantly higher than that of commonly used markers. We have identified a unique metabolic signature among patients with EAU. Future studies to validate and determine the kinetics of these markers as a function of alcohol consumption are needed.


Subject(s)
Alcohol Drinking/blood , Alcohol Drinking/metabolism , Metabolome , Metabolomics , Adult , Case-Control Studies , Cohort Studies , Female , Humans , Linear Models , Male , Metabolic Networks and Pathways , ROC Curve
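
Entry 11 reports diagnostic performance as the area under the ROC curve. As a reminder of what that number measures, the sketch below computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control (ties count one half). The function name and the scores are hypothetical and purely illustrative; they are not data from the study.

```python
def roc_auc(scores_cases, scores_controls):
    """Area under the ROC curve via pairwise comparison:
    the fraction of (case, control) pairs in which the case has the higher
    score, counting ties as half a win."""
    wins = 0.0
    for case_score in scores_cases:
        for control_score in scores_controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(scores_cases) * len(scores_controls))


# Hypothetical marker scores (illustrative values only).
cases = [0.9, 0.8, 0.4]
controls = [0.5, 0.3, 0.2]
print(round(roc_auc(cases, controls), 3))
```

An AUC of 0.5 corresponds to a marker that ranks cases and controls no better than chance, and 1.0 to perfect separation.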
12.
Hum Hered ; 67(3): 154-62, 2009.
Article in English | MEDLINE | ID: mdl-19077433

ABSTRACT

OBJECTIVE: Identifying genotyping errors is an important issue in genetic research, yet it has been relatively less studied in samples consisting of unrelated individuals. In this article, we consider several models of genotyping errors, which were originally proposed for pedigree data, for unrelated population samples with single nucleotide polymorphism (SNP) genotype data. The mathematical constraints are investigated for detecting genotyping errors without resampling replicates or genotyping relatives. METHODS: For the various proposed genotyping error models, we unveil the conditions under which the parameters are identifiable. These results are verified through applications to simulated and real SNP data. RESULTS: We show that, with constraints, two particular models provide both identifiable error rate and allele frequencies of an SNP for unrelated population data. The simulation study shows that these two models present unbiased estimates for the allele frequencies. One of the models also gives an unbiased estimate for the genotyping error rate. CONCLUSION: While the Hardy-Weinberg equilibrium test can be used to detect genotyping errors, a key advantage of these models is the explicit estimates of genotyping error rates and allele frequencies. This work may help researchers to estimate error rates and to use the estimates in their analysis to increase power and decrease bias, without the extra work of genotyping family members or replicates.


Subject(s)
Genotype , Models, Genetic , Polymorphism, Single Nucleotide , Computer Simulation , Gene Frequency , Humans , Sequence Analysis, DNA
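
The abstract of entry 12 contrasts its error models with the Hardy-Weinberg equilibrium test. For reference, the sketch below shows the standard chi-square form of that test for a biallelic SNP. It illustrates the comparison baseline only, not the genotyping-error models proposed in the paper; the function name and the genotype counts are hypothetical.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test statistic for Hardy-Weinberg equilibrium.

    Takes observed counts of the three genotypes (AA, AB, BB) and returns
    (p, chi2), where p is the estimated frequency of allele A. Assumes both
    alleles are observed, so that no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # allele A frequency from genotype counts
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, chi2


# Hypothetical counts with a deficit of heterozygotes.
p_hat, chi2 = hwe_chi_square(40, 20, 40)
# 3.841 is the 5% critical value of the chi-square distribution with 1 df.
print(p_hat, chi2, chi2 > 3.841)
```

A large statistic signals departure from equilibrium, but unlike the models described in the abstract it does not yield an explicit estimate of the genotyping error rate.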
13.
J Comput Biol ; 27(7): 1171-1179, 2020 07.
Article in English | MEDLINE | ID: mdl-31692371

ABSTRACT

Logistic regression is an effective tool in case-control analysis. With advances in high-throughput technology, fast and efficient methods for fitting high-dimensional logistic regression have gained much interest. An empirical Bayes model for logistic regression is considered in this article. A spike-and-slab prior is used for variable selection, which plays a vital role in building an effective predictive model while keeping the model interpretable. To increase the power of variable selection, we incorporate biological knowledge through the Ising prior. We propose the iterated conditional modes/medians (ICM/M) algorithm to fit the logistic model; it has a computational advantage over Markov chain Monte Carlo (MCMC) algorithms. The implementation of the ICM/M algorithm for both linear and logistic models can be found in the R package icmm, which is freely available on the Comprehensive R Archive Network (CRAN). Simulation studies were carried out to assess the performance of our method, with lasso and adaptive lasso as benchmarks. Overall, the simulation studies show that ICM/M outperforms the others in terms of the number of false positives and has competitive predictive ability. An application to a real data set from a Parkinson's disease study was also carried out for illustration. To identify important variables, our approach provides flexibility to select variables based on local posterior probabilities while controlling the false discovery rate at a desired level, rather than relying only on regression coefficients.


Subject(s)
Algorithms , Case-Control Studies , Genomics/statistics & numerical data , Parkinson Disease/genetics , Bayes Theorem , Gene Frequency , Gene Regulatory Networks , Genome-Wide Association Study/statistics & numerical data , Humans , Logistic Models , Markov Chains , Polymorphism, Single Nucleotide
14.
Methods Mol Biol ; 2037: 471-482, 2019.
Article in English | MEDLINE | ID: mdl-31463861

ABSTRACT

Despite the increasing popularity and applicability of metabolomics for putative biomarker identification, analysis of the data is challenged by low statistical power resulting from the small sample sizes and large numbers of metabolites and other omics information, as well as confounding demographic and clinical variables. To enhance the statistical power and improve reproducibility of the identified metabolite-based biomarkers, we advocate the use of advanced statistical methods that can simultaneously evaluate the relationship between a group of metabolites and various types of variables including other omics profiles, demographic and clinical data, as well as the complex interactions between them. Accordingly, in this chapter, we describe the method of seemingly unrelated regression that can simultaneously analyze multiple metabolites while controlling the confounding effects of demographic and clinical variables (such as gender, age, BMI, smoking status). We also introduce penalized orthogonal components regression as a screening approach that can handle millions of omics predictors in the model.


Subject(s)
Biomarkers, Tumor/analysis , Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/metabolism , Data Interpretation, Statistical , Magnetic Resonance Spectroscopy/methods , Metabolic Networks and Pathways , Metabolomics/methods , Aged , Female , Humans , Male , Middle Aged
15.
Sci Rep ; 9(1): 1197, 2019 02 04.
Article in English | MEDLINE | ID: mdl-30718595

ABSTRACT

Constructing gene regulatory networks is crucial to unraveling the genetic architecture of complex traits and to understanding the mechanisms of diseases. On the basis of gene expression and single nucleotide polymorphism data in the yeast, Saccharomyces cerevisiae, we constructed gene regulatory networks using a two-stage penalized least squares method. A large system of structural equations via optimal prediction of a set of surrogate variables was established at the first stage, followed by consistent selection of regulatory effects at the second stage. Using this approach, we identified subnetworks that were enriched in gene ontology categories, revealing directional regulatory mechanisms controlling these biological pathways. Our mapping and analysis of expression-based quantitative trait loci uncovered a known alteration of gene expression within a biological pathway that results in regulatory effects on companion pathway genes in the phosphocholine network. In addition, we identify nodes in these gene ontology-enriched subnetworks that are coordinately controlled by transcription factors driven by trans-acting expression quantitative trait loci. Altogether, the integration of documented transcription factor regulatory associations with subnetworks defined by a system of structural equations using quantitative trait loci data is an effective means to delineate the transcriptional control of biological pathways.


Subject(s)
Gene Regulatory Networks/genetics , Saccharomyces cerevisiae/genetics , Sequence Analysis, DNA/methods , Chromosome Mapping/methods , Gene Expression/genetics , Gene Expression Regulation/genetics , Gene Ontology , Least-Squares Analysis , Models, Genetic , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Transcription Factors/genetics
16.
BMC Bioinformatics ; 9: 251, 2008 May 29.
Article in English | MEDLINE | ID: mdl-18510743

ABSTRACT

BACKGROUND: Identifying quantitative trait loci (QTL) for both additive and epistatic effects raises the statistical issue of selecting variables from a large number of candidates using a small number of observations. Missing trait and/or marker values prevent one from directly applying the classical model selection criteria such as Akaike's information criterion (AIC) and Bayesian information criterion (BIC). RESULTS: We propose a two-step Bayesian variable selection method which deals with the sparse parameter space and the small sample size issues. The regression coefficient priors are flexible enough to incorporate the characteristic of "large p small n" data. Specifically, sparseness and possible asymmetry of the significant coefficients are dealt with by developing a Gibbs sampling algorithm to stochastically search through low-dimensional subspaces for significant variables. The superior performance of the approach is demonstrated via simulation study. We also applied it to real QTL mapping datasets. CONCLUSION: The two-step procedure coupled with Bayesian classification offers flexibility in modeling "large p small n" data, especially for the sparse and asymmetric parameter space. This approach can be extended to other settings characterized by high dimension and low sample size.


Subject(s)
Chromosome Mapping/methods , Epistasis, Genetic , Models, Genetic , Quantitative Trait Loci/genetics , Regression Analysis , Computer Simulation , Data Interpretation, Statistical , Databases, Genetic , Models, Statistical , Sample Size
17.
Anal Chem ; 80(8): 2664-71, 2008 Apr 15.
Article in English | MEDLINE | ID: mdl-18351753

ABSTRACT

A two-dimensional (2-D) correlation optimized warping (COW) algorithm has been developed to align 2-D gas chromatography coupled with time-of-flight mass spectrometry (GC x GC/TOF-MS) data. By partitioning raw chromatographic profiles and warping the grid points simultaneously along the first and second dimensions on the basis of applying a one-dimensional COW algorithm to characteristic vectors, nongrid points can be interpolatively warped. This 2-D algorithm was directly applied to total ion counts (TIC) chromatographic profiles of homogeneous chemical samples, i.e., samples including mostly identical compounds. For heterogeneous chemical samples, the 2-D algorithm is first applied to certain selected ion counts chromatographic profiles, and the resultant warping parameters are then used to warp the corresponding TIC chromatographic profiles. The developed 2-D COW algorithm can also be applied to align other 2-D separation images, e.g., LC x LC data, LC x GC data, GC x GC data, LC x CE data, and CE x CE data.


Subject(s)
Algorithms , Gas Chromatography-Mass Spectrometry/methods , Amino Acids/analysis , Amino Acids/chemistry , Fatty Acids/analysis , Fatty Acids/chemistry
18.
Theor Biol Med Model ; 4: 3, 2007 Jan 19.
Article in English | MEDLINE | ID: mdl-17239251

ABSTRACT

BACKGROUND: It is of particular interest to identify cancer-specific molecular signatures for early diagnosis, monitoring effects of treatment and predicting patient survival time. Molecular information about patients is usually generated from high throughput technologies such as microarray and mass spectrometry. Statistically, we are challenged by the large number of candidates but only a small number of patients in the study, and the right-censored clinical data further complicate the analysis. RESULTS: We present a two-stage procedure to profile molecular signatures for survival outcomes. Firstly, we group closely-related molecular features into linkage clusters, each portraying either similar or opposite functions and playing similar roles in prognosis; secondly, a Bayesian approach is developed to rank the centroids of these linkage clusters and provide a list of the main molecular features closely related to the outcome of interest. A simulation study showed the superior performance of our approach. When it was applied to data on diffuse large B-cell lymphoma (DLBCL), we were able to identify some new candidate signatures for disease prognosis. CONCLUSION: This multivariate approach provides researchers with a more reliable list of molecular features profiled in terms of their prognostic relationship to the event times, and generates dependable information for subsequent identification of prognostic molecular signatures through either biological procedures or further data analysis.


Subject(s)
Bayes Theorem , B-Cell Lymphoma/genetics , Diffuse Large B-Cell Lymphoma/genetics , Humans , B-Cell Lymphoma/mortality , Diffuse Large B-Cell Lymphoma/mortality , Mathematics , Multivariate Analysis , Prognosis , Proportional Hazards Models
19.
Genetics ; 169(4): 2305-18, 2005 Apr.
Article in English | MEDLINE | ID: mdl-15520261

ABSTRACT

We developed a classification approach to multiple quantitative trait loci (QTL) mapping built upon a Bayesian framework that incorporates the important prior information that most genotypic markers are not cotransmitted with a QTL or their QTL effects are negligible. The genetic effect of each marker is modeled using a three-component mixture prior with a class for markers having negligible effects and separate classes for markers having positive or negative effects on the trait. The posterior probability of a marker's classification provides a natural statistic for evaluating credibility of identified QTL. This approach performs well, especially with a large number of markers but a relatively small sample size. A heat map to visualize the results is proposed so as to allow investigators to be more or less conservative when identifying QTL. We validated the method using a well-characterized data set for barley heading values from the North American Barley Genome Mapping Project. Application of the method to a new data set revealed sex-specific QTL underlying differences in glucose-6-phosphate dehydrogenase enzyme activity between two Drosophila species. A simulation study demonstrated the power of this approach across levels of trait heritability and when marker data were sparse.
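To make the idea of a three-component mixture prior concrete, here is a deliberately simplified single-marker sketch in Python. It assumes an effect estimate `b` with standard error `se`, a point mass at zero for the negligible class, and half-normal priors with scale `tau` for the positive and negative classes. These distributional choices, the prior weights, and all names are assumptions made for this example; the published method fits many markers jointly and may use different component densities.

```python
import math

def normal_pdf(x, sd):
    """Density of a zero-mean normal with standard deviation sd at x."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def classify_marker(b, se, tau=1.0, prior=(0.9, 0.05, 0.05)):
    """Posterior probabilities (negligible, positive, negative) for one marker.

    Assumed model: b ~ N(beta, se^2); beta = 0 in the negligible class,
    beta ~ half-normal(tau) on the positive or negative side otherwise.
    """
    p_zero, p_pos, p_neg = prior
    # Conjugate normal-normal update, used to integrate beta over one half-line.
    post_var = 1.0 / (1.0 / se**2 + 1.0 / tau**2)
    post_mean = post_var * b / se**2
    z = post_mean / math.sqrt(post_var)
    marginal = normal_pdf(b, math.sqrt(se**2 + tau**2))
    # Prior weight times marginal likelihood of b under each class.
    w_zero = p_zero * normal_pdf(b, se)
    w_pos = p_pos * 2.0 * marginal * normal_cdf(z)
    w_neg = p_neg * 2.0 * marginal * normal_cdf(-z)
    total = w_zero + w_pos + w_neg
    return (w_zero / total, w_pos / total, w_neg / total)
```

Under these assumptions, an estimate near zero leaves most of the posterior mass on the negligible class, while a large positive estimate shifts it to the positive class. Those class probabilities are the kind of quantity a heat map could display.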


Subject(s)
Genetic Models , Quantitative Trait Loci , Animals , Bayes Theorem , Chromosome Mapping , Drosophila , Genetic Markers , Glucosephosphate Dehydrogenase/genetics , Hordeum/genetics , Linear Models , Statistical Models , Multivariate Analysis , Species Specificity , Statistics as Topic/methods , Time Factors
20.
PLoS One ; 11(6): e0155758, 2016.
Article in English | MEDLINE | ID: mdl-27299523

ABSTRACT

Many genetic variants have been linked to familial or sporadic Parkinson's disease (PD), among which those identified in PARK16, BST1, SNCA, LRRK2, GBA and MAPT genes have been demonstrated to be the most common risk factors worldwide. Moreover, complex gene-gene and gene-environment interactions have been highlighted in PD pathogenesis. Compared to studies focusing on the predisposing effects of genes, there is a relative lack of research investigating how these genes and their interactions influence the clinical profiles of PD. In a cohort consisting of 2,011 Chinese Han PD patients, we selected 9 representative variants from the 6 above-mentioned common PD genes to analyze their main and epistatic effects on the Unified Parkinson's Disease Rating Scale (UPDRS) and the Hoehn and Yahr (H-Y) stage of PD. With multiple linear regression models adjusting for medication status, disease duration, gender and age at onset, none of the variants displayed significant main effects on UPDRS or the H-Y scores. However, for gene-gene interaction analyses, 7 out of 37 pairs of variants showed significant or marginally significant associations with these scores. Among these, the GBA rs421016 (L444P)×LRRK2 rs33949390 (R1628P) interaction was consistently significant in relation to UPDRS III and UPDRS total (I+II+III), even after controlling for the family-wise error rate using False Discovery Rate (FDR-corrected p values are 0.0481 and 0.0070, respectively). Although the effects of the remaining pairs of variants did not survive the FDR correction, they showed marginally significant associations with either UPDRS or the H-Y stage (raw p<0.05). Our results highlight the importance of epistatic effects of multiple genes on the determination of PD clinical profiles and may have implications for molecular classification and personalized intervention of the disease.
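The abstract reports FDR-corrected p values without naming the procedure. A common choice is the Benjamini-Hochberg step-up adjustment, sketched below in Python under that assumption. The function name and any example p values are illustrative and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order.

    Assumes the tests are independent or positively dependent, which is
    the setting in which this procedure controls the false discovery rate.
    """
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For raw p values of 0.01, 0.04, 0.03 and 0.005, this returns approximately 0.02, 0.04, 0.04 and 0.02. A pair of variants would be called significant at the 5% FDR level when its adjusted value is below 0.05.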


Subject(s)
Genetic Variation , Parkinson Disease/genetics , Aged , Cohort Studies , Genetic Epistasis , Female , Gene-Environment Interaction , Humans , Male , Middle Aged , Parkinson Disease/epidemiology