Results 1 - 20 of 73
1.
Nat Methods ; 21(3): 391-400, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38374264

ABSTRACT

Deciphering cell-type heterogeneity is crucial for systematically understanding tissue homeostasis and its dysregulation in diseases. Computational deconvolution is an efficient approach for estimating cell-type abundances from a variety of omics data. Despite substantial methodological progress in computational deconvolution in recent years, challenges are still outstanding. Here we list four important challenges related to computational deconvolution: the quality of the reference data, generation of ground truth data, limitations of computational methodologies, and benchmarking design and implementation. Finally, we make recommendations on reference data generation, new directions of computational methodologies, and strategies to promote rigorous benchmarking.


Subject(s)
Computational Biology , Genomics , Computational Biology/methods , Benchmarking
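
The abstract above describes deconvolution as estimating cell-type abundances from omics data. As a rough illustration only, the sketch below assumes the simplest linear mixture model (bulk profile = reference signature matrix times proportions) and solves it by projected gradient descent. The signature values, the step size, and the function name `deconvolve` are all hypothetical and are not taken from the cited paper.

```python
def deconvolve(signature, bulk, steps=2000, lr=0.005):
    """Estimate non-negative proportions p such that signature * p ~ bulk."""
    n_genes, n_types = len(signature), len(signature[0])
    p = [1.0 / n_types] * n_types  # start from equal proportions
    for _ in range(steps):
        # Residual of the linear mixture model for each gene.
        resid = [sum(signature[g][k] * p[k] for k in range(n_types)) - bulk[g]
                 for g in range(n_genes)]
        # Gradient of 0.5 * ||S p - b||^2 with respect to p.
        grad = [sum(signature[g][k] * resid[g] for g in range(n_genes))
                for k in range(n_types)]
        # Gradient step, then projection onto the constraint p >= 0.
        p = [max(0.0, p[k] - lr * grad[k]) for k in range(n_types)]
    total = sum(p)
    # Rescale so the estimates can be read as fractions of the sample.
    return [x / total for x in p] if total > 0 else p


# Hypothetical reference: 4 marker genes (rows) x 3 cell types (columns).
signature = [[10, 1, 0], [0, 8, 1], [1, 0, 9], [5, 5, 5]]
true_props = [0.5, 0.3, 0.2]
# Simulated noise-free bulk profile built from the known proportions.
bulk = [sum(s * q for s, q in zip(row, true_props)) for row in signature]
estimate = deconvolve(signature, bulk)
print([round(x, 3) for x in estimate])
```

The step size is tuned to this toy matrix; real data would call for a proper non-negative least squares solver, and the challenges the abstract lists (reference quality, ground truth) are exactly what this noise-free toy hides.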
2.
Cell ; 151(1): 138-52, 2012 Sep 28.
Article in English | MEDLINE | ID: mdl-23021221

ABSTRACT

Inflammation and macrophage foam cells are characteristic features of atherosclerotic lesions, but the mechanisms linking cholesterol accumulation to inflammation and LXR-dependent response pathways are poorly understood. To investigate this relationship, we utilized lipidomic and transcriptomic methods to evaluate the effect of diet and LDL receptor genotype on macrophage foam cell formation within the peritoneal cavities of mice. Foam cell formation was associated with significant changes in hundreds of lipid species and unexpected suppression, rather than activation, of inflammatory gene expression. We provide evidence that regulated accumulation of desmosterol underlies many of the homeostatic responses, including activation of LXR target genes, inhibition of SREBP target genes, selective reprogramming of fatty acid metabolism, and suppression of inflammatory-response genes, observed in macrophage foam cells. These observations suggest that macrophage activation in atherosclerotic lesions results from extrinsic, proinflammatory signals generated within the artery wall that suppress homeostatic and anti-inflammatory functions of desmosterol.


Subject(s)
Atherosclerosis/immunology , Cholesterol/biosynthesis , Desmosterol/metabolism , Foam Cells/metabolism , Lipid Metabolism , Transcriptome , Animals , Atherosclerosis/metabolism , Cholesterol/analogs & derivatives , Cholesterol/metabolism , Fatty Acids/metabolism , Foam Cells/immunology , Gene Knockdown Techniques , Leukocytes, Mononuclear/metabolism , Male , Mice , Mice, Inbred C57BL , Receptors, LDL/genetics , Receptors, LDL/metabolism , Sterol Regulatory Element Binding Proteins/metabolism
3.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34974623

ABSTRACT

Motif discovery and characterization are important for gene regulation analysis. The lack of intuitive and integrative web servers impedes the effective use of motifs. Most motif discovery web tools are either not designed for non-expert users or lack optimization steps when run with default settings. Here we describe bipartite motifs learning (BML), a parameter-free web server that provides a user-friendly portal for online discovery and analysis of sequence motifs, using high-throughput sequencing data as the input. BML utilizes both position weight matrix and dinucleotide weight matrix, the latter of which enables the expression of the interdependencies of neighboring bases. When input parameters concerning the motifs are given, BML achieves significantly higher accuracy than other available tools for motif finding. When no parameters are given by non-expert users, unlike other tools, BML employs a learning method to identify motifs automatically and achieves accuracy comparable to the scenario where the parameters are set. The BML web server is freely available at http://motif.t-ridership.com/ (https://github.com/Mohammad-Vahed/BML).


Subject(s)
Nucleotide Motifs , Software , Transcription Factors/metabolism , Web Browser , Algorithms , Arabidopsis , Binding Sites , High-Throughput Nucleotide Sequencing , Humans , Position-Specific Scoring Matrices , Sequence Analysis, DNA
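
The abstract above mentions position weight matrices. For readers unfamiliar with the idea, here is a minimal sketch of building a log-odds position weight matrix from aligned binding sites and scanning a sequence with it. It assumes a uniform background and a pseudocount of 1; the example sites and the names `build_pwm`, `score`, and `best_hit` are hypothetical, and this is not BML's implementation (it also omits the dinucleotide matrix).

```python
import math


def build_pwm(sites, pseudocount=1.0, alphabet="ACGT"):
    """Return one dict per motif position mapping base -> log2 odds score."""
    background = 1.0 / len(alphabet)  # assumed uniform background
    pwm = []
    for i in range(len(sites[0])):
        counts = {base: pseudocount for base in alphabet}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({base: math.log2((counts[base] / total) / background)
                    for base in alphabet})
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds scores for a window of motif length."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return the start index of the highest-scoring window."""
    width = len(pwm)
    return max(range(len(sequence) - width + 1),
               key=lambda i: score(pwm, sequence[i:i + width]))


# Hypothetical aligned sites resembling a TATAAT-like motif.
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
pwm = build_pwm(sites)
print(best_hit(pwm, "GGCGTATAATGCC"))
```

A position weight matrix treats positions as independent, which is the limitation the dinucleotide weight matrix in the abstract is meant to address.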
4.
BJOG ; 2023 Nov 20.
Article in English | MEDLINE | ID: mdl-37984426

ABSTRACT

OBJECTIVES: To identify and internally validate metabolites predictive of spontaneous preterm birth (sPTB) using multiple machine learning methods and sequential maternal serum samples, and to predict spontaneous early term birth (sETB) using these metabolites. DESIGN: Case-cohort design within a prospective cohort study. SETTING: Cambridge, UK. POPULATION OR SAMPLE: A total of 399 Pregnancy Outcome Prediction study participants, including 98 cases of sPTB. METHODS: An untargeted metabolomic analysis of maternal serum samples at 12, 20, 28 and 36 weeks of gestation was performed. We applied six supervised machine learning methods and a weighted Cox model to measurements at 28 weeks of gestation and sPTB, followed by feature selection. We used logistic regression with elastic net penalty, followed by best subset selection, to reduce the number of predictive metabolites further. We applied coefficients from the chosen models to measurements from different gestational ages to predict sPTB and sETB. MAIN OUTCOME MEASURES: sPTB and sETB. RESULTS: We identified 47 metabolites, mostly lipids, as important predictors of sPTB by two or more methods and 22 were identified by three or more methods. The best 4-predictor model had an optimism-corrected area under the receiver operating characteristics curve (AUC) of 0.703 at 28 weeks of gestation. The model also predicted sPTB in 12-week samples (0.606, 95% CI 0.544-0.667) and 20-week samples (0.657, 95% CI 0.597-0.717) and it predicted sETB in 36-week samples (0.727, 95% CI 0.606-0.849). A lysolipid, 1-palmitoleoyl-GPE (16:1)*, was the strongest predictor of sPTB at 12 weeks of gestation (0.609, 95% CI 0.548-0.670), 20 weeks (0.630, 95% CI 0.569-0.690) and 28 weeks (0.660, 95% CI 0.599-0.722), and of sETB at 36 weeks (0.739, 95% CI 0.618-0.860). CONCLUSIONS: We identified and internally validated maternal serum metabolites predictive of sPTB. A lysolipid, 1-palmitoleoyl-GPE (16:1)*, is a novel predictor of sPTB and sETB. Further validation in external populations is required.

5.
Bioinformatics ; 37(17): 2772-2774, 2021 Sep 09.
Article in English | MEDLINE | ID: mdl-33515235

ABSTRACT

SUMMARY: Cox-nnet is a neural-network-based prognosis prediction method, originally applied to genomics data. Here, we propose version 2 of Cox-nnet, with significant improvement in efficiency and interpretability, making it suitable for predicting prognosis from large-scale population data, including electronic medical records (EMR) datasets. We also add permutation-based feature importance scores and the direction of feature coefficients. When applied to a kidney transplantation dataset, Cox-nnet v2.0 reduces the training time of Cox-nnet by up to 32-fold (n = 10 000) and achieves better prediction accuracy than Cox-PH (P < 0.05). It also achieves similarly superior performance on the publicly available SUPPORT dataset (n = 8000). The high efficiency and accuracy make Cox-nnet v2.0 a desirable method for survival prediction in large-scale EMR data. AVAILABILITY AND IMPLEMENTATION: Cox-nnet v2.0 is freely available to the public at https://github.com/lanagarmire/Cox-nnet-v2.0. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

6.
FASEB J ; 35(4): e21524, 2021 04.
Article in English | MEDLINE | ID: mdl-33742690

ABSTRACT

Maternal pre-pregnancy obesity may have an impact on both maternal and fetal health. We examined the microbiome recovered from placentas in a multi-ethnic maternal pre-pregnant obesity cohort, through an optimized microbiome protocol to enrich low bacterial biomass samples. We found that the microbiomes recovered from the placentas of obese pre-pregnant mothers are less abundant and less diverse when compared to those from mothers of normal pre-pregnancy weight. Microbiome richness also decreases from the maternal side to the fetal side, demonstrating heterogeneity by geolocation within the placenta. In summary, our study shows that the microbiomes recovered from the placentas are associated with pre-pregnancy obesity. IMPORTANCE: Maternal pre-pregnancy obesity may have an impact on both maternal and fetal health. The placenta is an important organ at the interface of the mother and fetus, and supplies nutrients to the fetus. We report that the microbiomes enriched from the placentas of obese pre-pregnant mothers are less abundant and less diverse when compared to those from mothers of normal pre-pregnancy weight. Moreover, the microbiomes also vary by geolocation within the placenta.


Subject(s)
Microbiota/physiology , Obesity, Maternal/metabolism , Obesity/complications , Placenta/metabolism , Adult , Cohort Studies , Female , Fetal Development/physiology , Humans , Pregnancy , Pregnancy Complications/etiology
7.
J Lipid Res ; 62: 100118, 2021.
Article in English | MEDLINE | ID: mdl-34547287

ABSTRACT

Preeclampsia is a pregnancy-specific syndrome characterized by hypertension and proteinuria after 20 weeks of gestation. However, it is not well understood what lipids are involved in the development of this condition, and even less is known how these lipids mediate its formation. To reveal the relationship between lipids and preeclampsia, we conducted lipidomic profiling of maternal sera of 44 severe preeclamptic and 20 healthy pregnant women from a multiethnic cohort in Hawaii. Correlation network analysis showed that oxidized phospholipids have increased intercorrelations and connections in preeclampsia, whereas other lipids, including triacylglycerols, have reduced network correlations and connections. A total of 10 lipid species demonstrate significant changes uniquely associated with preeclampsia but not any other clinical confounders. These species are from the lipid classes of lysophosphatidylcholines, phosphatidylcholines (PCs), cholesteryl esters, phosphatidylethanolamines, lysophosphatidylethanolamines, and ceramides. A random forest classifier built on these lipids shows highly accurate and specific prediction (F1 statistic = 0.94; balanced accuracy = 0.88) of severe preeclampsia, demonstrating their potential as biomarkers for this condition. These lipid species are enriched in dysregulated biological pathways, including insulin signaling, immune response, and phospholipid metabolism. Moreover, causality inference shows that various PCs and lysophosphatidylcholines mediate severe preeclampsia through PC 35:1e. Our results suggest that the lipidome may play a role in the pathogenesis and serve as biomarkers of severe preeclampsia.


Subject(s)
Lipidomics , Lipids/blood , Pre-Eclampsia/blood , Adult , Cohort Studies , Female , Humans , Pregnancy , Severity of Illness Index
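
The abstract above reports both an F1 statistic and balanced accuracy, which is common practice when classes are unequal in size (here 44 cases versus 20 controls). The sketch below shows how the two metrics are computed from a confusion matrix. The labels are invented purely for illustration and do not reproduce the study's data or its reported values.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = case)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn)


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity and specificity, robust to class imbalance."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Invented toy labels: four cases and two controls.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
f1 = f1_score(tp, fp, fn)
bal_acc = balanced_accuracy(tp, fp, tn, fn)
print(f1, bal_acc)
```

F1 ignores true negatives, so it can look strong when cases dominate; balanced accuracy weights both classes equally, which is why the two numbers can differ.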
8.
Trends Genet ; 34(10): 790-805, 2018 10.
Article in English | MEDLINE | ID: mdl-30143323

ABSTRACT

Omics data contain signals from the molecular, physical, and kinetic inter- and intracellular interactions that control biological systems. Matrix factorization (MF) techniques can reveal low-dimensional structure from high-dimensional data that reflect these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in applications ranging from pathway discovery to timecourse analysis. We review exemplary applications of MF for systems-level analyses. We discuss appropriate applications of these methods, their limitations, and focus on the analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with MF enables discovery from high-throughput data beyond the limits of current biological knowledge - answering questions from high-dimensional data that we have not yet thought to ask.


Subject(s)
Data Interpretation, Statistical , Genomics/statistics & numerical data , Proteomics/statistics & numerical data , Algorithms , Humans , Systems Biology/statistics & numerical data
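
The abstract above describes matrix factorization as recovering low-dimensional structure from high-dimensional data. As one possible illustration, the sketch below implements non-negative matrix factorization with multiplicative updates, a common MF variant chosen here as an assumption rather than something the abstract singles out. The toy matrix, the rank, and the helper names are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def frob_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factor V (n x m) into non-negative W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
    return w, h


# Toy "features x samples" matrix with two visible groups of rows.
v = [[5, 4, 0], [4, 5, 1], [0, 1, 5], [1, 0, 4]]
w0, h0 = nmf(v, rank=2, iters=0)   # random initialization only
w, h = nmf(v, rank=2, iters=200)   # same initialization, then updates
err_before = frob_error(v, w0, h0)
err_after = frob_error(v, w, h)
print(err_before, err_after)
```

In an omics setting the columns of W would be read as feature patterns and the rows of H as their weights per sample; interpreting them biologically is the step the abstract emphasizes.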
9.
Nature ; 523(7559): 221-5, 2015 Jul 09.
Article in English | MEDLINE | ID: mdl-25924064

ABSTRACT

Inflammation is a beneficial host response to infection but can contribute to inflammatory disease if unregulated. The Th17 lineage of T helper (Th) cells can cause severe human inflammatory diseases. These cells exhibit both instability (they can cease to express their signature cytokine, IL-17A) and plasticity (they can start expressing cytokines typical of other lineages) upon in vitro re-stimulation. However, technical limitations have prevented the transcriptional profiling of pre- and post-conversion Th17 cells ex vivo during immune responses. Thus, it is unknown whether Th17 cell plasticity merely reflects change in expression of a few cytokines, or if Th17 cells physiologically undergo global genetic reprogramming driving their conversion from one T helper cell type to another, a process known as transdifferentiation. Furthermore, although Th17 cell instability/plasticity has been associated with pathogenicity, it is unknown whether this could present a therapeutic opportunity, whereby formerly pathogenic Th17 cells could adopt an anti-inflammatory fate. Here we used two new fate-mapping mouse models to track Th17 cells during immune responses to show that CD4(+) T cells that formerly expressed IL-17A go on to acquire an anti-inflammatory phenotype. The transdifferentiation of Th17 into regulatory T cells was illustrated by a change in their signature transcriptional profile and the acquisition of potent regulatory capacity. Comparisons of the transcriptional profiles of pre- and post-conversion Th17 cells also revealed a role for canonical TGF-β signalling and consequently for the aryl hydrocarbon receptor (AhR) in conversion. Thus, Th17 cells transdifferentiate into regulatory cells, and contribute to the resolution of inflammation. Our data suggest that Th17 cell instability and plasticity is a therapeutic opportunity for inflammatory diseases.


Subject(s)
Cell Transdifferentiation , T-Lymphocytes, Regulatory/cytology , T-Lymphocytes, Regulatory/immunology , Th17 Cells/cytology , Th17 Cells/immunology , Animals , Female , Gene Expression Profiling , Gene Expression Regulation , Helminthiasis/immunology , Male , Mice , Nippostrongylus/immunology , Staphylococcal Infections/immunology , Staphylococcus aureus/immunology
10.
J Proteome Res ; 19(4): 1361-1374, 2020 04 03.
Article in English | MEDLINE | ID: mdl-31975597

ABSTRACT

Maternal obesity has become a growing global health concern that may predispose the offspring to medical conditions later in life. However, the metabolic link between maternal prepregnant obesity and healthy offspring has not yet been fully elucidated. In this study, we conducted a case-control study using a coupled untargeted and targeted metabolomic approach from the newborn cord blood metabolomes associated with a matched maternal prepregnant obesity cohort of 28 cases and 29 controls. The subjects were recruited from multiethnic populations in Hawaii, including rarely reported Native Hawaiian and other Pacific Islanders (NHPI). We found that maternal obesity was the most important factor contributing to differences in cord blood metabolomics. Using an elastic net regularization-based logistic regression model, we identified 29 metabolites as potential early-life biomarkers manifesting intrauterine effect of maternal obesity, with accuracy as high as 0.947 after adjusting for clinical confounding (maternal and paternal age, ethnicity, parity, and gravidity). We validated the model results in a subsequent set of samples (N = 30) with an accuracy of 0.822. Among the metabolites, six metabolites (galactonic acid, butenylcarnitine, 2-hydroxy-3-methylbutyric acid, phosphatidylcholine diacyl C40:3, 1,5-anhydrosorbitol, and phosphatidylcholine acyl-alkyl 40:3) were individually and significantly different between the maternal obese and normal-weight groups. Interestingly, hydroxy-3-methylbutyric acid showed significantly higher levels in cord blood from the NHPI group compared to that from Asian and Caucasian groups. In summary, significant associations were observed between maternal prepregnant obesity and offspring metabolomic alteration at birth, revealing the intergenerational impact of maternal obesity.


Subject(s)
Fetal Blood , Mothers , Birth Weight , Body Mass Index , Case-Control Studies , Female , Humans , Infant, Newborn , Metabolomics , Obesity , Pregnancy
11.
Reproduction ; 160(6): R155-R167, 2020 12.
Article in English | MEDLINE | ID: mdl-33112783

ABSTRACT

Human placenta is a complex and heterogeneous organ interfacing between the mother and the fetus that supports fetal development. Alterations to placental structural components are associated with various pregnancy complications. To reveal the heterogeneity among various placenta cell types in normal and diseased placentas, as well as elucidate molecular interactions within a population of placental cells, a new genomics technology called single cell RNA-seq (or scRNA-seq) has been employed in the last couple of years. Here we review the principles of scRNA-seq technology, and summarize the recent human placenta studies at scRNA-seq level across gestational ages as well as in pregnancy complications, such as preterm birth and preeclampsia. We list the computational analysis platforms and resources available for the public use. Lastly, we discuss the future areas of interest for placenta single cell studies, as well as the data analytics needed to accomplish them.


Subject(s)
Fetal Development , Gene Expression Profiling , Gene Expression Regulation, Developmental , Placenta/metabolism , Pregnancy Complications/genetics , Single-Cell Analysis/methods , Female , Gestational Age , Humans , Placenta/cytology , Pregnancy , Pregnancy Complications/pathology
12.
Genes Dev ; 26(23): 2567-79, 2012 Dec 01.
Article in English | MEDLINE | ID: mdl-23152446

ABSTRACT

Tight control over the segregation of endoderm, mesoderm, and ectoderm is essential for normal embryonic development of all species, yet how neighboring embryonic blastomeres can contribute to different germ layers has never been fully explained. We postulated that microRNAs, which fine-tune many biological processes, might modulate the response of embryonic blastomeres to growth factors and other signals that govern germ layer fate. A systematic screen of a whole-genome microRNA library revealed that the let-7 and miR-18 families increase mesoderm at the expense of endoderm in mouse embryonic stem cells. Both families are expressed in ectoderm and mesoderm, but not endoderm, as these tissues become distinct during mouse and frog embryogenesis. Blocking let-7 function in vivo dramatically affected cell fate, diverting presumptive mesoderm and ectoderm into endoderm. siRNA knockdown of computationally predicted targets followed by mutational analyses revealed that let-7 and miR-18 down-regulate Acvr1b and Smad2, respectively, to attenuate Nodal responsiveness and bias blastomeres to ectoderm and mesoderm fates. These findings suggest a crucial role for the let-7 and miR-18 families in germ layer specification and reveal a remarkable conservation of function from amphibians to mammals.


Subject(s)
Embryonic Development/genetics , Gene Expression Regulation, Developmental , Genome/genetics , Germ Layers/embryology , MicroRNAs/metabolism , Animals , Cells, Cultured , DNA Mutational Analysis , Embryonic Stem Cells , Gene Knockdown Techniques , Mice , MicroRNAs/genetics , Xenopus laevis
13.
PLoS Comput Biol ; 14(4): e1006076, 2018 04.
Article in English | MEDLINE | ID: mdl-29634719

ABSTRACT

Artificial neural networks (ANN) are computing architectures with many interconnections of simple neural-inspired computing elements, and have been applied to biomedical fields such as imaging analysis and diagnosis. We have developed a new ANN framework called Cox-nnet to predict patient prognosis from high throughput transcriptomics data. In 10 TCGA RNA-Seq data sets, Cox-nnet achieves the same or better predictive accuracy compared to other methods, including Cox-proportional hazards regression (with LASSO, ridge, and minimax concave penalty), Random Forests Survival and CoxBoost. Cox-nnet also reveals richer biological information, at both the pathway and gene levels. The outputs from the hidden layer node provide an alternative approach for survival-sensitive dimension reduction. In summary, we have developed a new method for accurate and efficient prognosis prediction on high throughput data, with functional biological insights. The source code is freely available at https://github.com/lanagarmire/cox-nnet.


Subject(s)
Gene Expression Profiling/statistics & numerical data , High-Throughput Nucleotide Sequencing/statistics & numerical data , Neural Networks, Computer , Prognosis , Proportional Hazards Models , Computational Biology , Databases, Nucleic Acid/statistics & numerical data , Female , Gene Regulatory Networks , Humans , Kaplan-Meier Estimate , Male , Metabolic Networks and Pathways/genetics , Neoplasms/genetics , Neoplasms/metabolism , Neoplasms/mortality , Sequence Analysis, RNA/statistics & numerical data , Survival Analysis
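
The abstract above compares a neural-network prognosis model against Cox proportional hazards regression. As background, the sketch below computes the standard Cox partial log-likelihood for a vector of risk scores, the quantity that models in this family are usually fitted to maximize. How Cox-nnet itself is trained is not stated in the abstract, so treating the scores as the output of some model is an assumption, and the toy survival data are invented.

```python
import math


def cox_partial_log_likelihood(risk_scores, times, events):
    """Cox partial log-likelihood (ties counted in every tied risk set).

    risk_scores: model output per patient (higher = higher hazard)
    times:       follow-up time per patient
    events:      1 if the event was observed, 0 if censored
    """
    log_lik = 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        at_risk = [risk_scores[j] for j, t_j in enumerate(times) if t_j >= t_i]
        log_lik += risk_scores[i] - math.log(sum(math.exp(r) for r in at_risk))
    return log_lik


# Invented toy cohort: two observed events and one censored patient.
times = [5, 3, 8]
events = [1, 1, 0]
uninformative = cox_partial_log_likelihood([0.0, 0.0, 0.0], times, events)
well_ranked = cox_partial_log_likelihood([1.0, 2.0, 0.0], times, events)
print(uninformative, well_ranked)
```

Scores that rank earlier events as higher risk give a larger value than constant scores, which is the behavior a training procedure would exploit.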
14.
J Proteome Res ; 17(1): 337-347, 2018 01 05.
Article in English | MEDLINE | ID: mdl-29110491

ABSTRACT

Metabolomics holds the promise as a new technology to diagnose highly heterogeneous diseases. Conventionally, metabolomics data analysis for diagnosis is done using various statistical and machine learning based classification methods. However, it remains unknown if deep neural network, a class of increasingly popular machine learning methods, is suitable to classify metabolomics data. Here we use a cohort of 271 breast cancer tissues, 204 positive estrogen receptor (ER+), and 67 negative estrogen receptor (ER-) to test the accuracies of feed-forward networks, a deep learning (DL) framework, as well as six widely used machine learning models, namely random forest (RF), support vector machines (SVM), recursive partitioning and regression trees (RPART), linear discriminant analysis (LDA), prediction analysis for microarrays (PAM), and generalized boosted models (GBM). DL framework has the highest area under the curve (AUC) of 0.93 in classifying ER+/ER- patients, compared to the other six machine learning algorithms. Furthermore, the biological interpretation of the first hidden layer reveals eight commonly enriched significant metabolomics pathways (adjusted P-value <0.05) that cannot be discovered by other machine learning methods. Among them, protein digestion and absorption and ATP-binding cassette (ABC) transporters pathways are also confirmed in integrated analysis between metabolomics and gene expression data in these samples. In summary, deep learning method shows advantages for metabolomics based breast cancer ER status classification, with both the highest prediction accuracy (AUC = 0.93) and better revelation of disease biology. We encourage the adoption of feed-forward networks based deep learning method in the metabolomics research community for classification.


Subject(s)
Breast Neoplasms/classification , Machine Learning/standards , Metabolomics/methods , Receptors, Estrogen/analysis , Area Under Curve , Female , Humans
16.
RNA ; 20(11): 1684-96, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25246651

ABSTRACT

It is crucial for researchers to optimize RNA-seq experimental designs for differential expression detection. Currently, the field lacks general methods to estimate power and sample size for RNA-Seq in complex experimental designs, under the assumption of the negative binomial distribution. We simulate RNA-Seq count data based on parameters estimated from six widely different public data sets (including cell line comparison, tissue comparison, and cancer data sets) and calculate the statistical power in paired and unpaired sample experiments. We comprehensively compare five differential expression analysis packages (DESeq, edgeR, DESeq2, sSeq, and EBSeq) and evaluate their performance by power, receiver operator characteristic (ROC) curves, and other metrics including areas under the curve (AUC), Matthews correlation coefficient (MCC), and F-measures. DESeq2 and edgeR tend to give the best performance in general. Increasing sample size or sequencing depth increases power; however, increasing sample size is more potent than sequencing depth to increase power, especially when the sequencing depth reaches 20 million reads. Long intergenic noncoding RNAs (lincRNA) yield lower power relative to the protein coding mRNAs, given their lower expression level in the same RNA-Seq experiment. On the other hand, paired-sample RNA-Seq significantly enhances the statistical power, confirming the importance of considering the multifactor experimental design. Finally, a local optimal power is achievable for a given budget constraint, and the dominant contributing factor is sample size rather than the sequencing depth. In conclusion, we provide a power analysis tool (http://www2.hawaii.edu/~lgarmire/RNASeqPowerCalculator.htm) that captures the dispersion in the data and can serve as a practical reference under the budget constraint of RNA-Seq experiments.


Subject(s)
Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , RNA, Messenger/chemistry , Sequence Analysis, RNA/methods , Animals , Computer Simulation , Databases, Genetic , Humans , Mice , Regression Analysis , Reproducibility of Results , Research Design , Sample Size , Software
17.
BMC Bioinformatics ; 16 Suppl 5: S10, 2015.
Article in English | MEDLINE | ID: mdl-25861082

ABSTRACT

BACKGROUND: Epigenetic alterations are known to correlate with changes in gene expression among various diseases including cancers. However, quantitative models that accurately predict the up or down regulation of gene expression are currently lacking. METHODS: A new machine learning-based method of gene expression prediction is developed in the context of lung cancer. This method uses the Illumina Infinium HumanMethylation450K Beadchip CpG methylation array data from paired lung cancer and adjacent normal tissues in The Cancer Genome Atlas (TCGA) and histone modification marker CHIP-Seq data from the ENCODE project, to predict the differential expression of RNA-Seq data in TCGA lung cancers. It considers a comprehensive list of 1424 features spanning the four categories of CpG methylation, histone H3 methylation modification, nucleotide composition, and conservation. Various feature selection and classification methods are compared to select the best model over 10-fold cross-validation in the training data set. RESULTS: A best model comprising 67 features is chosen by ReliefF based feature selection and random forest classification method, with AUC = 0.864 from the 10-fold cross-validation of the training set and AUC = 0.836 from the testing set. The selected features cover all four data types, with histone H3 methylation modification (32 features) and CpG methylation (15 features) being most abundant. Among the dropping-off tests of individual data-type based features, removal of CpG methylation feature leads to the most reduction in model performance. In the best model, 19 selected features are from the promoter regions (TSS200 and TSS1500), highest among all locations relative to transcripts. Sequential dropping-off of CpG methylation features relative to different regions on the protein coding transcripts shows that promoter regions contribute most significantly to the accurate prediction of gene expression. 
CONCLUSIONS: By considering a comprehensive list of epigenomic and genomic features, we have constructed an accurate model to predict transcriptomic differential expression, exemplified in lung cancer.


Subject(s)
CpG Islands/genetics , DNA Methylation , Epigenomics/methods , Gene Expression Regulation, Neoplastic , Genome, Human , Lung Neoplasms/genetics , Artificial Intelligence , Genomics/methods , High-Throughput Nucleotide Sequencing , Histones/genetics , Humans
18.
PLoS Comput Biol ; 10(9): e1003851, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25233347

ABSTRACT

Breast cancer is the most common malignancy in women worldwide. With the increasing awareness of heterogeneity in breast cancers, better prediction of breast cancer prognosis is much needed for more personalized treatment and disease management. Towards this goal, we have developed a novel computational model for breast cancer prognosis by combining the Pathway Deregulation Score (PDS) based pathifier algorithm, Cox regression and L1-LASSO penalization method. We trained the model on a set of 236 patients with gene expression data and clinical information, and validated the performance on three diversified testing data sets of 606 patients. To evaluate the performance of the model, we conducted survival analysis of the dichotomized groups, and compared the areas under the curve based on the binary classification. The resulting prognosis genomic model is composed of fifteen pathways (e.g., P53 pathway) that had previously reported cancer relevance, and it successfully differentiated relapse in the training set (log rank p-value = 6.25e-12) and three testing data sets (log rank p-value < 0.0005). Moreover, the pathway-based genomic models consistently performed better than gene-based models on all four data sets. We also find strong evidence that combining genomic information with clinical information improved the p-values of prognosis prediction by at least three orders of magnitude in comparison to using either genomic or clinical information alone. In summary, we propose a novel prognosis model that harnesses the pathway-based dysregulation as well as valuable clinical information. The selected pathways in our prognosis model are promising targets for therapeutic intervention.


Subject(s)
Breast Neoplasms/genetics , Breast Neoplasms/pathology , Genomics/methods , Models, Statistical , Algorithms , Breast Neoplasms/metabolism , Disease Progression , Female , Gene Expression Profiling , Humans , Middle Aged , Prognosis , Proportional Hazards Models , ROC Curve , Transcriptome
19.
Mol Hum Reprod ; 20(9): 885-904, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24944161

ABSTRACT

Pre-eclampsia is the leading cause of fetal and maternal morbidity and mortality. Early onset pre-eclampsia (EOPE) is a disorder that has severe maternal and fetal outcomes, whilst its etiology is poorly understood. We hypothesize that epigenetics plays an important role to mediate the development of EOPE and conducted a case-control study to compare the genome-wide methylome difference between chorioamniotic membranes from 30 EOPE and 17 full-term pregnancies using the Infinium Human Methylation 450 BeadChip arrays. Bioinformatics analysis tested differential methylation (DM) at CpG site level, gene level, and pathway and network level. A striking genome-wide hypermethylation pattern coupled with hypomethylation in promoters was observed. Out of 385 184 CpG sites, 9995 showed DM (2.6%). Of those DM sites, 91.9% showed hypermethylation (9186 of 9995). Over 900 genes had DM associated with promoters. Promoter-based DM analysis revealed that genes in canonical cancer-related pathways such as Rac, Ras, PI3K/Akt, NFκB and ErBB4 were enriched, and represented biological functional alterations that involve cell cycle, apoptosis, cancer signaling and inflammation. A group of genes previously found to be up-regulated in pre-eclampsia, including GRB2, ATF3, NFKB2, as well as genes in proteasome subunits (PSMA1, PMSE1, PSMD1 and PMSD8), harbored hypomethylated promoters. Contrarily, a cluster of microRNAs, including mir-519a1, mir-301a, mir-487a, mir-185, mir-329, mir-194, mir-376a1, mir-486 and mir-744 were all hypermethylated in their promoters in the EOPE samples. These findings collectively reveal new avenues of research regarding the vast epigenetic modifications in EOPE.


Subject(s)
Amnion/metabolism , Chorion/metabolism , DNA Methylation , Epigenesis, Genetic , Pre-Eclampsia/metabolism , Promoter Regions, Genetic , Adult , Case-Control Studies , Computational Biology , DNA/metabolism , Down-Regulation , Female , Genome-Wide Association Study , Humans , MicroRNAs/metabolism , Oligonucleotide Array Sequence Analysis , Pregnancy , Retrospective Studies , Up-Regulation , Young Adult
20.
bioRxiv ; 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38617220

ABSTRACT

Single-cell RNA sequencing (scRNA-Seq) data from complex human tissues have prevalent blood cell contamination due to the sample preparation process and may comprise cells of different genetic makeups. To reveal such complexity and annotate cells appropriately, we propose the first-of-its-kind computational framework, Originator, which deciphers single cells by genetic origin and separates blood cells from tissue-resident cells. We show that blood contamination is widely spread in scRNA-Seq data from a variety of tissues. We warn of the significant biases in downstream analysis without considering blood contamination and genetic contexts using pancreatic ductal adenocarcinoma and placenta data, respectively.
