Results 1 - 20 of 76
1.
Nat Methods ; 21(3): 391-400, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38374264

ABSTRACT

Deciphering cell-type heterogeneity is crucial for systematically understanding tissue homeostasis and its dysregulation in diseases. Computational deconvolution is an efficient approach for estimating cell-type abundances from a variety of omics data. Despite substantial methodological progress in computational deconvolution in recent years, challenges are still outstanding. Here we list four important challenges related to computational deconvolution: the quality of the reference data, generation of ground truth data, limitations of computational methodologies, and benchmarking design and implementation. Finally, we make recommendations on reference data generation, new directions of computational methodologies, and strategies to promote rigorous benchmarking.
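
Illustrative note (not part of the abstract): reference-based deconvolution is often posed as a non-negative least-squares problem, in which a bulk profile is modelled as a mixture of cell-type reference profiles. The sketch below is one minimal way to do this with projected gradient descent; it is not a method from the paper, and the signature matrix, bulk profile, and function name are hypothetical.

```python
def deconvolve(signature, bulk, iterations=500):
    """Estimate non-negative cell-type fractions p that minimise
    ||signature @ p - bulk||^2 (signature: genes x cell types)."""
    n_genes = len(signature)
    n_types = len(signature[0])
    # Step size 1 / trace(S^T S); the trace bounds the largest eigenvalue,
    # so this step is small enough for gradient descent to converge.
    step = 1.0 / sum(v * v for row in signature for v in row)
    p = [1.0 / n_types] * n_types
    for _ in range(iterations):
        residual = [
            sum(signature[g][k] * p[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        gradient = [
            sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto p >= 0.
        p = [max(0.0, p[k] - step * gradient[k]) for k in range(n_types)]
    total = sum(p)
    return [v / total for v in p] if total > 0 else p


# Hypothetical reference: 4 marker genes x 2 cell types.
signature = [[10.0, 1.0], [1.0, 10.0], [5.0, 5.0], [8.0, 2.0]]
# Hypothetical bulk sample mixed from 30% type A and 70% type B.
bulk = [3.7, 7.3, 5.0, 3.8]
fractions = deconvolve(signature, bulk)
```

In this noise-free toy case the estimate recovers the mixing fractions. With real data, the quality of the reference profiles limits accuracy, which is the first challenge the abstract names.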


Subjects
Computational Biology , Genomics , Computational Biology/methods , Benchmarking
2.
Cell ; 151(1): 138-52, 2012 Sep 28.
Article in English | MEDLINE | ID: mdl-23021221

ABSTRACT

Inflammation and macrophage foam cells are characteristic features of atherosclerotic lesions, but the mechanisms linking cholesterol accumulation to inflammation and LXR-dependent response pathways are poorly understood. To investigate this relationship, we utilized lipidomic and transcriptomic methods to evaluate the effect of diet and LDL receptor genotype on macrophage foam cell formation within the peritoneal cavities of mice. Foam cell formation was associated with significant changes in hundreds of lipid species and unexpected suppression, rather than activation, of inflammatory gene expression. We provide evidence that regulated accumulation of desmosterol underlies many of the homeostatic responses, including activation of LXR target genes, inhibition of SREBP target genes, selective reprogramming of fatty acid metabolism, and suppression of inflammatory-response genes, observed in macrophage foam cells. These observations suggest that macrophage activation in atherosclerotic lesions results from extrinsic, proinflammatory signals generated within the artery wall that suppress homeostatic and anti-inflammatory functions of desmosterol.


Subjects
Atherosclerosis/immunology , Cholesterol/biosynthesis , Desmosterol/metabolism , Foam Cells/metabolism , Lipid Metabolism , Transcriptome , Animals , Atherosclerosis/metabolism , Cholesterol/analogs & derivatives , Cholesterol/metabolism , Fatty Acids/metabolism , Foam Cells/immunology , Gene Knockdown Techniques , Leukocytes, Mononuclear/metabolism , Male , Mice , Mice, Inbred C57BL , Receptors, LDL/genetics , Receptors, LDL/metabolism , Sterol Regulatory Element Binding Proteins/metabolism
3.
Brief Bioinform ; 23(1), 2022 Jan 17.
Article in English | MEDLINE | ID: mdl-34974623

ABSTRACT

Motif discovery and characterization are important for gene regulation analysis. The lack of intuitive and integrative web servers impedes the effective use of motifs. Most motif discovery web tools are either not designed for non-expert users or lack optimization steps when using default settings. Here we describe bipartite motifs learning (BML), a parameter-free web server that provides a user-friendly portal for online discovery and analysis of sequence motifs, using high-throughput sequencing data as the input. BML utilizes both position weight matrix and dinucleotide weight matrix, the latter of which enables the expression of the interdependencies of neighboring bases. When input parameters concerning the motifs are given, BML achieves significantly higher accuracy than other available tools for motif finding. When no parameters are given by non-expert users, unlike other tools, BML employs a learning method to identify motifs automatically and achieve accuracy comparable to the scenario where the parameters are set. The BML web server is freely available at http://motif.t-ridership.com/ (https://github.com/Mohammad-Vahed/BML).
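
Illustrative note (not part of the abstract): a position weight matrix scores each base of a candidate site independently. The sketch below shows the general idea with log-odds scores against a uniform background. It is not BML's algorithm, and the aligned sites, pseudocount, and function names are hypothetical. A dinucleotide weight matrix would index pairs of adjacent bases in place of single bases, which is how it can capture dependencies between neighbours.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from equal-length aligned sites."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return the start index of the highest-scoring window in the sequence."""
    width = len(pwm)
    scored = [
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scored)[1]


# Hypothetical aligned binding sites and a sequence to scan.
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
pwm = build_pwm(sites)
sequence = "GGCGTATAATGCCG"
hit = best_hit(pwm, sequence)  # start index of the best-matching window
```

The pseudocount keeps unseen bases from producing a log of zero; its value here is an arbitrary choice for the sketch.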


Subjects
Nucleotide Motifs , Software , Transcription Factors/metabolism , Web Browser , Algorithms , Arabidopsis , Binding Sites , High-Throughput Nucleotide Sequencing , Humans , Position-Specific Scoring Matrices , Sequence Analysis, DNA
4.
J Med Internet Res ; 26: e48997, 2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39141914

ABSTRACT

BACKGROUND: Preeclampsia is a potentially fatal complication during pregnancy, characterized by high blood pressure and the presence of excessive proteins in the urine. Due to its complexity, the prediction of preeclampsia onset is often difficult and inaccurate. OBJECTIVE: This study aimed to create quantitative models to predict the onset gestational age of preeclampsia using electronic health records. METHODS: We retrospectively collected 1178 preeclamptic pregnancy records from the University of Michigan Health System as the discovery cohort, and 881 records from the University of Florida Health System as the validation cohort. We constructed 2 Cox-proportional hazards models: 1 baseline model using maternal and pregnancy characteristics, and the other full model with additional laboratory findings, vitals, and medications. We built the models using 80% of the discovery data, tested on the remaining 20% of the discovery data, and validated with the University of Florida data. We further stratified the patients into high- and low-risk groups for preeclampsia onset risk assessment. RESULTS: The baseline model reached concordance indices of 0.64 and 0.61 in the 20% testing data and the validation data, respectively, while the full model increased these concordance indices to 0.69 and 0.61, respectively. For preeclampsia diagnosed at 34 weeks, the baseline and full models had area under the curve (AUC) values of 0.65 and 0.70, and AUC values of 0.69 and 0.70 for preeclampsia diagnosed at 37 weeks, respectively. Both models contain 5 selected features, among which the number of fetuses in the pregnancy, hypertension, and parity are shared between the 2 models with similar hazard ratios and significant P values. In the full model, maximum diastolic blood pressure in early pregnancy was the predominant feature. CONCLUSIONS: Electronic health records data provide useful information to predict the gestational age of preeclampsia onset. Stratification of the cohorts using 5-predictor Cox-proportional hazards models provides clinicians with convenient tools to assess the onset time of preeclampsia in patients.
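
Illustrative note (not part of the abstract): a concordance index such as those reported above measures how often a model ranks two patients in the correct order of event time. The sketch below assumes Harrell's definition with right-censoring; the abstract does not say which estimator the study used. The onset times, event flags, and risk scores are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs in which the subject with
    the earlier observed event also has the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is only comparable if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable


# Hypothetical data: onset time in gestational weeks, event flag
# (1 = onset observed, 0 = censored), and model risk score.
times = [30, 32, 34, 36, 37]
events = [1, 1, 0, 1, 1]
risks = [2.0, 1.5, 1.7, 0.9, 1.0]
c_index = concordance_index(times, events, risks)
```

Here 6 of the 8 comparable pairs are ordered correctly, so the index is 0.75. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering.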


Subjects
Electronic Health Records , Pre-Eclampsia , Humans , Female , Pregnancy , Electronic Health Records/statistics & numerical data , Adult , Retrospective Studies , Proportional Hazards Models , Gestational Age
5.
BJOG ; 2023 Nov 20.
Article in English | MEDLINE | ID: mdl-37984426

ABSTRACT

OBJECTIVES: To identify and internally validate metabolites predictive of spontaneous preterm birth (sPTB) using multiple machine learning methods and sequential maternal serum samples, and to predict spontaneous early term birth (sETB) using these metabolites. DESIGN: Case-cohort design within a prospective cohort study. SETTING: Cambridge, UK. POPULATION OR SAMPLE: A total of 399 Pregnancy Outcome Prediction study participants, including 98 cases of sPTB. METHODS: An untargeted metabolomic analysis of maternal serum samples at 12, 20, 28 and 36 weeks of gestation was performed. We applied six supervised machine learning methods and a weighted Cox model to measurements at 28 weeks of gestation and sPTB, followed by feature selection. We used logistic regression with elastic net penalty, followed by best subset selection, to reduce the number of predictive metabolites further. We applied coefficients from the chosen models to measurements from different gestational ages to predict sPTB and sETB. MAIN OUTCOME MEASURES: sPTB and sETB. RESULTS: We identified 47 metabolites, mostly lipids, as important predictors of sPTB by two or more methods and 22 were identified by three or more methods. The best 4-predictor model had an optimism-corrected area under the receiver operating characteristics curve (AUC) of 0.703 at 28 weeks of gestation. The model also predicted sPTB in 12-week samples (0.606, 95% CI 0.544-0.667) and 20-week samples (0.657, 95% CI 0.597-0.717) and it predicted sETB in 36-week samples (0.727, 95% CI 0.606-0.849). A lysolipid, 1-palmitoleoyl-GPE (16:1)*, was the strongest predictor of sPTB at 12 weeks of gestation (0.609, 95% CI 0.548-0.670), 20 weeks (0.630, 95% CI 0.569-0.690) and 28 weeks (0.660, 95% CI 0.599-0.722), and of sETB at 36 weeks (0.739, 95% CI 0.618-0.860). CONCLUSIONS: We identified and internally validated maternal serum metabolites predictive of sPTB. A lysolipid, 1-palmitoleoyl-GPE (16:1)*, is a novel predictor of sPTB and sETB. Further validation in external populations is required.

6.
Bioinformatics ; 37(17): 2772-2774, 2021 Sep 09.
Article in English | MEDLINE | ID: mdl-33515235

ABSTRACT

SUMMARY: Cox-nnet is a neural-network-based prognosis prediction method, originally applied to genomics data. Here, we propose version 2 of Cox-nnet, with significant improvement on efficiency and interpretability, making it suitable to predict prognosis based on large-scale population data, including electronic medical records (EMR) datasets. We also add permutation-based feature importance scores and the direction of feature coefficients. When applied on a kidney transplantation dataset, Cox-nnet v2.0 reduces the training time of Cox-nnet by up to 32-fold (n = 10 000) and achieves better prediction accuracy than Cox-PH (P<0.05). It also achieves similarly superior performance on the publicly available SUPPORT data (n = 8000). The high efficiency and accuracy make Cox-nnet v2.0 a desirable method for survival prediction in large-scale EMR data. AVAILABILITY AND IMPLEMENTATION: Cox-nnet v2.0 is freely available to the public at https://github.com/lanagarmire/Cox-nnet-v2.0. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

7.
FASEB J ; 35(4): e21524, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33742690

ABSTRACT

Maternal pre-pregnancy obesity may have an impact on both maternal and fetal health. We examined the microbiome recovered from placentas in a multi-ethnic maternal pre-pregnant obesity cohort, through an optimized microbiome protocol to enrich low bacterial biomass samples. We found that the microbiomes recovered from the placentas of obese pre-pregnant mothers are less abundant and less diverse when compared to those from mothers of normal pre-pregnancy weight. Microbiome richness also decreases from the maternal side to the fetal side, demonstrating heterogeneity by geolocation within the placenta. In summary, our study shows that the microbiomes recovered from the placentas are associated with pre-pregnancy obesity. IMPORTANCE: Maternal pre-pregnancy obesity may have an impact on both maternal and fetal health. The placenta is an important organ at the interface of the mother and fetus, and supplies nutrients to the fetus. We report that the microbiomes enriched from the placentas of obese pre-pregnant mothers are less abundant and less diverse when compared to those from mothers of normal pre-pregnancy weight. Moreover, the microbiomes also vary by geolocation within the placenta.


Subjects
Microbiota/physiology , Obesity, Maternal/metabolism , Obesity/complications , Placenta/metabolism , Adult , Cohort Studies , Female , Fetal Development/physiology , Humans , Pregnancy , Pregnancy Complications/etiology
8.
J Lipid Res ; 62: 100118, 2021.
Article in English | MEDLINE | ID: mdl-34547287

ABSTRACT

Preeclampsia is a pregnancy-specific syndrome characterized by hypertension and proteinuria after 20 weeks of gestation. However, it is not well understood what lipids are involved in the development of this condition, and even less is known about how these lipids mediate its formation. To reveal the relationship between lipids and preeclampsia, we conducted lipidomic profiling of maternal sera of 44 severe preeclamptic and 20 healthy pregnant women from a multiethnic cohort in Hawaii. Correlation network analysis showed that oxidized phospholipids have increased intercorrelations and connections in preeclampsia, whereas other lipids, including triacylglycerols, have reduced network correlations and connections. A total of 10 lipid species demonstrate significant changes uniquely associated with preeclampsia but not any other clinical confounders. These species are from the lipid classes of lysophosphatidylcholines, phosphatidylcholines (PCs), cholesteryl esters, phosphatidylethanolamines, lysophosphatidylethanolamines, and ceramides. A random forest classifier built on these lipids shows highly accurate and specific prediction (F1 statistic = 0.94; balanced accuracy = 0.88) of severe preeclampsia, demonstrating their potential as biomarkers for this condition. These lipid species are enriched in dysregulated biological pathways, including insulin signaling, immune response, and phospholipid metabolism. Moreover, causality inference shows that various PCs and lysophosphatidylcholines mediate severe preeclampsia through PC 35:1e. Our results suggest that the lipidome may play a role in the pathogenesis and serve as biomarkers of severe preeclampsia.
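
Illustrative note (not part of the abstract): the F1 statistic and balanced accuracy quoted above are both derived from a classifier's confusion matrix. The sketch below uses the standard definitions. The counts are hypothetical; they match only the cohort sizes (44 cases, 20 controls) and are not intended to reproduce the reported values.

```python
def f1_and_balanced_accuracy(tp, fp, tn, fn):
    """Compute F1 and balanced accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)       # sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    balanced_accuracy = (recall + specificity) / 2
    return f1, balanced_accuracy


# Hypothetical counts for 44 cases and 20 controls.
f1, balanced_accuracy = f1_and_balanced_accuracy(tp=40, fp=4, tn=16, fn=4)
```

With imbalanced classes the two metrics can diverge: F1 ignores true negatives, whereas balanced accuracy weights sensitivity and specificity equally.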


Subjects
Lipidomics , Lipids/blood , Pre-Eclampsia/blood , Adult , Cohort Studies , Female , Humans , Pregnancy , Severity of Illness Index
9.
Trends Genet ; 34(10): 790-805, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30143323

ABSTRACT

Omics data contain signals from the molecular, physical, and kinetic inter- and intracellular interactions that control biological systems. Matrix factorization (MF) techniques can reveal low-dimensional structure from high-dimensional data that reflect these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in applications ranging from pathway discovery to timecourse analysis. We review exemplary applications of MF for systems-level analyses. We discuss appropriate applications of these methods, their limitations, and focus on the analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with MF enables discovery from high-throughput data beyond the limits of current biological knowledge - answering questions from high-dimensional data that we have not yet thought to ask.


Subjects
Data Interpretation, Statistical , Genomics/statistics & numerical data , Proteomics/statistics & numerical data , Algorithms , Humans , Systems Biology/statistics & numerical data
10.
Nature ; 523(7559): 221-5, 2015 Jul 09.
Article in English | MEDLINE | ID: mdl-25924064

ABSTRACT

Inflammation is a beneficial host response to infection but can contribute to inflammatory disease if unregulated. The Th17 lineage of T helper (Th) cells can cause severe human inflammatory diseases. These cells exhibit both instability (they can cease to express their signature cytokine, IL-17A) and plasticity (they can start expressing cytokines typical of other lineages) upon in vitro re-stimulation. However, technical limitations have prevented the transcriptional profiling of pre- and post-conversion Th17 cells ex vivo during immune responses. Thus, it is unknown whether Th17 cell plasticity merely reflects change in expression of a few cytokines, or if Th17 cells physiologically undergo global genetic reprogramming driving their conversion from one T helper cell type to another, a process known as transdifferentiation. Furthermore, although Th17 cell instability/plasticity has been associated with pathogenicity, it is unknown whether this could present a therapeutic opportunity, whereby formerly pathogenic Th17 cells could adopt an anti-inflammatory fate. Here we used two new fate-mapping mouse models to track Th17 cells during immune responses to show that CD4(+) T cells that formerly expressed IL-17A go on to acquire an anti-inflammatory phenotype. The transdifferentiation of Th17 into regulatory T cells was illustrated by a change in their signature transcriptional profile and the acquisition of potent regulatory capacity. Comparisons of the transcriptional profiles of pre- and post-conversion Th17 cells also revealed a role for canonical TGF-β signalling and consequently for the aryl hydrocarbon receptor (AhR) in conversion. Thus, Th17 cells transdifferentiate into regulatory cells, and contribute to the resolution of inflammation. Our data suggest that Th17 cell instability and plasticity is a therapeutic opportunity for inflammatory diseases.


Subjects
Cell Transdifferentiation , T-Lymphocytes, Regulatory/cytology , T-Lymphocytes, Regulatory/immunology , Th17 Cells/cytology , Th17 Cells/immunology , Animals , Female , Gene Expression Profiling , Gene Expression Regulation , Helminthiasis/immunology , Male , Mice , Nippostrongylus/immunology , Staphylococcal Infections/immunology , Staphylococcus aureus/immunology
11.
J Proteome Res ; 19(4): 1361-1374, 2020 Apr 03.
Article in English | MEDLINE | ID: mdl-31975597

ABSTRACT

Maternal obesity has become a growing global health concern that may predispose the offspring to medical conditions later in life. However, the metabolic link between maternal prepregnant obesity and healthy offspring has not yet been fully elucidated. We conducted a case-control study using a coupled untargeted and targeted metabolomic approach from the newborn cord blood metabolomes associated with a matched maternal prepregnant obesity cohort of 28 cases and 29 controls. The subjects were recruited from multiethnic populations in Hawaii, including rarely reported Native Hawaiian and other Pacific Islanders (NHPI). We found that maternal obesity was the most important factor contributing to differences in cord blood metabolomics. Using an elastic net regularization-based logistic regression model, we identified 29 metabolites as potential early-life biomarkers manifesting intrauterine effect of maternal obesity, with accuracy as high as 0.947 after adjusting for clinical confounding (maternal and paternal age, ethnicity, parity, and gravidity). We validated the model results in a subsequent set of samples (N = 30) with an accuracy of 0.822. Among the metabolites, six (galactonic acid, butenylcarnitine, 2-hydroxy-3-methylbutyric acid, phosphatidylcholine diacyl C40:3, 1,5-anhydrosorbitol, and phosphatidylcholine acyl-alkyl 40:3) were individually and significantly different between the maternal obese and normal-weight groups. Interestingly, 2-hydroxy-3-methylbutyric acid showed significantly higher levels in cord blood from the NHPI group compared to that from Asian and Caucasian groups. In summary, significant associations were observed between maternal prepregnant obesity and offspring metabolomic alteration at birth, revealing the intergenerational impact of maternal obesity.


Subjects
Fetal Blood , Mothers , Birth Weight , Body Mass Index , Case-Control Studies , Female , Humans , Infant, Newborn , Metabolomics , Obesity , Pregnancy
12.
Reproduction ; 160(6): R155-R167, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33112783

ABSTRACT

Human placenta is a complex and heterogeneous organ interfacing between the mother and the fetus that supports fetal development. Alterations to placental structural components are associated with various pregnancy complications. To reveal the heterogeneity among various placenta cell types in normal and diseased placentas, as well as elucidate molecular interactions within a population of placental cells, a new genomics technology called single cell RNA-seq (or scRNA-seq) has been employed in the last couple of years. Here we review the principles of scRNA-seq technology, and summarize the recent human placenta studies at scRNA-seq level across gestational ages as well as in pregnancy complications, such as preterm birth and preeclampsia. We list the computational analysis platforms and resources available for public use. Lastly, we discuss the future areas of interest for placenta single cell studies, as well as the data analytics needed to accomplish them.


Subjects
Fetal Development , Gene Expression Profiling , Gene Expression Regulation, Developmental , Placenta/metabolism , Pregnancy Complications/genetics , Single-Cell Analysis/methods , Female , Gestational Age , Humans , Placenta/cytology , Pregnancy , Pregnancy Complications/pathology
13.
Genes Dev ; 26(23): 2567-79, 2012 Dec 01.
Article in English | MEDLINE | ID: mdl-23152446

ABSTRACT

Tight control over the segregation of endoderm, mesoderm, and ectoderm is essential for normal embryonic development of all species, yet how neighboring embryonic blastomeres can contribute to different germ layers has never been fully explained. We postulated that microRNAs, which fine-tune many biological processes, might modulate the response of embryonic blastomeres to growth factors and other signals that govern germ layer fate. A systematic screen of a whole-genome microRNA library revealed that the let-7 and miR-18 families increase mesoderm at the expense of endoderm in mouse embryonic stem cells. Both families are expressed in ectoderm and mesoderm, but not endoderm, as these tissues become distinct during mouse and frog embryogenesis. Blocking let-7 function in vivo dramatically affected cell fate, diverting presumptive mesoderm and ectoderm into endoderm. siRNA knockdown of computationally predicted targets followed by mutational analyses revealed that let-7 and miR-18 down-regulate Acvr1b and Smad2, respectively, to attenuate Nodal responsiveness and bias blastomeres to ectoderm and mesoderm fates. These findings suggest a crucial role for the let-7 and miR-18 families in germ layer specification and reveal a remarkable conservation of function from amphibians to mammals.


Subjects
Embryonic Development/genetics , Gene Expression Regulation, Developmental , Genome/genetics , Germ Layers/embryology , MicroRNAs/metabolism , Animals , Cells, Cultured , DNA Mutational Analysis , Embryonic Stem Cells , Gene Knockdown Techniques , Mice , MicroRNAs/genetics , Xenopus laevis
14.
PLoS Comput Biol ; 14(4): e1006076, 2018 Apr.
Article in English | MEDLINE | ID: mdl-29634719

ABSTRACT

Artificial neural networks (ANN) are computing architectures with many interconnections of simple neural-inspired computing elements, and have been applied to biomedical fields such as imaging analysis and diagnosis. We have developed a new ANN framework called Cox-nnet to predict patient prognosis from high throughput transcriptomics data. In 10 TCGA RNA-Seq data sets, Cox-nnet achieves the same or better predictive accuracy compared to other methods, including Cox-proportional hazards regression (with LASSO, ridge, and minimax concave penalty), Random Forests Survival and CoxBoost. Cox-nnet also reveals richer biological information, at both the pathway and gene levels. The outputs from the hidden layer nodes provide an alternative approach for survival-sensitive dimension reduction. In summary, we have developed a new method for accurate and efficient prognosis prediction on high throughput data, with functional biological insights. The source code is freely available at https://github.com/lanagarmire/cox-nnet.


Subjects
Gene Expression Profiling/statistics & numerical data , High-Throughput Nucleotide Sequencing/statistics & numerical data , Neural Networks, Computer , Prognosis , Proportional Hazards Models , Computational Biology , Databases, Nucleic Acid/statistics & numerical data , Female , Gene Regulatory Networks , Humans , Kaplan-Meier Estimate , Male , Metabolic Networks and Pathways/genetics , Neoplasms/genetics , Neoplasms/metabolism , Neoplasms/mortality , Sequence Analysis, RNA/statistics & numerical data , Survival Analysis
15.
J Proteome Res ; 17(1): 337-347, 2018 Jan 05.
Article in English | MEDLINE | ID: mdl-29110491

ABSTRACT

Metabolomics holds promise as a new technology to diagnose highly heterogeneous diseases. Conventionally, metabolomics data analysis for diagnosis is done using various statistical and machine learning based classification methods. However, it remains unknown whether deep neural networks, a class of increasingly popular machine learning methods, are suitable to classify metabolomics data. Here we use a cohort of 271 breast cancer tissues, 204 positive estrogen receptor (ER+), and 67 negative estrogen receptor (ER-) to test the accuracies of feed-forward networks, a deep learning (DL) framework, as well as six widely used machine learning models, namely random forest (RF), support vector machines (SVM), recursive partitioning and regression trees (RPART), linear discriminant analysis (LDA), prediction analysis for microarrays (PAM), and generalized boosted models (GBM). The DL framework has the highest area under the curve (AUC) of 0.93 in classifying ER+/ER- patients, compared to the other six machine learning algorithms. Furthermore, the biological interpretation of the first hidden layer reveals eight commonly enriched significant metabolomics pathways (adjusted P-value <0.05) that cannot be discovered by other machine learning methods. Among them, protein digestion and absorption and ATP-binding cassette (ABC) transporters pathways are also confirmed in integrated analysis between metabolomics and gene expression data in these samples. In summary, the deep learning method shows advantages for metabolomics based breast cancer ER status classification, with both the highest prediction accuracy (AUC = 0.93) and better revelation of disease biology. We encourage the adoption of feed-forward networks based deep learning method in the metabolomics research community for classification.
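
Illustrative note (not part of the abstract): the AUC used above to compare classifiers equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition. The labels and scores are hypothetical and unrelated to the study's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    tied scores count as half a correct pair."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = ER+, 0 = ER-) and classifier scores.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2]
auc = roc_auc(labels, scores)
```

In this toy example 5 of the 6 positive-negative pairs are ranked correctly. The pairwise loop is quadratic in sample count, which is acceptable for a sketch but not for large cohorts.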


Subjects
Breast Neoplasms/classification , Machine Learning/standards , Metabolomics/methods , Receptors, Estrogen/analysis , Area Under Curve , Female , Humans
17.
RNA ; 20(11): 1684-96, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25246651

ABSTRACT

It is crucial for researchers to optimize RNA-seq experimental designs for differential expression detection. Currently, the field lacks general methods to estimate power and sample size for RNA-Seq in complex experimental designs, under the assumption of the negative binomial distribution. We simulate RNA-Seq count data based on parameters estimated from six widely different public data sets (including cell line comparison, tissue comparison, and cancer data sets) and calculate the statistical power in paired and unpaired sample experiments. We comprehensively compare five differential expression analysis packages (DESeq, edgeR, DESeq2, sSeq, and EBSeq) and evaluate their performance by power, receiver operating characteristic (ROC) curves, and other metrics including areas under the curve (AUC), Matthews correlation coefficient (MCC), and F-measures. DESeq2 and edgeR tend to give the best performance in general. Increasing sample size or sequencing depth increases power; however, increasing sample size is more potent than sequencing depth to increase power, especially when the sequencing depth reaches 20 million reads. Long intergenic noncoding RNAs (lincRNA) yield lower power relative to the protein coding mRNAs, given their lower expression level in the same RNA-Seq experiment. On the other hand, paired-sample RNA-Seq significantly enhances the statistical power, confirming the importance of considering the multifactor experimental design. Finally, a local optimal power is achievable for a given budget constraint, and the dominant contributing factor is sample size rather than the sequencing depth. In conclusion, we provide a power analysis tool (http://www2.hawaii.edu/~lgarmire/RNASeqPowerCalculator.htm) that captures the dispersion in the data and can serve as a practical reference under the budget constraint of RNA-Seq experiments.


Subjects
Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , RNA, Messenger/chemistry , Sequence Analysis, RNA/methods , Animals , Computer Simulation , Databases, Genetic , Humans , Mice , Regression Analysis , Reproducibility of Results , Research Design , Sample Size , Software
18.
BMC Bioinformatics ; 16 Suppl 5: S10, 2015.
Article in English | MEDLINE | ID: mdl-25861082

ABSTRACT

BACKGROUND: Epigenetic alterations are known to correlate with changes in gene expression among various diseases including cancers. However, quantitative models that accurately predict the up or down regulation of gene expression are currently lacking. METHODS: A new machine learning-based method of gene expression prediction is developed in the context of lung cancer. This method uses the Illumina Infinium HumanMethylation450K Beadchip CpG methylation array data from paired lung cancer and adjacent normal tissues in The Cancer Genome Atlas (TCGA) and histone modification marker ChIP-Seq data from the ENCODE project, to predict the differential expression of RNA-Seq data in TCGA lung cancers. It considers a comprehensive list of 1424 features spanning the four categories of CpG methylation, histone H3 methylation modification, nucleotide composition, and conservation. Various feature selection and classification methods are compared to select the best model over 10-fold cross-validation in the training data set. RESULTS: A best model comprising 67 features is chosen by ReliefF based feature selection and random forest classification method, with AUC = 0.864 from the 10-fold cross-validation of the training set and AUC = 0.836 from the testing set. The selected features cover all four data types, with histone H3 methylation modification (32 features) and CpG methylation (15 features) being most abundant. Among the dropping-off tests of individual data-type based features, removal of CpG methylation features leads to the largest reduction in model performance. In the best model, 19 selected features are from the promoter regions (TSS200 and TSS1500), highest among all locations relative to transcripts. Sequential dropping-off of CpG methylation features relative to different regions on the protein coding transcripts shows that promoter regions contribute most significantly to the accurate prediction of gene expression. CONCLUSIONS: By considering a comprehensive list of epigenomic and genomic features, we have constructed an accurate model to predict transcriptomic differential expression, exemplified in lung cancer.


Subjects
CpG Islands/genetics , DNA Methylation , Epigenomics/methods , Gene Expression Regulation, Neoplastic , Genome, Human , Lung Neoplasms/genetics , Artificial Intelligence , Genomics/methods , High-Throughput Nucleotide Sequencing , Histones/genetics , Humans
19.
PLoS Comput Biol ; 10(9): e1003851, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25233347

ABSTRACT

Breast cancer is the most common malignancy in women worldwide. With the increasing awareness of heterogeneity in breast cancers, better prediction of breast cancer prognosis is much needed for more personalized treatment and disease management. Towards this goal, we have developed a novel computational model for breast cancer prognosis by combining the Pathway Deregulation Score (PDS) based pathifier algorithm, Cox regression and L1-LASSO penalization method. We trained the model on a set of 236 patients with gene expression data and clinical information, and validated the performance on three diversified testing data sets of 606 patients. To evaluate the performance of the model, we conducted survival analysis of the dichotomized groups, and compared the areas under the curve based on the binary classification. The resulting prognosis genomic model is composed of fifteen pathways (e.g., P53 pathway) that had previously reported cancer relevance, and it successfully differentiated relapse in the training set (log rank p-value = 6.25e-12) and three testing data sets (log rank p-value < 0.0005). Moreover, the pathway-based genomic models consistently performed better than gene-based models on all four data sets. We also find strong evidence that combining genomic information with clinical information improved the p-values of prognosis prediction by at least three orders of magnitude in comparison to using either genomic or clinical information alone. In summary, we propose a novel prognosis model that harnesses the pathway-based dysregulation as well as valuable clinical information. The selected pathways in our prognosis model are promising targets for therapeutic intervention.


Subjects
Breast Neoplasms/genetics , Breast Neoplasms/pathology , Genomics/methods , Models, Statistical , Algorithms , Breast Neoplasms/metabolism , Disease Progression , Female , Gene Expression Profiling , Humans , Middle Aged , Prognosis , Proportional Hazards Models , ROC Curve , Transcriptome
20.
Mol Hum Reprod ; 20(9): 885-904, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24944161

ABSTRACT

Pre-eclampsia is the leading cause of fetal and maternal morbidity and mortality. Early onset pre-eclampsia (EOPE) is a disorder that has severe maternal and fetal outcomes, whilst its etiology is poorly understood. We hypothesize that epigenetics plays an important role to mediate the development of EOPE and conducted a case-control study to compare the genome-wide methylome difference between chorioamniotic membranes from 30 EOPE and 17 full-term pregnancies using the Infinium Human Methylation 450 BeadChip arrays. Bioinformatics analysis tested differential methylation (DM) at CpG site level, gene level, and pathway and network level. A striking genome-wide hypermethylation pattern coupled with hypomethylation in promoters was observed. Out of 385 184 CpG sites, 9995 showed DM (2.6%). Of those DM sites, 91.9% showed hypermethylation (9186 of 9995). Over 900 genes had DM associated with promoters. Promoter-based DM analysis revealed that genes in canonical cancer-related pathways such as Rac, Ras, PI3K/Akt, NFκB and ErbB4 were enriched, and represented biological functional alterations that involve cell cycle, apoptosis, cancer signaling and inflammation. A group of genes previously found to be up-regulated in pre-eclampsia, including GRB2, ATF3, NFKB2, as well as genes in proteasome subunits (PSMA1, PSME1, PSMD1 and PSMD8), harbored hypomethylated promoters. Conversely, a cluster of microRNAs, including mir-519a1, mir-301a, mir-487a, mir-185, mir-329, mir-194, mir-376a1, mir-486 and mir-744 were all hypermethylated in their promoters in the EOPE samples. These findings collectively reveal new avenues of research regarding the vast epigenetic modifications in EOPE.


Subjects
Amnion/metabolism , Chorion/metabolism , DNA Methylation , Epigenesis, Genetic , Pre-Eclampsia/metabolism , Promoter Regions, Genetic , Adult , Case-Control Studies , Computational Biology , DNA/metabolism , Down-Regulation , Female , Genome-Wide Association Study , Humans , MicroRNAs/metabolism , Oligonucleotide Array Sequence Analysis , Pregnancy , Retrospective Studies , Up-Regulation , Young Adult