Results 1-18 of 18
1.
Alzheimers Dement ; 20(10): 7174-7192, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39215503

ABSTRACT

INTRODUCTION: Multi-omics studies in Alzheimer's disease (AD) revealed many potential disease pathways and therapeutic targets. Despite their promise of precision medicine, these studies lacked Black Americans (BA) and Latin Americans (LA), who are disproportionately affected by AD. METHODS: To bridge this gap, Accelerating Medicines Partnership in Alzheimer's Disease (AMP-AD) expanded brain multi-omics profiling to multi-ethnic donors. RESULTS: We generated multi-omics data and curated and harmonized phenotypic data from BA (n = 306), LA (n = 326), or BA and LA (n = 4) brain donors plus non-Hispanic White (n = 252) and other (n = 20) ethnic groups, to establish a foundational dataset enriched for BA and LA participants. This study describes the data available to the research community, including transcriptomes from three brain regions, whole-genome sequences, and proteome measures. DISCUSSION: The inclusion of traditionally underrepresented groups in multi-omics studies is essential to discovering the full spectrum of precision medicine targets that will be pertinent to all populations affected by AD. HIGHLIGHTS: The Accelerating Medicines Partnership in Alzheimer's Disease Diversity Initiative led brain tissue profiling in multi-ethnic populations. Brain multi-omics data are generated from Black American, Latin American, and non-Hispanic White donors. RNA sequencing, whole-genome sequencing, and tandem mass tag proteomics are completed and shared. Multiple brain regions, including the caudate, temporal cortex, and dorsolateral prefrontal cortex, were profiled.


Subject(s)
Alzheimer Disease , Brain , Aged , Aged, 80 and over , Female , Humans , Male , Alzheimer Disease/genetics , Alzheimer Disease/ethnology , Black or African American/genetics , Brain/metabolism , Brain/pathology , Ethnicity/genetics , Hispanic or Latino/genetics , Multiomics , Transcriptome , White/genetics
2.
Mol Plant Microbe Interact ; 36(12): 805-820, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37717250

ABSTRACT

We report a public resource for examining the spatiotemporal RNA expression of 54,893 Medicago truncatula genes during the first 72 h of response to rhizobial inoculation. Using a methodology that allows synchronous inoculation and growth of more than 100 plants in a single media container, we harvested the same segment of each root responding to rhizobia in the initial inoculation over a time course, collected individual tissues from these segments with laser capture microdissection, and created and sequenced RNA libraries from these tissues. We demonstrate the utility of the resource by examining the expression patterns of a set of genes induced very early in nodule signaling, as well as two gene families (CLE peptides and nodule-specific PLAT-domain proteins), and show that, despite similar whole-root expression patterns, these genes differ in expression across tissues. Using a rhizobial response dataset generated from transcriptomics on intact root segments, we also examined differential temporal expression patterns and determined that, after nodule tissue, the epidermis and cortical cells contained the most temporally patterned genes. We circumscribed gene lists for each time and tissue examined and developed an expression pattern visualization tool. Finally, we explored transcriptomic differences between the inner cortical cells that become nodules and those that do not, confirming that the expression of 1-aminocyclopropane-1-carboxylate synthases distinguishes inner cortical cells that become nodules, and we describe potential downstream genes involved in early nodule cell division. Copyright © 2023 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.


Subject(s)
Medicago truncatula , Rhizobium , Root Nodules, Plant/metabolism , Transcriptome/genetics , Plant Roots , Medicago truncatula/metabolism , Laser Capture Microdissection , Rhizobium/genetics , RNA/metabolism , Symbiosis/genetics , Gene Expression Regulation, Plant , Plant Proteins/genetics , Plant Proteins/metabolism , Plant Root Nodulation/genetics
3.
Curr Issues Mol Biol ; 45(6): 4612-4631, 2023 May 27.
Article in English | MEDLINE | ID: mdl-37367042

ABSTRACT

Nodule number regulation in legumes is controlled by a feedback loop that integrates nutrient and rhizobia symbiont status signals to regulate nodule development. Signals from the roots are perceived by shoot receptors, including a CLV1-like receptor-like kinase known as SUNN in Medicago truncatula. In the absence of functional SUNN, the autoregulation feedback loop is disrupted, resulting in hypernodulation. To elucidate early autoregulation mechanisms disrupted in SUNN mutants, we searched for genes with altered expression in the loss-of-function sunn-4 mutant and included the rdn1-2 autoregulation mutant for comparison. We identified constitutively altered expression of small groups of genes in sunn-4 roots and in sunn-4 shoots. All genes with verified roles in nodulation that were induced in wild-type roots during the establishment of nodules were also induced in sunn-4, including autoregulation genes TML2 and TML1. Only an isoflavone-7-O-methyltransferase gene was induced in response to rhizobia in wild-type roots but not induced in sunn-4. In wild-type shoot tissues, eight rhizobia-responsive genes were identified, including a MYB family transcription factor gene that remained at a baseline level in sunn-4; three genes were induced by rhizobia in shoots of sunn-4 but not in wild-type. We cataloged the temporal induction profiles of many small secreted peptide (MtSSP) genes in nodulating root tissues, encompassing members of twenty-four peptide families, including the CLE and IRON MAN families. The discovery that expression of TML2, a key factor in inhibiting nodulation in response to autoregulation signals, is also triggered in sunn-4 roots in the section analyzed suggests that the mechanism of TML regulation of nodulation in M. truncatula may be more complex than published models.

4.
Am J Bot ; 106(11): 1466-1476, 2019 11.
Article in English | MEDLINE | ID: mdl-31709515

ABSTRACT

PREMISE: Plants synthesize information from multiple environmental stimuli when determining their direction of growth. Gravity, being ubiquitous on Earth, plays a major role in determining the direction of growth and overall architecture of the plant. Here, we utilized the microgravity environment on board the International Space Station (ISS) to identify genes involved in the growth and development of phototropically stimulated seedlings of Arabidopsis thaliana. METHODS: Seedlings were grown on the ISS, and RNA was extracted from 7 samples (pools of 10-15 plants) grown in microgravity (µg) or Earth gravity conditions (1-g). Transcriptomic analysis of differential gene expression via RNA sequencing (RNA-seq) was performed using the HISAT2-Stringtie-DESeq2 RNASeq pipeline. Differentially expressed genes were further characterized using Pathway Analysis and Gene Ontology enrichment. RESULTS: Pathway Analysis of the 296 genes found to be significantly differentially expressed between plants in microgravity and 1-g controls identified eight molecular pathways that were significantly affected by reduced gravity conditions. Specifically, light-associated pathways (e.g., photosynthesis-antenna proteins, photosynthesis, and porphyrin and chlorophyll metabolism) were significantly downregulated in microgravity. CONCLUSIONS: Gene expression in A. thaliana seedlings grown in microgravity was significantly altered compared to that of the 1-g control. Understanding how plants grow in microgravity not only aids our understanding of how plants grow and respond to the environment but will also help to efficiently grow plants during long-range space missions.


Subject(s)
Arabidopsis Proteins , Arabidopsis , Space Flight , Weightlessness , Seedlings
5.
bioRxiv ; 2024 Apr 20.
Article in English | MEDLINE | ID: mdl-38659743

ABSTRACT

INTRODUCTION: Multi-omics studies in Alzheimer's disease (AD) revealed many potential disease pathways and therapeutic targets. Despite their promise of precision medicine, these studies lacked African Americans (AA) and Latin Americans (LA), who are disproportionately affected by AD. METHODS: To bridge this gap, Accelerating Medicines Partnership in AD (AMP-AD) expanded brain multi-omics profiling to multi-ethnic donors. RESULTS: We generated multi-omics data and curated and harmonized phenotypic data from AA (n=306), LA (n=326), or AA and LA (n=4) brain donors plus non-Hispanic White (n=252) and other (n=20) ethnic groups, to establish a foundational dataset enriched for AA and LA participants. This study describes the data available to the research community, including transcriptomes from three brain regions, whole-genome sequences, and proteome measures. DISCUSSION: Inclusion of traditionally underrepresented groups in multi-omics studies is essential to discovering the full spectrum of precision medicine targets that will be pertinent to all populations affected by AD.

6.
Front Plant Sci ; 13: 861639, 2022.
Article in English | MEDLINE | ID: mdl-35463395

ABSTRACT

In response to colonization by rhizobia bacteria, legumes are able to form nitrogen-fixing nodules in their roots, allowing the plants to grow efficiently in nitrogen-depleted environments. Legumes utilize a complex, long-distance signaling pathway to regulate nodulation that involves signals in both roots and shoots. We measured the transcriptional response to treatment with rhizobia in both the shoots and roots of Medicago truncatula over a 72-h time course. To detect temporal shifts in gene expression, we developed GeneShift, a novel computational statistics and machine learning workflow that addresses the issue of averaging time-series replicates when detecting gene expression pattern shifts under different conditions. We identified both known and novel genes that are regulated dynamically in both tissues during early nodulation, including leginsulin, defensins, root transporters, nodulin-related, and circadian clock genes. We validated over 70% of the expression patterns that GeneShift discovered using an independent M. truncatula RNA-Seq study. GeneShift facilitated the discovery of condition-specific temporally differentially expressed genes in the symbiotic nodulation biological system. In principle, GeneShift should work for time-series gene expression profiling studies from other systems.

7.
Genome Med ; 13(1): 76, 2021 05 04.
Article in English | MEDLINE | ID: mdl-33947463

ABSTRACT

BACKGROUND: Alzheimer's disease (AD) is an incurable neurodegenerative disease currently affecting 1.75% of the US population, with projected growth to 3.46% by 2050. Identifying common genetic variants driving differences in transcript expression that confer AD risk is necessary to elucidate AD mechanism and develop therapeutic interventions. We modify the FUSION transcriptome-wide association study (TWAS) pipeline to ingest gene expression values from multiple neocortical regions. METHODS: A combined dataset of 2003 genotypes clustered to 1000 Genomes individuals from Utah with Northern and Western European ancestry (CEU) was used to construct a training set of 790 genotypes paired to 888 RNASeq profiles from temporal cortex (TCX = 248), prefrontal cortex (FP = 50), inferior frontal gyrus (IFG = 41), superior temporal gyrus (STG = 34), parahippocampal cortex (PHG = 34), and dorsolateral prefrontal cortex (DLPFC = 461). Following within-tissue normalization and covariate adjustment, predictive weights to impute expression components based on a gene's surrounding cis-variants were trained. The FUSION pipeline was modified to support input of pre-scaled expression values and to support cross-validation with a repeated-measures design arising from the presence of multiple transcriptome samples from the same individual across different tissues. RESULTS: Cis-variant architecture alone was informative to train weights and impute expression for 6780 (49.67%) autosomal genes, the majority of which significantly correlated with gene expression; FDR < 5%: N = 6775 (99.92%), Bonferroni: N = 6716 (99.06%). Validation of weights in 515 matched genotype-to-RNASeq profiles from the CommonMind Consortium (CMC) was 72.14% in DLPFC profiles. Association of imputed expression components from all 2003 genotype profiles yielded 8 genes significantly associated with AD (FDR < 0.05): APOC1, EED, CD2AP, CEACAM19, CLPTM1, MTCH2, TREM2, and KNOP1.
CONCLUSIONS: We provide evidence of cis-genetic variation conferring AD risk through 8 genes across six distinct genomic loci. Moreover, we provide expression weights for 6780 genes as a valuable resource to the community, which can be abstracted across the neocortex and a wide range of neuronal phenotypes.


Subject(s)
Alzheimer Disease/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Neocortex/metabolism , Quantitative Trait Loci , Transcriptome , Computational Biology/methods , Gene Expression Regulation , Genome-Wide Association Study/methods , Humans , Organ Specificity/genetics
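
The TWAS logic summarized above has two steps: an individual's imputed expression for a gene is a weighted sum of that individual's cis-variant allele dosages, and the imputed values are then tested for association with disease status. The Python sketch below illustrates only that idea on toy numbers. The weights, dosages, and case/control labels are hypothetical, and a simple correlation t-test stands in for the association test; this is not the FUSION pipeline or the study's modified version of it.

```python
import math

def impute_expression(dosages, weights):
    """Imputed expression for one individual: weighted sum of cis-variant
    allele dosages (0, 1, or 2). The weights are assumed to come from a
    model trained on a reference panel with matched genotypes and RNA-seq."""
    return sum(d * w for d, w in zip(dosages, weights))

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical cis-variant weights for one gene (three variants).
weights = [0.4, -0.2, 0.1]

# Hypothetical dosages for six individuals and their case (1) / control (0)
# status. These are toy values, not study data.
genotypes = [
    [2, 0, 1],
    [1, 1, 0],
    [0, 2, 1],
    [2, 1, 2],
    [0, 0, 0],
    [1, 2, 1],
]
status = [1, 0, 0, 1, 0, 0]

imputed = [impute_expression(g, weights) for g in genotypes]
r = pearson(imputed, status)
# t statistic for a correlation coefficient: t = r * sqrt((n - 2) / (1 - r^2)).
n = len(imputed)
t = r * math.sqrt((n - 2) / (1 - r ** 2))
print([round(v, 2) for v in imputed], round(r, 3), round(t, 3))
```

With a binary outcome, this correlation test is equivalent to a two-sample t-test on imputed expression; real TWAS analyses typically add covariates and use the pipeline's own association statistics.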
8.
Sci Rep ; 10(1): 17089, 2020 10 13.
Article in English | MEDLINE | ID: mdl-33051491

ABSTRACT

The human brain is a complex organ that consists of several regions, each with a unique gene expression pattern. Our intent in this study was to construct a gene co-expression network (GCN) for the normal brain using RNA expression profiles from the Genotype-Tissue Expression (GTEx) project. The brain GCN contains gene correlation relationships that are broadly present in the brain or specific to thirteen brain regions, which we later combined into six overarching brain mini-GCNs based on the brain's structure. Using the expression profiles of brain region-specific GCN edges, we determined how well the brain region samples could be discriminated from each other, visually with t-SNE plots or quantitatively with the Gene Oracle deep learning classifier. Next, we tested these gene sets for their relevance to human tumors of brain and non-brain origin. Interestingly, we found that genes in the six brain mini-GCNs showed markedly higher mutation rates in tumors relative to matched sets of random genes. Further, we found that cortex genes subdivided Head and Neck Squamous Cell Carcinoma (HNSC) tumors and Pheochromocytoma and Paraganglioma (PCPG) tumors into distinct groups. The brain GCN and mini-GCNs are useful resources for the classification of brain regions and the identification of biomarker genes for brain-related phenotypes.


Subject(s)
Biomarkers/metabolism , Brain/metabolism , Gene Regulatory Networks , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Brain Neoplasms/genetics , Brain Neoplasms/metabolism , Databases, Genetic , Gene Expression Profiling , Genetic Markers , Humans , Models, Genetic , Models, Neurological , Mutation , Neural Networks, Computer , Tissue Distribution
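
A gene co-expression network of the kind described above links gene pairs whose expression profiles are strongly correlated across samples. A minimal sketch, assuming Pearson correlation and a hypothetical absolute-correlation cutoff of 0.9 on a toy expression table; the study's own similarity measure and thresholding procedure are not stated in this abstract, and the gene names are placeholders.

```python
import itertools
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def build_gcn(expression, threshold=0.9):
    """Return edges (gene_a, gene_b, r) for every gene pair whose absolute
    Pearson correlation across samples meets the threshold."""
    edges = []
    for a, b in itertools.combinations(sorted(expression), 2):
        r = pearson(expression[a], expression[b])
        if abs(r) >= threshold:
            edges.append((a, b, round(r, 3)))
    return edges

# Toy expression values: one list per gene, one value per sample.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0, 9.8],
    "GENE_C": [5.0, 4.1, 2.9, 2.2, 0.8],
    "GENE_D": [3.0, 1.0, 4.0, 1.0, 5.0],
}
edges = build_gcn(expression)
for edge in edges:
    print(edge)
```

Anticorrelated pairs (here, those involving GENE_C) are kept as edges with a negative weight; whether to retain negative edges is a design choice that varies between GCN methods.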
9.
Curr Protoc Hum Genet ; 108(1): e105, 2020 12.
Article in English | MEDLINE | ID: mdl-33085189

ABSTRACT

The AD Knowledge Portal (adknowledgeportal.org) is a public data repository that shares data and other resources generated by multiple collaborative research programs focused on aging, dementia, and Alzheimer's disease (AD). In this article, we highlight how to use the Portal to discover and download genomic variant and transcriptomic data from the same individuals. First, we show how to use the web interface to browse and search for data of interest using relevant file annotations. We demonstrate how to learn more about the context surrounding the data, including diagnostic criteria and methodological details about sample preparation and data analysis. We present two primary ways to download data: using a web interface, and using a programmatic method that provides access from the command line. Finally, we show how to merge separate sources of metadata into a comprehensive file that contains the factors and covariates necessary for downstream analyses. © 2020 The Authors. Basic Protocol 1: Find and download files associated with a selected study. Basic Protocol 2: Download files in bulk using the command line client. Basic Protocol 3: Working with file annotations and metadata.


Subject(s)
Aging , Alzheimer Disease/therapy , Databases, Genetic/statistics & numerical data , Genomics/methods , Information Storage and Retrieval/methods , Software , Alzheimer Disease/diagnosis , Genomics/statistics & numerical data , Humans , Internet
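
Basic Protocol 3 covers merging separate metadata sources into one table for downstream analysis. The stdlib sketch below shows only the general join pattern on toy records. The column names `individualID`, `specimenID`, `diagnosis`, `tissue`, and `assay` are assumptions chosen for illustration; the actual Portal files and the protocol's own code may be organized differently.

```python
def index_by(rows, key):
    """Map each value of `key` to its row (assumes the key is unique)."""
    return {row[key]: row for row in rows}

def merge_metadata(individuals, biospecimens, assays):
    """Left-join assay -> biospecimen -> individual metadata into flat rows.
    Every assay row is kept; unmatched lookups contribute no extra columns."""
    specimen_index = index_by(biospecimens, "specimenID")
    individual_index = index_by(individuals, "individualID")
    merged = []
    for assay_row in assays:
        specimen = specimen_index.get(assay_row["specimenID"], {})
        individual = individual_index.get(specimen.get("individualID"), {})
        # Later dicts win on key collisions; shared join keys hold equal values.
        merged.append({**individual, **specimen, **assay_row})
    return merged

# Hypothetical toy metadata tables.
individuals = [
    {"individualID": "IND1", "diagnosis": "AD"},
    {"individualID": "IND2", "diagnosis": "control"},
]
biospecimens = [
    {"specimenID": "SP1", "individualID": "IND1", "tissue": "DLPFC"},
    {"specimenID": "SP2", "individualID": "IND2", "tissue": "temporal cortex"},
]
assays = [
    {"specimenID": "SP1", "assay": "rnaSeq"},
    {"specimenID": "SP2", "assay": "wholeGenomeSeq"},
]
merged = merge_metadata(individuals, biospecimens, assays)
for row in merged:
    print(row)
```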
10.
Front Genet ; 11: 317, 2020.
Article in English | MEDLINE | ID: mdl-32477397

ABSTRACT

From noble beginnings as a prospective forage, polyploid Sorghum halepense ('Johnsongrass') is both an invasive species and one of the world's worst agricultural weeds. We show that S. halepense, formed by S. bicolor × S. propinquum hybridization, has an S. bicolor-enriched allele composition and striking mutations in 5,957 genes that differentiate it from representatives of its progenitor species and an outgroup. The spread of S. halepense may have been facilitated by introgression from closely related cultivated sorghum near genetic loci affecting rhizome development, seed size, and levels of lutein, a photochemical protectant and abscisic acid precursor. Rhizomes, subterranean stems that store carbohydrates and spawn clonal propagules, have growth correlated with reproductive rather than other vegetative tissues, and increase survival of both temperate cold seasons and tropical dry seasons. Rhizomes of S. halepense are more extensive than those of its rhizomatous progenitor S. propinquum, with gene expression including many alleles from its non-rhizomatous S. bicolor progenitor. S. halepense is the first surviving polyploid in its lineage in ∼96 million years; its post-Columbian spread across six continents carried rich genetic diversity that, in the United States, has facilitated the transition from agricultural to non-agricultural niches. Projected to spread another 200-600 km northward in the coming century, S. halepense, despite its drawbacks, may offer novel alleles and traits of value for sorghum improvement.

11.
Sci Rep ; 9(1): 2899, 2019 02 27.
Article in English | MEDLINE | ID: mdl-30814637

ABSTRACT

Renal cell carcinoma (RCC) subtypes are characterized by distinct molecular profiles. Using RNA expression profiles from 1,009 RCC samples, we constructed a condition-annotated gene coexpression network (GCN). The RCC GCN contains binary gene coexpression relationships (edges) specific to conditions including RCC subtype and tumor stage. As an application of this resource, we discovered RCC GCN edges and modules that were associated with genetic lesions in known RCC driver genes, including VHL, a common initiating clear cell RCC (ccRCC) genetic lesion, and PBRM1 and BAP1, which are early genetic lesions in the Braided Cancer River Model (BCRM). Since ccRCC tumors with PBRM1 mutations respond to targeted therapy differently than tumors with BAP1 mutations, we focused on ccRCC-specific edges associated with tumors that exhibit alternate mutation profiles: VHL-PBRM1 or VHL-BAP1. We found specific blends of molecular functions associated with these two mutation paths. Although the edges associated with each mutation path involved unique genes, they were enriched for the same immunological functions, suggesting a convergent functional role for alternate gene sets consistent with the BCRM. The condition-annotated RCC GCN described herein is a novel data mining resource for the assignment of polygenic biomarkers and their relationships to RCC tumors with specific molecular and mutational profiles.


Subject(s)
Carcinoma, Renal Cell/genetics , Kidney Neoplasms/genetics , Mutation/genetics , Carcinogenesis/genetics , Carcinoma, Renal Cell/pathology , DNA-Binding Proteins/genetics , Datasets as Topic , Disease Progression , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , Kidney Neoplasms/pathology , Neoplasm Staging , Oncogene Proteins, Fusion/genetics , Transcription Factors/genetics , Transcriptome , Tumor Suppressor Proteins/genetics , Ubiquitin Thiolesterase/genetics , Von Hippel-Lindau Tumor Suppressor Protein/genetics
12.
Bioinform Biol Insights ; 13: 1177932219856359, 2019.
Article in English | MEDLINE | ID: mdl-31236009

ABSTRACT

MOTIVATION: As the size of high-throughput DNA sequence datasets continues to grow, the cost of transferring and storing the datasets may prevent their processing in all but the largest data centers or commercial cloud providers. To lower this cost, it should be possible to process only a subset of the original data while still preserving the biological information of interest. RESULTS: Using 4 high-throughput DNA sequence datasets of differing sequencing depth from 2 species as use cases, we demonstrate the effect of processing partial datasets on the number of detected RNA transcripts using an RNA-Seq workflow. We used transcript detection to decide on a cutoff point. We then physically transferred the minimal partial dataset and compared this with transferring the full dataset; the partial transfer reduced the total transfer time by approximately 25%. These results suggest that as sequencing datasets get larger, one way to speed up analysis is to transfer only the minimal amount of data that still sufficiently detects biological signal. AVAILABILITY: All results were generated using public datasets from NCBI and publicly available open source software.
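
The approach above processes only the leading portion of a sequence file and picks a cutoff from how many transcripts are still detected. The sketch below illustrates both pieces under stated assumptions: reads are standard 4-line FASTQ records, and the cutoff rule (smallest fraction detecting at least 95% of the transcripts found with the full dataset) is a hypothetical criterion, since the abstract does not specify the exact rule. The detected-transcript counts are made-up placeholders, not results from the study.

```python
import io

def head_fastq(handle, n_reads):
    """Yield up to n_reads 4-line FASTQ records from the start of a text handle."""
    for _ in range(n_reads):
        record = [handle.readline() for _ in range(4)]
        if not record[0]:
            return  # the file holds fewer than n_reads records
        yield record

def choose_cutoff(detected_by_fraction, target=0.95):
    """Smallest read fraction whose detected-transcript count reaches `target`
    times the count obtained from the full dataset (fraction 1.0)."""
    full = detected_by_fraction[1.0]
    for fraction in sorted(detected_by_fraction):
        if detected_by_fraction[fraction] >= target * full:
            return fraction
    return 1.0

toy_fastq = "@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n@read3\nTTAA\n+\nIIII\n"
subset = list(head_fastq(io.StringIO(toy_fastq), 2))
print(len(subset), subset[-1][0].strip())

# Hypothetical detected-transcript counts for increasing fractions of the reads.
detected = {0.1: 12000, 0.25: 17500, 0.5: 19200, 1.0: 20000}
print(choose_cutoff(detected))
```

Taking the first reads rather than a random sample keeps the transfer a simple byte-range prefix of the file, which matches the goal of moving less data, at the cost of any bias in read order.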

13.
Front Plant Sci ; 10: 1409, 2019.
Article in English | MEDLINE | ID: mdl-31737022

ABSTRACT

Root nodulation results from a symbiotic relationship between a plant host and Rhizobium bacteria. Synchronized gene expression patterns over the course of rhizobial infection activate pathways that are distinct from, yet overlap with, the highly conserved pathways that enable mycorrhizal symbiosis. We performed RNA sequencing of 30 Medicago truncatula root maturation zone samples at five distinct time points. These samples included plants inoculated with Sinorhizobium medicae and control plants that did not receive any Rhizobium. Following gene expression quantification, we identified 1,758 differentially expressed genes at various time points. We constructed a gene co-expression network (GCN) from the same data and identified link community modules (LCMs) composed entirely of genes differentially expressed at specific time points post-inoculation. One LCM included genes that were up-regulated at 24 h following inoculation, suggesting an activation of allergen family genes and carbohydrate-binding gene products in response to Rhizobium. We also identified two LCMs composed entirely of genes that were down-regulated at 24 and 48 h post-inoculation. The identity of the genes in these modules suggests that down-regulating specific genes at 24 h may result in decreased jasmonic acid production with an increase in cytokinin production. At 48 h, coordinated down-regulation of a specific set of genes involved in lipid biosynthesis may play a role in nodulation. We show that GCN-LCM analysis is an effective method to preliminarily identify polygenic candidate biomarkers of root nodulation and develop hypotheses for future discovery.

14.
Front Plant Sci ; 10: 1529, 2019.
Article in English | MEDLINE | ID: mdl-31850027

ABSTRACT

Introduction: Traveling to nearby extraterrestrial objects having a reduced gravity level (partial gravity) compared to Earth's gravity is becoming a realistic objective for space agencies. The use of plants as part of life support systems will require a better understanding of the interactions among plant growth responses, including tropisms, under partial gravity conditions. Materials and Methods: Here, we present results from our latest space experiments on the ISS, in which seeds of Arabidopsis thaliana were germinated, and seedlings grew for six days under different gravity levels, namely micro-g, several intermediate partial-g levels, and 1g, and were subjected to irradiation with blue light for the last 48 h. RNA was extracted from 20 samples for subsequent RNAseq analysis. Transcriptomic analysis was performed using the HISAT2-Stringtie-DESeq pipeline. Differentially expressed genes were further characterized for global responses using the GEDI tool, gene networks, and Gene Ontology (GO) enrichment. Results: Differential gene expression analysis revealed only one differentially expressed gene (AT4G21560, VPS28-1, a vacuolar protein) across all gravity conditions using FDR correction (q < 0.05). However, the same 14 genes appeared differentially expressed when comparing either micro-g, the low-g level (< 0.1g), or the Moon g-level with 1g control conditions. Apart from these 14 shared genes, the number of differentially expressed genes was similar in microgravity and at the Moon g-level, increased at the intermediate g-level (< 0.1g), and was then progressively reduced as the difference from Earth gravity became smaller. The GO groups were differentially affected at each g-level: light and photosynthesis GO terms under microgravity; general stress, chemical, and hormone responses under low-g; and cell wall and membrane structure and function under the Moon g-level.
Discussion: Transcriptional analyses of plants under blue light stimulation suggest that root blue-light phototropism may be enough to reduce the gravitational stress response caused by the lack of gravitropism in microgravity. Competition among tropisms induces an intense perturbation at the micro-g level, which shows an extensive stress response that is progressively attenuated. Our results show a major effect on cell wall/membrane remodeling (detected at the interval from the Moon to Mars gravity), which can potentially be related to graviresistance mechanisms.

15.
Oncotarget ; 9(13): 10995-11008, 2018 Feb 16.
Article in English | MEDLINE | ID: mdl-29541392

ABSTRACT

Tumors exhibit complex patterns of aberrant gene expression. Using a knowledge-independent, noise-reducing gene co-expression network construction software called KINC, we created multiple RNAseq-based gene co-expression networks relevant to brain and glioblastoma biology. In this report, we describe the discovery and validation of a glioblastoma-specific gene module that contains 22 co-expressed genes. The genes are upregulated in glioblastoma relative to normal brain and lower grade glioma samples; they are also hypo-methylated in glioblastoma relative to lower grade glioma tumors. Among the proneural, neural, mesenchymal, and classical glioblastoma subtypes, these genes are most highly expressed in the mesenchymal subtype. Furthermore, high expression of these genes is associated with decreased survival across each glioblastoma subtype. These genes are of interest to glioblastoma biology, and our gene interaction discovery and validation workflow can be used to discover and validate co-expressed gene modules derived from any co-expression network.

16.
Sci Rep ; 8(1): 8180, 2018 05 25.
Article in English | MEDLINE | ID: mdl-29802335

ABSTRACT

We applied two state-of-the-art, knowledge-independent data-mining methods, Dynamic Quantum Clustering (DQC) and t-Distributed Stochastic Neighbor Embedding (t-SNE), to data from The Cancer Genome Atlas (TCGA). We showed that the RNA expression patterns for a mixture of 2,016 samples from five tumor types can sort the tumors into groups enriched for relevant annotations including tumor type, gender, tumor stage, and ethnicity. DQC feature selection analysis discovered 48 core biomarker transcripts that clustered tumors by tumor type. When these transcripts were removed, the geometry of tumor relationships changed, but it was still possible to classify the tumors using the RNA expression profiles of the remaining transcripts. We continued to remove the top biomarkers for several iterations and performed cluster analysis. Even though the most informative transcripts were removed from the cluster analysis, the sorting ability of the remaining transcripts remained strong after each iteration. Further, in some iterations we detected a repeating pattern of biological function that was not detectable with the core biomarker transcripts present. This suggests the existence of a "background classification" potential in which the pattern of gene expression after continued removal of "biomarker" transcripts could still classify tumors in agreement with the tumor type.


Subject(s)
Biomarkers, Tumor/genetics , Computational Biology , Neoplasms/classification , Neoplasms/genetics , Cluster Analysis , Female , Gene Expression Profiling , Humans , Male , Neoplasm Staging , Neoplasms/pathology
17.
Sci Rep ; 7(1): 8617, 2017 08 17.
Article in English | MEDLINE | ID: mdl-28819158

ABSTRACT

A gene co-expression network (GCN) describes associations between genes and points to genetic coordination of biochemical pathways. However, genetic correlations in a GCN are only detectable if they are present in the sampled conditions. With the increasing quantity of gene expression samples available in public repositories, there is greater potential for discovery of genetic correlations from a variety of biologically interesting conditions. However, even if gene correlations are present, their discovery can be masked by noise. Noise is introduced from natural variation (intrinsic and extrinsic), systematic variation (caused by sample measurement protocols and instruments), and algorithmic and statistical variation created by selection of data processing tools. A variety of published studies, approaches and methods attempt to address each of these contributions of variation to reduce noise. Here we describe an approach using Gaussian Mixture Models (GMMs) to address natural extrinsic (condition-specific) variation during network construction from mixed input conditions. To demonstrate utility, we build and analyze a condition-annotated GCN from a compendium of 2,016 mixed gene expression data sets from five tumor subtypes obtained from The Cancer Genome Atlas. Our results show that GMMs help discover tumor subtype specific gene co-expression patterns (modules) that are significantly enriched for clinical attributes.


Subject(s)
Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Neoplasms/genetics , Algorithms , Gene Ontology , Humans , Models, Genetic , Neoplasms/classification , Neoplasms/diagnosis , Normal Distribution , Reproducibility of Results
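
The GMM idea above can be pictured for a single gene pair: when samples from different conditions are pooled, the 2-D scatter of the two genes' expression values can contain separate condition-specific clusters, and fitting a Gaussian mixture lets correlation be computed within each cluster. The sketch below is a compact, stdlib-only EM for a two-component bivariate Gaussian mixture with full covariance, run on synthetic data. The fixed component count, min/max-x initialization, iteration count, small diagonal regularization, and synthetic data are all assumptions; this is not the authors' implementation and omits steps such as choosing the number of components.

```python
import math
import random

def gauss2(x, y, mu, cov):
    """Bivariate normal density at (x, y) for mean mu and 2x2 covariance cov."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x - mu[0], y - mu[1]
    # Quadratic form d' inv(cov) d via the closed-form 2x2 inverse.
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

def fit_gmm2(points, iters=50):
    """EM for a two-component bivariate Gaussian mixture; returns hard labels."""
    # Deterministic initialization: the points with the smallest and largest x.
    mus = [list(min(points)), list(max(points))]
    covs = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x, y in points:
            p = [weights[k] * gauss2(x, y, mus[k], covs[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate mixing weights, means, and full covariances.
        for k in range(2):
            rk = [r[k] for r in resp]
            nk = sum(rk) or 1e-300
            mx = sum(w * x for w, (x, y) in zip(rk, points)) / nk
            my = sum(w * y for w, (x, y) in zip(rk, points)) / nk
            sxx = sum(w * (x - mx) ** 2 for w, (x, y) in zip(rk, points)) / nk + 1e-6
            syy = sum(w * (y - my) ** 2 for w, (x, y) in zip(rk, points)) / nk + 1e-6
            sxy = sum(w * (x - mx) * (y - my) for w, (x, y) in zip(rk, points)) / nk
            weights[k] = nk / len(points)
            mus[k] = [mx, my]
            covs[k] = [[sxx, sxy], [sxy, syy]]
    return [0 if r[0] >= r[1] else 1 for r in resp]

def pearson(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs))
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs))
    return cov / (sx * sy)

# Synthetic expression values for one gene pair under two conditions:
# positively correlated in condition A, negatively in condition B.
rng = random.Random(1)
points = []
for _ in range(100):
    x = rng.gauss(2.0, 0.5)
    points.append((x, x + rng.gauss(0.0, 0.1)))
for _ in range(100):
    x = rng.gauss(7.0, 0.5)
    points.append((x, 14.0 - x + rng.gauss(0.0, 0.1)))

labels = fit_gmm2(points)
for k in (0, 1):
    cluster = [p for p, lab in zip(points, labels) if lab == k]
    print(k, len(cluster), round(pearson(cluster), 3))
print("pooled", round(pearson(points), 3))
```

On this synthetic data the pooled correlation is driven largely by the separation between the two clusters, while the per-cluster correlations recover the opposite signs of the two condition-specific relationships.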
18.
Bioinform Biol Insights ; 10: 133-41, 2016.
Article in English | MEDLINE | ID: mdl-27499617

ABSTRACT

High-throughput DNA sequencing technology has revolutionized the study of gene expression while introducing significant computational challenges for biologists. These computational challenges include access to sufficient computer hardware and functional data processing workflows. Both these challenges are addressed with our scalable, open-source Pegasus workflow for processing high-throughput DNA sequence datasets into a gene expression matrix (GEM) using computational resources available to U.S.-based researchers on the Open Science Grid (OSG). We describe the usage of the workflow (OSG-GEM), discuss workflow design, inspect performance data, and assess accuracy in mapping paired-end sequencing reads to a reference genome. A target OSG-GEM user is proficient with the Linux command line and possesses basic bioinformatics experience. The user may run this workflow directly on the OSG or adapt it to novel computing environments.
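
The workflow's end product is a gene expression matrix (GEM), with genes as rows and samples as columns. The stdlib sketch below shows only that final assembly step: it combines hypothetical per-sample gene-to-value tables and fills genes absent from a sample with 0. The input layout (one dict per sample), the fill value, and the tab-separated output are assumptions for illustration; this is not the OSG-GEM code.

```python
def build_gem(per_sample):
    """Combine {sample: {gene: value}} into (genes, samples, matrix),
    filling genes missing from a sample with 0.0."""
    samples = sorted(per_sample)
    genes = sorted({g for table in per_sample.values() for g in table})
    matrix = [[per_sample[s].get(g, 0.0) for s in samples] for g in genes]
    return genes, samples, matrix

def to_tsv(genes, samples, matrix):
    """Render the GEM as tab-separated text with a header row."""
    lines = ["\t".join(["gene"] + samples)]
    for gene, row in zip(genes, matrix):
        lines.append("\t".join([gene] + [f"{v:g}" for v in row]))
    return "\n".join(lines)

# Hypothetical per-sample expression values (e.g., from a quantification step).
per_sample = {
    "sample_1": {"geneA": 12.5, "geneB": 0.8},
    "sample_2": {"geneA": 10.1, "geneC": 3.2},
}
genes, samples, matrix = build_gem(per_sample)
print(to_tsv(genes, samples, matrix))
```

Filling missing genes with 0 assumes "not reported" means "not detected"; a pipeline that distinguishes the two would use a missing-value marker instead.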
