Search | BVS Nicaragua

1.

Enrichment of a subset of Neanderthal polymorphisms in autistic probands and siblings.

Pauly, Rini; Johnson, Layla; Feltus, F Alex; Casanova, Emily L.

Mol Psychiatry ; 2024 May 17.

Article in English | MEDLINE | ID: mdl-38760502

ABSTRACT

Homo sapiens and Neanderthals underwent hybridization during the Middle/Upper Paleolithic age, culminating in retention of small amounts of Neanderthal-derived DNA in the modern human genome. In the current study, we address the potential roles Neanderthal single nucleotide polymorphisms (SNP) may be playing in autism susceptibility in samples of black non-Hispanic, white Hispanic, and white non-Hispanic people using data from the Simons Foundation Powering Autism Research (SPARK), Genotype-Tissue Expression (GTEx), and 1000 Genomes (1000G) databases. We have discovered that rare variants are significantly enriched in autistic probands compared to race-matched controls. In addition, we have identified 25 rare and common SNPs that are significantly enriched in autism on different ethnic backgrounds, some of which show significant clinical associations. We have also identified other SNPs that share more specific genotype-phenotype correlations but which are not necessarily enriched in autism and yet may nevertheless play roles in comorbid phenotype expression (e.g., intellectual disability, epilepsy, and language regression). These results strongly suggest Neanderthal-derived DNA is playing a significant role in autism susceptibility across major populations in the United States.

2.

Laser Capture Microdissection Transcriptome Reveals Spatiotemporal Tissue Gene Expression Patterns of Medicago truncatula Roots Responding to Rhizobia.

Schnabel, Elise; Thomas, Jacklyn; El-Hawaz, Rabia; Gao, Yueyao; Poehlman, William L; Chavan, Suchitra; Pasha, Asher; Esteban, Eddi; Provart, Nicholas; Feltus, F Alex; Frugoli, Julia.

Mol Plant Microbe Interact ; 36(12): 805-820, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37717250

ABSTRACT

We report a public resource for examining the spatiotemporal RNA expression of 54,893 Medicago truncatula genes during the first 72 h of response to rhizobial inoculation. Using a methodology that allows synchronous inoculation and growth of more than 100 plants in a single media container, we harvested the same segment of each root responding to rhizobia in the initial inoculation over a time course, collected individual tissues from these segments with laser capture microdissection, and created and sequenced RNA libraries generated from these tissues. We demonstrate the utility of the resource by examining the expression patterns of a set of genes induced very early in nodule signaling, as well as two gene families (CLE peptides and nodule-specific PLAT-domain proteins), and show that despite similar whole-root expression patterns, there are tissue differences in expression between the genes. Using a rhizobial response dataset generated from transcriptomics on intact root segments, we also examined differential temporal expression patterns and determined that, after nodule tissue, the epidermis and cortical cells contained the most temporally patterned genes. We circumscribed gene lists for each time and tissue examined and developed an expression pattern visualization tool. Finally, we explored transcriptomic differences between the inner cortical cells that become nodules and those that do not, confirming that the expression of 1-aminocyclopropane-1-carboxylate synthases distinguishes inner cortical cells that become nodules, and we describe potential downstream genes involved in early nodule cell division. Copyright © 2023 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.

Subject(s)

Medicago truncatula , Rhizobium , Root Nodules, Plant/metabolism , Transcriptome/genetics , Plant Roots , Medicago truncatula/metabolism , Laser Capture Microdissection , Rhizobium/genetics , RNA/metabolism , Symbiosis/genetics , Gene Expression Regulation, Plant , Plant Proteins/genetics , Plant Proteins/metabolism , Plant Root Nodulation/genetics

3.

Computational speed-up of large-scale, single-cell model simulations via a fully integrated SBML-based format.

Mutsuddy, Arnab; Erdem, Cemal; Huggins, Jonah R; Salim, Misha; Cook, Daniel; Hobbs, Nicole; Feltus, F Alex; Birtwistle, Marc R.

Bioinform Adv ; 3(1): vbad039, 2023.

Article in English | MEDLINE | ID: mdl-37020976

ABSTRACT

Summary: Large-scale and whole-cell modeling has multiple challenges, including scalable model building and module communication bottlenecks (e.g. between metabolism, gene expression, signaling, etc.). We previously developed an open-source, scalable format for a large-scale mechanistic model of proliferation and death signaling dynamics, but communication bottlenecks between gene expression and protein biochemistry modules remained. Here, we developed two solutions to communication bottlenecks that speed up simulation by ∼4-fold for hybrid stochastic-deterministic simulations and by over 100-fold for fully deterministic simulations. Fully deterministic speed-up facilitates model initialization, parameter estimation and sensitivity analysis tasks. Availability and implementation: Source code is freely available at https://github.com/birtwistlelab/SPARCED/releases/tag/v1.3.0, implemented in Python, and supported on Linux, Windows and MacOS (via Docker).

4.

Simulating the restoration of normal gene expression from different thyroid cancer stages using deep learning.

Nelligan, Nicole M; Bender, M Reed; Feltus, F Alex.

BMC Cancer ; 22(1): 612, 2022 Jun 04.

Article in English | MEDLINE | ID: mdl-35659616

ABSTRACT

BACKGROUND: Thyroid cancer (THCA) is the most common endocrine malignancy, and its incidence is increasing. There is an urgent need to better understand the molecular differences between THCA tumors at different pathologic stages so appropriate diagnostic, prognostic, and treatment strategies can be applied. Transcriptome State Perturbation Generator (TSPG) is a tool created to identify the changes in gene expression necessary to transform the transcriptional state of a source sample to mimic that of a target. METHODS: We used TSPG to perturb the bulk RNA expression data from various THCA tumor samples at progressive stages towards the transcriptional pattern of normal thyroid tissue. The perturbations produced were analyzed to determine if there are consistently up- or down-regulated genes or functions in certain stages of tumors. RESULTS: Some genes of particular interest were investigated further in previous research. SLC6A15 was found to be down-regulated in all stage 1-3 samples. This gene has previously been identified as a tumor suppressor. The up-regulation of PLA2G12B in all samples was notable because the protein encoded by this gene belongs to the PLA2 superfamily, which is involved in metabolism, a major function of the thyroid gland. REN was up-regulated in all stage 3 and 4 samples. The enzyme renin, encoded by this gene, has a role in the renin-angiotensin system; this system regulates angiogenesis and may have a role in cancer development and progression. This is supported by the consistent up-regulation of REN only in later stage tumor samples. Functional enrichment analysis showed that olfactory receptor activities and similar terms were enriched for the up-regulated genes, which supports previous research concluding that abundance and stimulation of olfactory receptors is linked to cancer. CONCLUSIONS: TSPG can be a useful tool in exploring large gene expression datasets and extracting the meaningful differences between distinct classes of data. 
We identified genes that were characteristically perturbed in certain sample types, including only late-stage THCA tumors. Additionally, we provided evidence for potential transcriptional signatures of each stage of thyroid cancer. These are potentially relevant targets for future investigation into THCA tumorigenesis.

Subject(s)

Amino Acid Transport Systems, Neutral , Deep Learning , Thyroid Neoplasms , Amino Acid Transport Systems, Neutral/genetics , Gene Expression Regulation, Neoplastic , Humans , Nerve Tissue Proteins/genetics , Prognosis , Thyroid Neoplasms/pathology , Transcriptome
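TSPG itself is a deep-learning model; the sketch below covers only the downstream step the abstract describes, finding genes that are perturbed in the same direction in every sample of a tumor stage. It assumes each sample's perturbation is available as a mapping from gene name to a value (positive meaning up-regulation toward normal tissue); the function name and input format are hypothetical and are not TSPG's actual output format.

```python
from typing import Dict, List


def consistent_direction(perturbations: List[Dict[str, float]]) -> Dict[str, str]:
    """Return genes perturbed in the same direction in every sample.

    Each dict maps a gene name to a perturbation value (hypothetical format):
    positive means the gene must go up to reach the target (normal) state,
    negative means it must go down.
    """
    if not perturbations:
        return {}
    # Only genes reported for every sample can be consistent across all of them.
    shared = set(perturbations[0])
    for sample in perturbations[1:]:
        shared &= set(sample)
    calls: Dict[str, str] = {}
    for gene in sorted(shared):
        values = [sample[gene] for sample in perturbations]
        if all(v > 0 for v in values):
            calls[gene] = "up"
        elif all(v < 0 for v in values):
            calls[gene] = "down"
    return calls


# Toy input with placeholder gene names; values are illustrative, not study data.
stage_samples = [
    {"GENE_A": 1.2, "GENE_B": -0.5, "GENE_C": 0.3},
    {"GENE_A": 0.8, "GENE_B": -1.1, "GENE_C": -0.2},
]
print(consistent_direction(stage_samples))  # {'GENE_A': 'up', 'GENE_B': 'down'}
```

GENE_C is dropped because its sign flips between samples, which mirrors the abstract's focus on genes perturbed in all samples of a stage.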

5.

A scalable, open-source implementation of a large-scale mechanistic model for single cell proliferation and death signaling.

Erdem, Cemal; Mutsuddy, Arnab; Bensman, Ethan M; Dodd, William B; Saint-Antoine, Michael M; Bouhaddou, Mehdi; Blake, Robert C; Gross, Sean M; Heiser, Laura M; Feltus, F Alex; Birtwistle, Marc R.

Nat Commun ; 13(1): 3555, 2022 06 21.

Article in English | MEDLINE | ID: mdl-35729113

ABSTRACT

Mechanistic models of how single cells respond to different perturbations can help integrate disparate big data sets or predict response to varied drug combinations. However, the construction and simulation of such models have proved challenging. Here, we developed a Python-based model creation and simulation pipeline that converts a few structured text files into an SBML standard and is high-performance- and cloud-computing ready. We applied this pipeline to our large-scale, mechanistic pan-cancer signaling model (named SPARCED) and demonstrate it by adding an IFNγ pathway submodel. We then investigated whether a putative crosstalk mechanism could be consistent with experimental observations from the LINCS MCF10A Data Cube that IFNγ acts as an anti-proliferative factor. The analyses suggested this observation can be explained by IFNγ-induced SOCS1 sequestering activated EGF receptors. This work forms a foundational recipe for increased mechanistic model-based data integration on a single-cell level, an important building block for clinically-predictive mechanistic models.

Subject(s)

Cloud Computing , Software , Cell Proliferation , Computer Simulation , Signal Transduction

6.

GEMmaker: process massive RNA-seq datasets on heterogeneous computational infrastructure.

Hadish, John A; Biggs, Tyler D; Shealy, Benjamin T; Bender, M Reed; McKnight, Coleman B; Wytko, Connor; Smith, Melissa C; Feltus, F Alex; Honaas, Loren; Ficklin, Stephen P.

BMC Bioinformatics ; 23(1): 156, 2022 May 02.

Article in English | MEDLINE | ID: mdl-35501696

ABSTRACT

BACKGROUND: Quantification of gene expression from RNA-seq data is a prerequisite for transcriptome analyses such as differential gene expression analysis and gene co-expression network construction. Individual RNA-seq experiments are growing larger, and combining multiple experiments from sequence repositories can result in datasets with thousands of samples. Processing hundreds to thousands of RNA-seq samples can result in challenges related to data management, access to sufficient computational resources, navigation of high-performance computing (HPC) systems, installation of required software dependencies, and reproducibility. Processing of larger and deeper RNA-seq experiments will become more common as sequencing technology matures. RESULTS: GEMmaker is an nf-core-compliant Nextflow workflow that quantifies gene expression from small to massive RNA-seq datasets. GEMmaker ensures results are highly reproducible through the use of versioned containerized software that can be executed on a single workstation, institutional compute cluster, Kubernetes platform or the cloud. GEMmaker supports popular alignment and quantification tools, providing results in raw and normalized formats. GEMmaker is unique in that it can scale to process thousands of locally or remotely stored samples without exceeding available data storage. CONCLUSIONS: Workflows that quantify gene expression are not new, and many already address issues of portability, reusability, and scale in terms of access to CPUs. GEMmaker provides these benefits and adds the ability to scale despite low data storage infrastructure. This allows users to process hundreds to thousands of RNA-seq samples even when data storage resources are limited. GEMmaker is freely available and fully documented with step-by-step setup and execution instructions.

Subject(s)

High-Throughput Nucleotide Sequencing , Software , High-Throughput Nucleotide Sequencing/methods , RNA-Seq , Reproducibility of Results , Sequence Analysis, RNA/methods

7.

Addressing noise in co-expression network construction.

Burns, Joshua J R; Shealy, Benjamin T; Greer, Mitchell S; Hadish, John A; McGowan, Matthew T; Biggs, Tyler; Smith, Melissa C; Feltus, F Alex; Ficklin, Stephen P.

Brief Bioinform ; 23(1), 2022 01 17.

Article in English | MEDLINE | ID: mdl-34850822

ABSTRACT

Gene co-expression networks (GCNs) provide multiple benefits to molecular research including hypothesis generation and biomarker discovery. Transcriptome profiles serve as input for GCN construction and are derived from increasingly larger studies with samples across multiple experimental conditions, treatments, time points, genotypes, etc. Such experiments with larger numbers of variables confound discovery of true network edges, exclude edges and inhibit discovery of context (or condition) specific network edges. To demonstrate this problem, a 475-sample dataset is used to show that up to 97% of GCN edges can be misleading because correlations are false or incorrect. False and incorrect correlations can occur when tests are applied without ensuring assumptions are met, and pairwise gene expression may not meet test assumptions if the expression of at least one gene in the pairwise comparison is a function of multiple confounding variables. The 'one-size-fits-all' approach to GCN construction is therefore problematic for large, multivariable datasets. Recently, the Knowledge Independent Network Construction toolkit has been used in multiple studies to provide a dynamic approach to GCN construction that ensures statistical tests meet assumptions and confounding variables are addressed. Additionally, it can associate experimental context for each edge of the network resulting in context-specific GCNs (csGCNs). To help researchers recognize such challenges in GCN construction, and the creation of csGCNs, we provide a review of the workflow.

Subject(s)

Gene Regulatory Networks , Transcriptome
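The abstract's central claim, that pooling samples from multiple conditions can produce false or incorrect correlations, can be illustrated with a toy example. The numbers below are invented for illustration, and this is not the Knowledge Independent Network Construction toolkit's procedure; it only demonstrates the kind of confounding that toolkit is designed to address.

```python
from statistics import fmean
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Toy expression values for two genes under two conditions (invented for illustration).
cond_a_g1, cond_a_g2 = [1, 2, 3, 4], [4, 3, 2, 1]
cond_b_g1, cond_b_g2 = [11, 12, 13, 14], [14, 13, 12, 11]

# Within each condition the genes are perfectly anti-correlated...
print(pearson(cond_a_g1, cond_a_g2), pearson(cond_b_g1, cond_b_g2))  # -1.0 -1.0
# ...but pooling the conditions yields a strong positive correlation (about 0.905).
print(pearson(cond_a_g1 + cond_b_g1, cond_a_g2 + cond_b_g2))
```

A single pooled test would report a positive edge that exists in neither condition, which is why the abstract argues for context-specific networks.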

8.

Named Data Networking for Genomics Data Management and Integrated Workflows.

Ogle, Cameron; Reddick, David; McKnight, Coleman; Biggs, Tyler; Pauly, Rini; Ficklin, Stephen P; Feltus, F Alex; Shannigrahi, Susmit.

Front Big Data ; 4: 582468, 2021.

Article in English | MEDLINE | ID: mdl-33748749

ABSTRACT

Advanced imaging and DNA sequencing technologies now enable the diverse biology community to routinely generate and analyze terabytes of high-resolution biological data. The community is rapidly heading toward the petascale in single-investigator laboratory settings. As evidence, the NCBI SRA central DNA sequence repository alone contains over 45 petabytes of biological data. Given the geometric growth of this and other genomics repositories, an exabyte of mineable biological data is imminent. The challenges of effectively utilizing these datasets are enormous, as they are not only large in size but also stored in geographically distributed repositories such as the National Center for Biotechnology Information (NCBI), DNA Data Bank of Japan (DDBJ), European Bioinformatics Institute (EBI), and NASA's GeneLab. In this work, we first systematically point out the data-management challenges of the genomics community. We then introduce Named Data Networking (NDN), a novel but well-researched Internet architecture that is capable of solving these challenges at the network layer. NDN performs all operations, such as forwarding requests to data sources, content discovery, access, and retrieval, using content names (which are similar to traditional filenames or filepaths) and eliminates the need for a location layer (the IP address) for data management. Utilizing NDN for genomics workflows simplifies data discovery, speeds up data retrieval using in-network caching of popular datasets, and allows the community to create infrastructure that supports operations such as creating a federation of content repositories, retrieval from multiple sources, remote data subsetting, and others. Name-based operations also streamline deployment and integration of workflows with various cloud platforms. 
Our contributions in this work are as follows: 1) we enumerate the cyberinfrastructure challenges of the genomics community that NDN can alleviate; 2) we describe our efforts in applying NDN to a contemporary genomics workflow (GEMmaker) and quantify the improvements, with the preliminary evaluation showing a sixfold speedup in data insertion into the workflow; and 3) as a pilot, we have used an NDN naming scheme (agreed upon by the community and discussed in Section 4) to publish data from broadly used data repositories, including the NCBI SRA. We have loaded the NDN testbed with these pre-processed genomes, which can be accessed over NDN and used by anyone interested in those datasets. Finally, we discuss our continued effort in integrating NDN with cloud computing platforms, such as the Pacific Research Platform (PRP). The reader should note that the goal of this paper is to introduce NDN to the genomics community and discuss NDN's properties that can benefit that community. We do not present an extensive performance evaluation of NDN; we are working on extending and evaluating our pilot deployment and will present systematic results in future work.
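NDN forwarders choose where to send a request by longest-prefix matching on hierarchical content names rather than on IP addresses. The sketch below shows that idea with an in-memory forwarding table held in a Python dict; the prefixes and face names are hypothetical and are not the community naming scheme discussed in the paper, and a real NDN forwarder does considerably more (caching, pending-interest tracking, strategies).

```python
from typing import Dict, Optional


def longest_prefix_match(fib: Dict[str, str], name: str) -> Optional[str]:
    """Return the next hop (face) for the longest FIB prefix matching `name`.

    Matching is done on whole name components, so "/genomics/sra2" does not
    match the prefix "/genomics/sra".
    """
    components = [c for c in name.split("/") if c]
    # Try the full name first, then progressively shorter prefixes.
    for n in range(len(components), 0, -1):
        prefix = "/" + "/".join(components[:n])
        if prefix in fib:
            return fib[prefix]
    # Fall back to a default route if one is configured.
    return fib.get("/")


# Hypothetical prefixes and face names, not the community naming scheme from the paper.
fib = {"/genomics": "face-1", "/genomics/sra": "face-2"}
print(longest_prefix_match(fib, "/genomics/sra/SRR0000001/fastq"))  # face-2
print(longest_prefix_match(fib, "/genomics/ddbj/DRR0000001"))       # face-1
print(longest_prefix_match(fib, "/other/data"))                      # None
```

Matching whole components rather than raw characters is the design choice that lets names behave like file paths, as the abstract describes.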

9.

Cellular State Transformations Using Deep Learning for Precision Medicine Applications.

Targonski, Colin; Bender, M Reed; Shealy, Benjamin T; Husain, Benafsh; Paseman, Bill; Smith, Melissa C; Feltus, F Alex.

Patterns (N Y) ; 1(6): 100087, 2020 Sep 11.

Article in English | MEDLINE | ID: mdl-33205131

ABSTRACT

We introduce the Transcriptome State Perturbation Generator (TSPG) as a novel deep-learning method to identify changes in genomic expression that occur between tissue states using generative adversarial networks. TSPG learns the transcriptome perturbations from RNA-sequencing data required to shift from a source to a target class. We apply TSPG as an effective method of detecting biologically relevant alternate expression patterns between normal and tumor human tissue samples. We demonstrate that the application of TSPG to expression data obtained from a biopsy sample of a patient's kidney cancer can identify patient-specific differentially expressed genes between their individual tumor sample and a target class of healthy kidney gene expression. By utilizing TSPG in a precision medicine application in which the patient sample is not replicated (i.e., n = 1), we present a novel technique of determining significant transcriptional aberrations that can be used to help identify potential targeted therapies.

10.

Exploration into biomarker potential of region-specific brain gene co-expression networks.

Hang, Yuqing; Aburidi, Mohammed; Husain, Benafsh; Hickman, Allison R; Poehlman, William L; Feltus, F Alex.

Sci Rep ; 10(1): 17089, 2020 10 13.

Article in English | MEDLINE | ID: mdl-33051491

ABSTRACT

The human brain is a complex organ that consists of several regions each with a unique gene expression pattern. Our intent in this study was to construct a gene co-expression network (GCN) for the normal brain using RNA expression profiles from the Genotype-Tissue Expression (GTEx) project. The brain GCN contains gene correlation relationships that are broadly present in the brain or specific to thirteen brain regions, which we later combined into six overarching brain mini-GCNs based on the brain's structure. Using the expression profiles of brain region-specific GCN edges, we determined how well the brain region samples could be discriminated from each other, visually with t-SNE plots or quantitatively with the Gene Oracle deep learning classifier. Next, we tested these gene sets on their relevance to human tumors of brain and non-brain origin. Interestingly, we found that genes in the six brain mini-GCNs showed markedly higher mutation rates in tumors relative to matched sets of random genes. Further, we found that cortex genes subdivided Head and Neck Squamous Cell Carcinoma (HNSC) tumors and Pheochromocytoma and Paraganglioma (PCPG) tumors into distinct groups. The brain GCN and mini-GCNs are useful resources for the classification of brain regions and identification of biomarker genes for brain related phenotypes.

Subject(s)

Biomarkers/metabolism , Brain/metabolism , Gene Regulatory Networks , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Brain Neoplasms/genetics , Brain Neoplasms/metabolism , Databases, Genetic , Gene Expression Profiling , Genetic Markers , Humans , Models, Genetic , Models, Neurological , Mutation , Neural Networks, Computer , Tissue Distribution

11.

NetExtractor: Extracting a Cerebellar Tissue Gene Regulatory Network Using Differentially Expressed High Mutual Information Binary RNA Profiles.

Husain, Benafsh; Hickman, Allison R; Hang, Yuqing; Shealy, Benjamin T; Sapra, Karan; Feltus, F Alex.

G3 (Bethesda) ; 10(9): 2953-2963, 2020 09 02.

Article in English | MEDLINE | ID: mdl-32665353

ABSTRACT

Bigenic expression relationships are conventionally defined based on metrics such as Pearson or Spearman correlation that cannot typically detect latent, non-linear dependencies or require the relationship to be monotonic. Further, the combination of intrinsic and extrinsic noise as well as embedded relationships between sample sub-populations reduces the probability of extracting biologically relevant edges during the construction of gene co-expression networks (GCNs). In this report, we address these problems via our NetExtractor algorithm. NetExtractor examines all pairwise gene expression profiles first with Gaussian mixture models (GMMs) to identify sample sub-populations followed by mutual information (MI) analysis that is capable of detecting non-linear differential bigenic expression relationships. We applied NetExtractor to brain tissue RNA profiles from the Genotype-Tissue Expression (GTEx) project to obtain a brain tissue specific gene expression relationship network centered on cerebellar and cerebellar hemisphere enriched edges. We leveraged the PsychENCODE pre-frontal cortex (PFC) gene regulatory network (GRN) to construct a cerebellar cortex (cerebellar) GRN associated with transcriptionally active regions in cerebellar tissue. Thus, we demonstrate the utility of our NetExtractor approach to detect biologically relevant and novel non-linear binary gene relationships.

Subject(s)

Gene Regulatory Networks , RNA , Algorithms , Brain , Cerebellum , Computational Biology , Gene Expression Profiling
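The abstract contrasts correlation metrics, which miss non-monotonic dependencies, with mutual information (MI). The sketch below illustrates only that contrast on a toy U-shaped relationship. It does not reproduce NetExtractor, which first fits Gaussian mixture models to find sample sub-populations; the equal-width histogram estimator and the bin count of 4 are assumptions, since the abstract does not state which MI estimator is used.

```python
import math
from collections import Counter
from statistics import fmean
from typing import List, Sequence


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Assign each value to one of n_bins equal-width bins over its range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant profile
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(x: Sequence[float], y: Sequence[float], n_bins: int = 4) -> float:
    """Histogram estimate of mutual information between two profiles, in nats."""
    bx, by = equal_width_bins(x, n_bins), equal_width_bins(y, n_bins)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log((c / n) / ((px[i] / n) * (py[j] / n)))
        for (i, j), c in joint.items()
    )


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain Pearson correlation coefficient."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# A U-shaped (non-monotonic) toy relationship between two "genes".
gene_x = [-3, -2, -1, 0, 1, 2, 3]
gene_y = [v * v for v in gene_x]
print(pearson(gene_x, gene_y))             # 0.0: correlation sees nothing
print(mutual_information(gene_x, gene_y))  # about 0.68 nats: dependence detected
```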

12.

Tripal and Galaxy: supporting reproducible scientific workflows for community biological databases.

Spoor, Shawna; Wytko, Connor; Soto, Brian; Chen, Ming; Almsaeed, Abdullah; Condon, Bradford; Herndon, Nic; Hough, Heidi; Jung, Sook; Staton, Meg; Wegrzyn, Jill; Main, Dorrie; Feltus, F Alex; Ficklin, Stephen P.

Database (Oxford) ; 2020, 2020 01 01.

Article in English | MEDLINE | ID: mdl-32621602

ABSTRACT

Online biological databases housing genomics, genetic and breeding data can be constructed using the Tripal toolkit. Tripal is an open-source, internationally developed framework that implements FAIR data principles and is meant to ease the burden of constructing such websites for research communities. Use of a common, open framework improves the sustainability and manageability of such a site. Site developers can create extensions for their site and in turn share those extensions with others. One challenge that community databases often face is the need to provide tools for their users that analyze increasingly larger datasets using multiple software tools strung together in a scientific workflow on complicated computational resources. The Tripal Galaxy module, a 'plug-in' for Tripal, meets this need through integration of Tripal with the Galaxy Project workflow management system. Site developers can create workflows appropriate to the needs of their community using Galaxy and then share those for execution on their Tripal sites via automatically constructed, but configurable, web forms or using an application programming interface to power web-based analytical applications. The Tripal Galaxy module helps reduce duplication of effort by allowing site developers to spend time constructing workflows and building their applications rather than rebuilding infrastructure for job management of multi-step applications.

Subject(s)

Database Management Systems , Databases, Genetic , Internet , Software , Computational Biology

13.

Negotiated Sharing of Pandemic Data, Models, and Resources.

Cutcher-Gershenfeld, Joel; Baker, Karen S; Berente, Nicholas; Berkman, Paul Arthur; Canavan, Pat; Feltus, F Alex; Garmulewicz, Alysia; Hutchins, Ron; King, John Leslie; Kirkpatrick, Christine; Lenhardt, Chris; Lewis, Spencer; Maffe, Michael; Mittleman, Barbara; Sampath, Rajesh; Shin, Namchul; Stall, Shelley; Winter, Susan; Veazey, Pips.

Negot J ; 36(4): 497-534, 2020.

Article in English | MEDLINE | ID: mdl-38607846

ABSTRACT

Urgent responses to the COVID-19 pandemic depend on increased collaboration and sharing of data, models, and resources among scientists and researchers. In many scientific fields and disciplines, institutional norms treat data, models, and resources as proprietary, emphasizing competition among scientists and researchers locally and internationally. Concurrently, long-standing norms of open data and collaboration exist in some scientific fields and have accelerated within the last two decades. In both cases, where the institutional arrangements are ready to accelerate for the needed collaboration in a pandemic and where they run counter to what is needed, the rules of the game are "on the table" for institutional-level renegotiation. These challenges to the negotiated order in science are important, difficult to study, and highly consequential. The COVID-19 pandemic offers something of a natural experiment to study these dynamics. Preliminary findings highlight: the chilling effect of politics where open sharing could be expected to accelerate; the surprisingly conservative nature of contests and prizes; open questions around whether collaboration will persist following an inflection point in the pandemic; and the strong potential for launching and sustaining pre-competitive initiatives.

14.

EdgeScaping: Mapping the spatial distribution of pairwise gene expression intensities.

Husain, Benafsh; Feltus, F Alex.

PLoS One ; 14(8): e0220279, 2019.

Article in English | MEDLINE | ID: mdl-31386677

ABSTRACT

Gene co-expression networks (GCNs) are constructed from Gene Expression Matrices (GEMs) in a bottom-up approach where all gene pairs are tested for correlation within the context of the input sample set. This approach is computationally intensive for many current GEMs and may not be scalable to millions of samples. Further, traditional GCNs do not detect non-linear relationships missed by correlation tests and do not place genetic relationships in a gene expression intensity context. In this report, we propose EdgeScaping, which constructs and analyzes the pairwise gene intensity network in a holistic, top-down approach where no edges are filtered. EdgeScaping uses a novel technique to convert traditional pairwise gene expression data to an image-based format. This conversion not only performs feature compression, making our algorithm highly scalable, but also allows for exploring non-linear relationships between genes by leveraging deep learning image analysis algorithms. Using the learned embedded feature space, we implement a fast, efficient algorithm to cluster the entire space of gene expression relationships while retaining gene expression intensity. Since EdgeScaping does not eliminate conventionally noisy edges, it extends the identification of co-expression relationships beyond classically correlated edges to facilitate the discovery of novel or unusual expression patterns within the network. We applied EdgeScaping to a human tumor GEM to identify sets of genes that exhibit conventional and non-conventional interdependent non-linear behavior associated with brain-specific tumor sub-types that would be eliminated in conventional bottom-up construction of GCNs. EdgeScaping source code is available at https://github.com/bhusain/EdgeScaping under the MIT license.

Subject(s)

Computational Biology/methods , Gene Regulatory Networks , Cluster Analysis , Deep Learning , Humans , Neoplasms/genetics , Spatial Analysis , Workflow
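The abstract does not specify how EdgeScaping converts a gene pair into an image. One plausible reading, offered here only as an assumption, is a two-dimensional histogram of the pair's expression values over a fixed intensity range, which compresses any number of samples into a small grid while keeping intensity information. The function name, grid size, and [0, 1] scaling below are hypothetical; consult the linked repository for the actual method.

```python
from typing import List, Sequence


def pair_to_image(x: Sequence[float], y: Sequence[float],
                  size: int = 8, max_value: float = 1.0) -> List[List[float]]:
    """Rasterize one gene pair's expression values into a size x size grid.

    Columns index gene x intensity and rows index gene y intensity; each cell
    holds the fraction of samples that fall into that intensity bin.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    grid = [[0.0] * size for _ in range(size)]
    for a, b in zip(x, y):
        # A fixed, shared intensity range keeps pixel positions comparable
        # across gene pairs, so intensity context is not normalized away.
        col = min(max(int(a / max_value * size), 0), size - 1)
        row = min(max(int(b / max_value * size), 0), size - 1)
        grid[row][col] += 1.0 / len(x)
    return grid


# Toy values assumed to be pre-scaled to [0, 1]; not real expression data.
image = pair_to_image([0.0, 0.5, 0.99], [0.0, 0.5, 0.99], size=4)
for row in image:
    print(["%.2f" % v for v in row])
```

Because every pair yields a grid of the same shape regardless of sample count, the resulting images could be fed to a standard image model, which is the scalability argument the abstract makes.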

15.

Uncovering biomarker genes with enriched classification potential from Hallmark gene sets.

Targonski, Colin A; Shearer, Courtney A; Shealy, Benjamin T; Smith, Melissa C; Feltus, F Alex.

Sci Rep ; 9(1): 9747, 2019 07 05.

Article in English | MEDLINE | ID: mdl-31278367

ABSTRACT

Given the complex relationship between gene expression and phenotypic outcomes, computationally efficient approaches are needed to sift through large high-dimensional datasets in order to identify biologically relevant biomarkers. In this report, we describe a method of identifying the most salient biomarker genes in a dataset, which we call "candidate genes", by evaluating the ability of gene combinations to classify samples from a dataset, which we call "classification potential". Our algorithm, Gene Oracle, uses a neural network to test user defined gene sets for polygenic classification potential and then uses a combinatorial approach to further decompose selected gene sets into candidate and non-candidate biomarker genes. We tested this algorithm on curated gene sets from the Molecular Signatures Database (MSigDB) quantified in RNAseq gene expression matrices obtained from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) data repositories. First, we identified which MSigDB Hallmark subsets have significant classification potential for both the TCGA and GTEx datasets. Then, we identified the most discriminatory candidate biomarker genes in each Hallmark gene set and provide evidence that the improved biomarker potential of these genes may be due to reduced functional complexity.

Subject(s)

Biomarkers, Tumor/genetics , Genetic Association Studies , Genetic Predisposition to Disease , Oncogenes , Algorithms , Computational Biology/methods , Databases, Genetic , Gene Expression Profiling , Gene Ontology , Humans

16.

Tripal v3: an ontology-based toolkit for construction of FAIR biological community databases.

Spoor, Shawna; Cheng, Chun-Huai; Sanderson, Lacey-Anne; Condon, Bradford; Almsaeed, Abdullah; Chen, Ming; Bretaudeau, Anthony; Rasche, Helena; Jung, Sook; Main, Dorrie; Bett, Kirstin; Staton, Margaret; Wegrzyn, Jill L; Feltus, F Alex; Ficklin, Stephen P.

Database (Oxford) ; 2019, 2019 01 01.

Article in English | MEDLINE | ID: mdl-31328773

ABSTRACT

Community biological databases provide an important online resource for both public and private data, analysis tools and community engagement. These sites house genomic, transcriptomic, genetic, breeding and ancillary data for specific species, families or clades. Due to the complexity and increasing quantities of these data, construction of online resources is increasingly difficult, especially with limited funding and access to technical expertise. Furthermore, online repositories are expected to promote FAIR data principles (findable, accessible, interoperable and reusable), which presents additional challenges. The open-source Tripal database toolkit seeks to mitigate these challenges by creating both the software and an interactive community of developers for construction of online community databases. Additionally, through coordinated, distributed co-development, Tripal sites encourage community-wide sustainability. Here, we report the release of Tripal version 3 that improves data accessibility and data sharing through systematic use of controlled vocabularies (CVs). Tripal uses the community-developed Chado database as a default data store, but now provides tools to support other data stores, while ensuring that CVs remain the central organizational structure for the data. A new site developer can use Tripal to develop a basic site with little to no programming, with the ability to integrate other data types using extension modules and the Tripal application programming interface. A thorough online User's Guide and Developer's Handbook are available at http://tripal.info, providing download, installation and step-by-step setup instructions.

Subject(s)

Biota/genetics , Databases, Genetic , Information Dissemination , Internet , Software , Transcriptome , Genomics

17.

Moving Just Enough Deep Sequencing Data to Get the Job Done.

Mills, Nicholas; Bensman, Ethan M; Poehlman, William L; Ligon, Walter B; Feltus, F Alex.

Bioinform Biol Insights ; 13: 1177932219856359, 2019.

Article in English | MEDLINE | ID: mdl-31236009

ABSTRACT

MOTIVATION: As high-throughput DNA sequence datasets continue to grow, the cost of transferring and storing them may prevent their processing anywhere but the largest data centers or commercial cloud providers. To lower this cost, it should be possible to process only a subset of the original data while still preserving the biological information of interest. RESULTS: Using 4 high-throughput DNA sequence datasets of differing sequencing depth from 2 species as use cases, we demonstrate the effect of processing partial datasets on the number of RNA transcripts detected by an RNA-Seq workflow. We used transcript detection to choose a cutoff point. We then physically transferred the minimal partial dataset and compared this with transferring the full dataset; the partial transfer reduced total transfer time by approximately 25%. These results suggest that as sequencing datasets grow larger, one way to speed up analysis is simply to transfer the minimal amount of data that still sufficiently detects the biological signal. AVAILABILITY: All results were generated using public datasets from NCBI and publicly available open-source software.
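
A minimal sketch of the cutoff idea, under assumptions the abstract does not state: a partial dataset is formed by keeping the leading reads of a FASTQ stream, the cutoff is the smallest read fraction whose detected-transcript count reaches a chosen share (a hypothetical 95%) of the full-dataset count, and transfer time scales linearly with bytes at a fixed throughput. The function names, the 95% target and the saturation numbers are illustrative only, not the authors' method or data.

```python
from itertools import islice


def fastq_prefix(lines, n_reads):
    """Return the first n_reads FASTQ records (4 lines each) from an iterable of lines."""
    return list(islice(lines, 4 * n_reads))


def minimal_fraction(detected, target=0.95):
    """Smallest read fraction whose detected-transcript count reaches
    `target` times the count from the full dataset (key 1.0).
    The 0.95 default is an illustrative assumption."""
    full = detected[1.0]
    for frac in sorted(detected):
        if detected[frac] >= target * full:
            return frac
    return 1.0


def transfer_seconds(total_bytes, fraction, bytes_per_second):
    """Idealized transfer time: bytes moved divided by a constant throughput (assumed)."""
    return total_bytes * fraction / bytes_per_second


# Hypothetical saturation curve: fraction of reads -> transcripts detected.
curve = {0.2: 12000, 0.4: 16500, 0.6: 18200, 0.8: 19300, 1.0: 20000}
cutoff = minimal_fraction(curve)
saving = 1 - transfer_seconds(100e9, cutoff, 1e8) / transfer_seconds(100e9, 1.0, 1e8)
print(cutoff, round(saving, 2))
```

With this made-up curve, 80% of the reads already meets the target, so the idealized transfer moves 20% fewer bytes. Measured transfers also carry protocol and per-file overheads, so real savings can differ from this linear estimate.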

18.

Ergot alkaloid exposure during gestation alters: 3. Fetal growth, muscle fiber development, and miRNA transcriptome.

Greene, Maslyn A; Britt, Jessica L; Powell, Rhonda R; Feltus, F Alex; Bridges, William C; Bruce, Terri; Klotz, James L; Miller, Markus F; Duckett, Susan K.

J Anim Sci ; 97(7): 3153-3168, 2019 Jul 02.

Article in English | MEDLINE | ID: mdl-31051033

ABSTRACT

The objective of this study was to assess how exposure to ergot alkaloids during 2 stages of gestation alters fetal growth, muscle fiber formation, and miRNA expression. Pregnant ewes (n = 36; BW = 83.26 ± 8.14 kg; 4/group; 9 groups) were used in a 2 × 2 factorial arrangement with 2 tall fescue seed treatments [endophyte-infected (E+) vs. endophyte-free (E-)] fed during 2 stages of gestation (MID, days 35 to 85 vs. LATE, days 86 to 133), which created 4 possible treatments (E-/E-, E+/E-, E-/E+, or E+/E+). Ewes were individually fed a total mixed ration containing E+ or E- fescue seed according to treatment assignment. Terminal surgeries were conducted on day 133 of gestation to collect fetal measurements and muscle samples. Data were analyzed as a 2 × 2 factorial with fescue treatment, stage of gestation, and their 2-way interaction as fixed effects. Fetuses exposed to E+ seed during LATE gestation had 10% lower fetal BW (P = 0.0020) than E- fetuses; however, fetal BW did not differ (P = 0.41) with E+ exposure during MID gestation. Fetuses from ewes fed E+ seed during MID and LATE gestation tended to have smaller (P = 0.058) kidney weights than E- fetuses. Liver weight was greater (P = 0.0069) in fetuses from ewes fed E- rather than E+ seed during LATE gestation. Fetal brain weight did not differ by fescue treatment fed during MID (P = 0.36) or LATE (P = 0.40) gestation. The percentage of brain to empty body weight (EBW) was greater (P = 0.0048) in fetuses from ewes fed E+ fescue seed during LATE gestation, which is indicative of intrauterine growth restriction (IUGR). Primary muscle fiber number was lower (P = 0.0005) in the semitendinosus (STN) of fetuses exposed to E+ during MID and/or LATE gestation compared with E-/E-. miRNA sequencing showed differential expression (P < 0.010) of 6 novel miRNAs, including bta-miR-652_R+1, mdo-miR-22-3p, bta-miR-1277_R-1, ppy-miR-133a_L+1_1ss5TG, hsa-miR-129-1-3p, and ssc-miR-615, in fetal STN muscle. These miRNAs are associated with glucose transport, insulin signaling, intracellular ATP, hypertension, or adipogenesis. This work supports the hypothesis that E+ tall fescue seed fed during late gestation reduces fetal weight and causes asymmetrical growth, which is indicative of IUGR. Changes in primary fiber number and miRNA expression in the STN indicate that exposure to E+ fescue during MID and LATE gestation alters fetal muscle development in ways that may affect postnatal muscle growth and meat quality.
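
The abstract reports effects from a 2 × 2 factorial analysis. As a small illustration of what such effects mean, the sketch below computes point estimates of the main effect of E+ exposure in MID gestation, the main effect of E+ exposure in LATE gestation, and their interaction from the four treatment-group means of a balanced design. It assumes the first label in a pair such as E+/E- denotes MID and the second LATE, uses made-up numbers, and produces no P values; those come from the fitted statistical model, which this sketch does not reproduce.

```python
from statistics import mean


def factorial_2x2_effects(cells):
    """Point estimates for a balanced 2 x 2 design.

    `cells` maps (mid, late) labels, each 'E-' or 'E+', to lists of responses
    (for example fetal body weights). Returns (mid_effect, late_effect, interaction).
    The (MID, LATE) key order is an assumption made for this illustration.
    """
    m = {key: mean(values) for key, values in cells.items()}
    # Main effect: E+ minus E- for one period, averaged over the other period.
    mid_effect = (m['E+', 'E-'] + m['E+', 'E+']) / 2 - (m['E-', 'E-'] + m['E-', 'E+']) / 2
    late_effect = (m['E-', 'E+'] + m['E+', 'E+']) / 2 - (m['E-', 'E-'] + m['E+', 'E-']) / 2
    # Interaction: how much the MID effect changes when LATE exposure is E+.
    interaction = (m['E+', 'E+'] - m['E-', 'E+']) - (m['E+', 'E-'] - m['E-', 'E-'])
    return mid_effect, late_effect, interaction


# Hypothetical responses only; these are not data from the study.
example = {
    ('E-', 'E-'): [5.0, 5.5],
    ('E+', 'E-'): [5.0, 5.5],
    ('E-', 'E+'): [4.5, 5.0],
    ('E+', 'E+'): [4.5, 5.0],
}
print(factorial_2x2_effects(example))
```

In this hypothetical example only LATE exposure lowers the response, so the MID effect and the interaction are both zero.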

Subject(s)

Endophytes/physiology , Ergot Alkaloids/toxicity , Festuca/chemistry , MicroRNAs/genetics , Sheep/physiology , Transcriptome/drug effects , Animals , Brain/drug effects , Brain/growth & development , Ergotamines/toxicity , Female , Festuca/microbiology , Fetal Development/drug effects , Fetal Weight/drug effects , Muscle Fibers, Skeletal/drug effects , Placentation , Pregnancy , Seeds/chemistry , Seeds/microbiology , Sheep/growth & development

19.

Linking Binary Gene Relationships to Drivers of Renal Cell Carcinoma Reveals Convergent Function in Alternate Tumor Progression Paths.

Poehlman, William L; Hsieh, James J; Feltus, F Alex.

Sci Rep ; 9(1): 2899, 2019 02 27.

Article in English | MEDLINE | ID: mdl-30814637

ABSTRACT

Renal cell carcinoma (RCC) subtypes are characterized by distinct molecular profiles. Using RNA expression profiles from 1,009 RCC samples, we constructed a condition-annotated gene coexpression network (GCN). The RCC GCN contains binary gene coexpression relationships (edges) specific to conditions including RCC subtype and tumor stage. As an application of this resource, we discovered RCC GCN edges and modules that were associated with genetic lesions in known RCC driver genes, including VHL, a common initiating clear cell RCC (ccRCC) genetic lesion, and PBRM1 and BAP1, which are early genetic lesions in the Braided Cancer River Model (BCRM). Since ccRCC tumors with PBRM1 mutations respond to targeted therapy differently than tumors with BAP1 mutations, we focused on ccRCC-specific edges associated with tumors that exhibit alternate mutation profiles: VHL-PBRM1 or VHL-BAP1. We found specific blends of molecular functions associated with these two mutation paths. Although these mutation-associated edges involved distinct genes, they were enriched for the same immunological functions, suggesting a convergent functional role for alternate gene sets consistent with the BCRM. The condition-annotated RCC GCN described herein is a novel data mining resource for assigning polygenic biomarkers and their relationships to RCC tumors with specific molecular and mutational profiles.
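
The abstract does not spell out how an edge is tied to a genetic lesion. One common choice, assumed here rather than taken from the paper, is a one-sided hypergeometric test: given the samples that support an edge, ask whether mutated tumors are over-represented among them relative to the whole cohort. The function name and the cohort numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n_total, n_mutant, n_edge):
    """One-sided P(X >= k), where X counts mutated samples among n_edge samples
    drawn without replacement from n_total samples, n_mutant of them mutated."""
    upper = min(n_edge, n_mutant)
    # Integer numerator keeps the sum exact until the single final division.
    hits = sum(comb(n_mutant, i) * comb(n_total - n_mutant, n_edge - i)
               for i in range(k, upper + 1))
    return hits / comb(n_total, n_edge)


# Hypothetical cohort: 100 ccRCC tumors, 30 with a PBRM1 lesion; 20 samples
# support one edge, and 15 of those carry the lesion.
print(hypergeom_enrichment_p(15, 100, 30, 20))
```

When many edges are tested this way, the resulting P values would need a multiple-testing correction before any edge is called mutation-associated.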

Subject(s)

Carcinoma, Renal Cell/genetics , Kidney Neoplasms/genetics , Mutation/genetics , Carcinogenesis/genetics , Carcinoma, Renal Cell/pathology , DNA-Binding Proteins/genetics , Datasets as Topic , Disease Progression , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , Kidney Neoplasms/pathology , Neoplasm Staging , Oncogene Proteins, Fusion/genetics , Transcription Factors/genetics , Transcriptome , Tumor Suppressor Proteins/genetics , Ubiquitin Thiolesterase/genetics , Von Hippel-Lindau Tumor Suppressor Protein/genetics

20.

Discovering Condition-Specific Gene Co-Expression Patterns Using Gaussian Mixture Models: A Cancer Case Study.

Ficklin, Stephen P; Dunwoodie, Leland J; Poehlman, William L; Watson, Christopher; Roche, Kimberly E; Feltus, F Alex.

Sci Rep ; 7(1): 8617, 2017 08 17.

Article in English | MEDLINE | ID: mdl-28819158

ABSTRACT

A gene co-expression network (GCN) describes associations between genes and points to genetic coordination of biochemical pathways. However, genetic correlations in a GCN are only detectable if they are present in the sampled conditions. With the increasing quantity of gene expression samples available in public repositories, there is greater potential to discover genetic correlations from a variety of biologically interesting conditions. However, even when gene correlations are present, their discovery can be masked by noise. Noise is introduced by natural variation (intrinsic and extrinsic), systematic variation (caused by sample measurement protocols and instruments), and algorithmic and statistical variation created by the choice of data processing tools. A variety of published studies, approaches and methods attempt to address each of these sources of variation to reduce noise. Here we describe an approach that uses Gaussian Mixture Models (GMMs) to address natural extrinsic (condition-specific) variation during network construction from mixed input conditions. To demonstrate its utility, we build and analyze a condition-annotated GCN from a compendium of 2,016 mixed gene expression data sets from five tumor subtypes obtained from The Cancer Genome Atlas. Our results show that GMMs help discover tumor-subtype-specific gene co-expression patterns (modules) that are significantly enriched for clinical attributes.
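
A minimal pure-Python sketch of GMM-based, condition-specific co-expression for one gene pair, under stated assumptions: a 2-D Gaussian mixture is fitted by expectation-maximization to the samples' paired expression values, each sample is hard-assigned to a component, and a Pearson correlation is computed within each sufficiently large component. The fixed component count k, the farthest-point initialization, the thresholds (min_samples, min_abs_r) and all function names are hypothetical choices for illustration, not the paper's software or parameters; a real pipeline might instead select k with a criterion such as BIC.

```python
import math
import random


def gaussian_pdf(x, y, mean, cov):
    """Density of a 2-D normal at (x, y); cov is ((sxx, sxy), (sxy, syy))."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = x - mean[0], y - mean[1]
    # Quadratic form d^T Sigma^-1 d via the closed-form 2x2 inverse.
    q = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))


def fit_gmm_labels(points, k, iters=50, reg=1e-6):
    """Fit a k-component, full-covariance 2-D GMM by EM; return one hard label per point."""
    n = len(points)
    # Deterministic farthest-point initialization of the component means (assumed).
    means = [points[0]]
    while len(means) < k:
        means.append(max(points, key=lambda p: min(
            (p[0] - m[0]) ** 2 + (p[1] - m[1]) ** 2 for m in means)))
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    vx = sum((p[0] - mx) ** 2 for p in points) / n + reg
    vy = sum((p[1] - my) ** 2 for p in points) / n + reg
    covs = [((vx, 0.0), (0.0, vy)) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x, y in points:
            dens = [w * gaussian_pdf(x, y, m, s) for w, m, s in zip(weights, means, covs)]
            total = sum(dens) or 1e-300  # guard against every density underflowing
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and full covariance of each component.
        for j in range(k):
            r = [row[j] for row in resp]
            nj = sum(r) + 1e-12
            cx = sum(rw * p[0] for rw, p in zip(r, points)) / nj
            cy = sum(rw * p[1] for rw, p in zip(r, points)) / nj
            sxx = sum(rw * (p[0] - cx) ** 2 for rw, p in zip(r, points)) / nj + reg
            syy = sum(rw * (p[1] - cy) ** 2 for rw, p in zip(r, points)) / nj + reg
            sxy = sum(rw * (p[0] - cx) * (p[1] - cy) for rw, p in zip(r, points)) / nj
            weights[j], means[j] = nj / n, (cx, cy)
            covs[j] = ((sxx, sxy), (sxy, syy))
    # Hard assignment from the last E-step.
    return [max(range(k), key=row.__getitem__) for row in resp]


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def condition_specific_edges(gene_a, gene_b, k=2, min_samples=15, min_abs_r=0.7):
    """Cluster samples with a GMM, then correlate the two genes inside each cluster.
    The k, min_samples and min_abs_r defaults are illustrative assumptions."""
    points = list(zip(gene_a, gene_b))
    labels = fit_gmm_labels(points, k)
    edges = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if len(members) < min_samples:
            continue
        r = pearson([p[0] for p in members], [p[1] for p in members])
        if abs(r) >= min_abs_r:
            edges.append((j, len(members), r))  # (cluster id, sample count, r)
    return edges


def demo_samples(seed=1):
    """Synthetic gene pair: positive co-expression in one hypothetical condition,
    shifted and negative co-expression in another."""
    rng = random.Random(seed)
    samples = []
    for _ in range(30):
        x = rng.uniform(0, 1)
        samples.append((x, x + rng.gauss(0, 0.05)))
    for _ in range(30):
        x = rng.uniform(0, 1)
        samples.append((10 + x, 10 - x + rng.gauss(0, 0.05)))
    return [s[0] for s in samples], [s[1] for s in samples]


if __name__ == "__main__":
    gene_a, gene_b = demo_samples()
    print("pooled r:", round(pearson(gene_a, gene_b), 3))
    for cluster, size, r in condition_specific_edges(gene_a, gene_b):
        print(f"cluster {cluster}: n={size}, r={r:.3f}")
```

On the synthetic pair, the pooled correlation is strongly positive because the two hypothetical conditions are shifted apart, while the per-cluster correlations have opposite signs: a small instance of the condition-specific variation that pooling mixed samples can hide.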

Subject(s)

Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Neoplasms/genetics , Algorithms , Gene Ontology , Humans , Models, Genetic , Neoplasms/classification , Neoplasms/diagnosis , Normal Distribution , Reproducibility of Results
