1.
Mol Psychiatry ; 2024 May 17.
Article in English | MEDLINE | ID: mdl-38760502

ABSTRACT

Homo sapiens and Neanderthals underwent hybridization during the Middle/Upper Paleolithic age, culminating in retention of small amounts of Neanderthal-derived DNA in the modern human genome. In the current study, we address the potential roles Neanderthal single nucleotide polymorphisms (SNPs) may be playing in autism susceptibility in samples of black non-Hispanic, white Hispanic, and white non-Hispanic people using data from the Simons Foundation Powering Autism Research (SPARK), Genotype-Tissue Expression (GTEx), and 1000 Genomes (1000G) databases. We have discovered that rare variants are significantly enriched in autistic probands compared to race-matched controls. In addition, we have identified 25 rare and common SNPs that are significantly enriched in autism on different ethnic backgrounds, some of which show significant clinical associations. We have also identified other SNPs that share more specific genotype-phenotype correlations but which are not necessarily enriched in autism and yet may nevertheless play roles in comorbid phenotype expression (e.g., intellectual disability, epilepsy, and language regression). These results strongly suggest Neanderthal-derived DNA is playing a significant role in autism susceptibility across major populations in the United States.

2.
Brief Bioinform ; 23(1), 2022 Jan 17.
Article in English | MEDLINE | ID: mdl-34850822

ABSTRACT

Gene co-expression networks (GCNs) provide multiple benefits to molecular research, including hypothesis generation and biomarker discovery. Transcriptome profiles serve as input for GCN construction and are derived from increasingly large studies with samples spanning multiple experimental conditions, treatments, time points, genotypes, etc. Such experiments, with their larger numbers of variables, confound discovery of true network edges, exclude edges, and inhibit discovery of context-specific (or condition-specific) network edges. To demonstrate this problem, a 475-sample dataset is used to show that up to 97% of GCN edges can be misleading because correlations are false or incorrect. False and incorrect correlations can occur when tests are applied without ensuring that their assumptions are met, and pairwise gene expression may not meet test assumptions if the expression of at least one gene in the pair is a function of multiple confounding variables. The 'one-size-fits-all' approach to GCN construction is therefore problematic for large, multivariable datasets. Recently, the Knowledge Independent Network Construction toolkit has been used in multiple studies to provide a dynamic approach to GCN construction that ensures statistical tests meet assumptions and confounding variables are addressed. Additionally, it can associate experimental context with each edge of the network, resulting in context-specific GCNs (csGCNs). To help researchers recognize such challenges in GCN construction, and to support the creation of csGCNs, we provide a review of the workflow.
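
The confounding problem described in this abstract can be illustrated with a small sketch. The expression values and gene names below are hypothetical, and the code is not taken from the Knowledge Independent Network Construction toolkit; it only shows how pooling samples from two conditions can produce a strong positive correlation between two genes that are anti-correlated within each condition.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical expression values for two genes under two conditions.
# Within each condition the genes are anti-correlated; the condition
# itself shifts both genes upward (a confounding variable).
control = {"gene_a": [1.0, 2.0, 3.0, 4.0], "gene_b": [4.0, 3.0, 2.0, 1.0]}
treated = {"gene_a": [11.0, 12.0, 13.0, 14.0], "gene_b": [14.0, 13.0, 12.0, 11.0]}

# Pool the samples, ignoring the condition label.
pooled_a = control["gene_a"] + treated["gene_a"]
pooled_b = control["gene_b"] + treated["gene_b"]

r_pooled = pearson(pooled_a, pooled_b)                       # strongly positive
r_control = pearson(control["gene_a"], control["gene_b"])    # -1 within condition
r_treated = pearson(treated["gene_a"], treated["gene_b"])    # -1 within condition

print(f"pooled r = {r_pooled:.3f}, control r = {r_control:.3f}, treated r = {r_treated:.3f}")
```

In this toy case a single pooled test would report an edge with the wrong sign, which is the kind of misleading edge that a context-specific analysis is meant to avoid.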


Subject(s)
Gene Regulatory Networks , Transcriptome
3.
Mol Plant Microbe Interact ; 36(12): 805-820, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37717250

ABSTRACT

We report a public resource for examining the spatiotemporal RNA expression of 54,893 Medicago truncatula genes during the first 72 h of response to rhizobial inoculation. Using a methodology that allows synchronous inoculation and growth of more than 100 plants in a single media container, we harvested the same segment of each root responding to rhizobia in the initial inoculation over a time course, collected individual tissues from these segments with laser capture microdissection, and created and sequenced RNA libraries generated from these tissues. We demonstrate the utility of the resource by examining the expression patterns of a set of genes induced very early in nodule signaling, as well as two gene families (CLE peptides and nodule-specific PLAT-domain proteins), and show that despite similar whole-root expression patterns, there are tissue differences in expression between the genes. Using a rhizobial response dataset generated from transcriptomics on intact root segments, we also examined differential temporal expression patterns and determined that, after nodule tissue, the epidermis and cortical cells contained the most temporally patterned genes. We circumscribed gene lists for each time and tissue examined and developed an expression pattern visualization tool. Finally, we explored transcriptomic differences between the inner cortical cells that become nodules and those that do not, confirming that the expression of 1-aminocyclopropane-1-carboxylate synthases distinguishes inner cortical cells that become nodules, and we provide and describe potential downstream genes involved in early nodule cell division. Copyright © 2023 The Author(s). This is an open access article distributed under the CC BY-NC-ND 4.0 International license.


Subject(s)
Medicago truncatula , Rhizobium , Root Nodules, Plant/metabolism , Transcriptome/genetics , Plant Roots , Medicago truncatula/metabolism , Laser Capture Microdissection , Rhizobium/genetics , RNA/metabolism , Symbiosis/genetics , Gene Expression Regulation, Plant , Plant Proteins/genetics , Plant Proteins/metabolism , Plant Root Nodulation/genetics
4.
BMC Bioinformatics ; 23(1): 156, 2022 May 02.
Article in English | MEDLINE | ID: mdl-35501696

ABSTRACT

BACKGROUND: Quantification of gene expression from RNA-seq data is a prerequisite for transcriptome analysis such as differential gene expression analysis and gene co-expression network construction. Individual RNA-seq experiments are growing larger, and combining multiple experiments from sequence repositories can result in datasets with thousands of samples. Processing hundreds to thousands of RNA-seq samples can result in challenges related to data management, access to sufficient computational resources, navigation of high-performance computing (HPC) systems, installation of required software dependencies, and reproducibility. Processing of larger and deeper RNA-seq experiments will become more common as sequencing technology matures. RESULTS: GEMmaker is an nf-core compliant Nextflow workflow that quantifies gene expression from small to massive RNA-seq datasets. GEMmaker ensures results are highly reproducible through the use of versioned, containerized software that can be executed on a single workstation, an institutional compute cluster, a Kubernetes platform, or the cloud. GEMmaker supports popular alignment and quantification tools, providing results in raw and normalized formats. GEMmaker is unique in that it can scale to process thousands of locally or remotely stored samples without exceeding available data storage. CONCLUSIONS: Workflows that quantify gene expression are not new, and many already address issues of portability, reusability, and scale in terms of access to CPUs. GEMmaker provides these benefits and adds the ability to scale despite limited data storage infrastructure. This allows users to process hundreds to thousands of RNA-seq samples even when data storage resources are limited. GEMmaker is freely available and fully documented with step-by-step setup and execution instructions.
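
The abstract does not say how GEMmaker keeps storage use bounded, so the following is only a generic sketch of one possible approach: group samples into batches that each fit within a storage budget, process one batch at a time, and delete intermediate files before starting the next. The function name, sample identifiers, and sizes are all hypothetical.

```python
def process_in_batches(sample_sizes_gb, storage_limit_gb):
    """Greedy batching: group samples so each batch fits in the storage limit.

    Returns the list of batches (lists of sample ids) in processing order.
    Intermediate files of a batch are assumed to be deleted before the
    next batch starts, so peak usage never exceeds the limit.
    """
    batches, current, used = [], [], 0.0
    for sample_id, size in sample_sizes_gb.items():
        if size > storage_limit_gb:
            raise ValueError(f"{sample_id} alone exceeds the storage limit")
        if used + size > storage_limit_gb:
            batches.append(current)      # close the batch that is full
            current, used = [], 0.0
        current.append(sample_id)
        used += size
    if current:
        batches.append(current)
    return batches

# Hypothetical per-sample working-storage requirements, in gigabytes.
samples = {"S1": 40.0, "S2": 30.0, "S3": 50.0, "S4": 20.0, "S5": 60.0}
batches = process_in_batches(samples, storage_limit_gb=100.0)
print(batches)  # [['S1', 'S2'], ['S3', 'S4'], ['S5']]
```

The greedy grouping is deliberately simple; a real workflow would also need to account for download time, failed samples, and files shared between steps.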


Subject(s)
High-Throughput Nucleotide Sequencing , Software , High-Throughput Nucleotide Sequencing/methods , RNA-Seq , Reproducibility of Results , Sequence Analysis, RNA/methods
5.
BMC Cancer ; 22(1): 612, 2022 Jun 04.
Article in English | MEDLINE | ID: mdl-35659616

ABSTRACT

BACKGROUND: Thyroid cancer (THCA) is the most common endocrine malignancy and its incidence is increasing. There is an urgent need to better understand the molecular differences between THCA tumors at different pathologic stages so that appropriate diagnostic, prognostic, and treatment strategies can be applied. Transcriptome State Perturbation Generator (TSPG) is a tool created to identify the changes in gene expression necessary to transform the transcriptional state of a source sample to mimic that of a target. METHODS: We used TSPG to perturb the bulk RNA expression data from various THCA tumor samples at progressive stages towards the transcriptional pattern of normal thyroid tissue. The perturbations produced were analyzed to determine if there are consistently up- or down-regulated genes or functions in certain stages of tumors. RESULTS: Some genes of particular interest were investigated further in previous research. SLC6A15 was found to be down-regulated in all stage 1-3 samples. This gene has previously been identified as a tumor suppressor. The up-regulation of PLA2G12B in all samples was notable because the protein encoded by this gene belongs to the PLA2 superfamily, which is involved in metabolism, a major function of the thyroid gland. REN was up-regulated in all stage 3 and 4 samples. The enzyme renin, encoded by this gene, has a role in the renin-angiotensin system; this system regulates angiogenesis and may have a role in cancer development and progression. This is supported by the consistent up-regulation of REN only in later-stage tumor samples. Functional enrichment analysis showed that olfactory receptor activities and similar terms were enriched for the up-regulated genes, which supports previous research concluding that abundance and stimulation of olfactory receptors are linked to cancer. CONCLUSIONS: TSPG can be a useful tool in exploring large gene expression datasets and extracting the meaningful differences between distinct classes of data. We identified genes that were characteristically perturbed in certain sample types, including only late-stage THCA tumors. Additionally, we provided evidence for potential transcriptional signatures of each stage of thyroid cancer. These are potentially relevant targets for future investigation into THCA tumorigenesis.


Subject(s)
Amino Acid Transport Systems, Neutral , Deep Learning , Thyroid Neoplasms , Amino Acid Transport Systems, Neutral/genetics , Gene Expression Regulation, Neoplastic , Humans , Nerve Tissue Proteins/genetics , Prognosis , Thyroid Neoplasms/pathology , Transcriptome
6.
Negot J ; 36(4): 497-534, 2020.
Article in English | MEDLINE | ID: mdl-38607846

ABSTRACT

Urgent responses to the COVID-19 pandemic depend on increased collaboration and sharing of data, models, and resources among scientists and researchers. In many scientific fields and disciplines, institutional norms treat data, models, and resources as proprietary, emphasizing competition among scientists and researchers locally and internationally. Concurrently, long-standing norms of open data and collaboration exist in some scientific fields and have accelerated within the last two decades. In both cases (where the institutional arrangements are ready to accelerate the collaboration needed in a pandemic, and where they run counter to what is needed), the rules of the game are "on the table" for institutional-level renegotiation. These challenges to the negotiated order in science are important, difficult to study, and highly consequential. The COVID-19 pandemic offers something of a natural experiment to study these dynamics. Preliminary findings highlight: the chilling effect of politics where open sharing could be expected to accelerate; the surprisingly conservative nature of contests and prizes; open questions around whether collaboration will persist following an inflection point in the pandemic; and the strong potential for launching and sustaining pre-competitive initiatives.

7.
Nature ; 457(7229): 551-6, 2009 Jan 29.
Article in English | MEDLINE | ID: mdl-19189423

ABSTRACT

Sorghum, an African grass related to sugar cane and maize, is grown for food, feed, fibre and fuel. We present an initial analysis of the approximately 730-megabase Sorghum bicolor (L.) Moench genome, placing approximately 98% of genes in their chromosomal context using whole-genome shotgun sequence validated by genetic, physical and syntenic information. Genetic recombination is largely confined to about one-third of the sorghum genome with gene order and density similar to those of rice. Retrotransposon accumulation in recombinationally recalcitrant heterochromatin explains the approximately 75% larger genome size of sorghum compared with rice. Although gene and repetitive DNA distributions have been preserved since palaeopolyploidization approximately 70 million years ago, most duplicated gene sets lost one member before the sorghum-rice divergence. Concerted evolution makes one duplicated chromosomal segment appear to be only a few million years old. About 24% of genes are grass-specific and 7% are sorghum-specific. Recent gene and microRNA duplications may contribute to sorghum's drought tolerance.


Subject(s)
Evolution, Molecular , Genome, Plant/genetics , Poaceae/genetics , Sorghum/genetics , Arabidopsis/genetics , Chromosomes, Plant/genetics , Gene Duplication , Genes, Plant , Oryza/genetics , Populus/genetics , Recombination, Genetic/genetics , Sequence Alignment , Sequence Analysis, DNA , Sequence Deletion/genetics , Zea mays/genetics
8.
Proc Natl Acad Sci U S A ; 109(34): 13710-5, 2012 Aug 21.
Article in English | MEDLINE | ID: mdl-22869747

ABSTRACT

Sex determination in papaya is controlled by a recently evolved XY chromosome pair, with two slightly different Y chromosomes controlling the development of males (Y) and hermaphrodites (Y(h)). To study the events of early sex chromosome evolution, we sequenced the hermaphrodite-specific region of the Y(h) chromosome (HSY) and its X counterpart, yielding an 8.1-megabase (Mb) HSY pseudomolecule, and a 3.5-Mb sequence for the corresponding X region. The HSY is larger than the X region, mostly due to retrotransposon insertions. The papaya HSY differs from the X region by two large-scale inversions, the first of which likely caused the recombination suppression between the X and Y(h) chromosomes, followed by numerous additional chromosomal rearrangements. Altogether, including the X and/or HSY regions, 124 transcription units were annotated, including 50 functional pairs present in both the X and HSY. Ten HSY genes had functional homologs elsewhere in the papaya autosomal regions, suggesting movement of genes onto the HSY, whereas the X region had none. Sequence divergence between 70 transcripts shared by the X and HSY revealed two evolutionary strata in the X chromosome, corresponding to the two inversions on the HSY, the older of which evolved about 7.0 million years ago. Gene content differences between the HSY and X are greatest in the older stratum, whereas the gene content and order of the collinear regions are identical. Our findings support theoretical models of early sex chromosome evolution.


Subject(s)
Carica/genetics , Sex Chromosomes , Chromosome Duplication , Chromosome Inversion , Chromosome Mapping , Chromosomes, Artificial, Bacterial , Chromosomes, Plant , Evolution, Molecular , Models, Genetic , Molecular Sequence Data , Repetitive Sequences, Nucleic Acid , Retroelements , Sequence Analysis, DNA
9.
Theor Appl Genet ; 126(9): 2367-80, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23836384

ABSTRACT

For lignocellulosic bioenergy to be economically viable, genetic improvements must be made in feedstock quality, including both total biomass yield and conversion efficiency. Toward this goal, multiple studies have considered candidate genes and discovered quantitative trait loci (QTLs) associated with total biomass accumulation and/or grain production in bioenergy grass species including maize and sorghum. However, very little research has focused on genes associated with increased biomass conversion efficiency. In this study, Trichoderma viride fungal cellulase hydrolysis activity was measured for lignocellulosic biomass (leaf and stem tissue) obtained from individuals in an F5 recombinant inbred Sorghum bicolor × Sorghum propinquum mapping population. A total of 49 QTLs (20 leaf, 29 stem) were associated with enzymatic conversion efficiency. Interestingly, six high-density QTL regions were identified in which four or more QTLs overlapped. In addition to enzymatic conversion efficiency QTLs, two QTLs were identified for biomass crystallinity index, a trait which has been shown to be inversely correlated with conversion efficiency in bioenergy grasses. The identification of these QTLs provides an important step toward identifying specific genes relevant to increasing conversion efficiency of bioenergy feedstocks. DNA markers linked to these QTLs could be useful in marker-assisted breeding programs aimed at increasing overall bioenergy yields concomitant with selection of high total biomass genotypes.


Subject(s)
Crosses, Genetic , Genes, Plant , Quantitative Trait Loci , Sorghum/genetics , Biomass , Breeding , Carbohydrates/chemistry , Chromosome Mapping/methods , Genetic Linkage , Genetic Markers , Genotype , Phenotype , Sorghum/chemistry , Sorghum/classification , X-Ray Diffraction , Zea mays/genetics
10.
Bioinform Adv ; 3(1): vbad039, 2023.
Article in English | MEDLINE | ID: mdl-37020976

ABSTRACT

Summary: Large-scale and whole-cell modeling has multiple challenges, including scalable model building and module communication bottlenecks (e.g. between metabolism, gene expression, signaling, etc.). We previously developed an open-source, scalable format for a large-scale mechanistic model of proliferation and death signaling dynamics, but communication bottlenecks between gene expression and protein biochemistry modules remained. Here, we developed two solutions to communication bottlenecks that speed up simulation by ∼4-fold for hybrid stochastic-deterministic simulations and by over 100-fold for fully deterministic simulations. Fully deterministic speed-up facilitates model initialization, parameter estimation and sensitivity analysis tasks. Availability and implementation: Source code is freely available at https://github.com/birtwistlelab/SPARCED/releases/tag/v1.3.0, implemented in Python, and supported on Linux, Windows and macOS (via Docker).

11.
BMC Genomics ; 13: 176, 2012 May 08.
Article in English | MEDLINE | ID: mdl-22568889

ABSTRACT

BACKGROUND: Papaya is a major fruit crop in tropical and subtropical regions worldwide. It is trioecious with three sex forms: male, female, and hermaphrodite. Sex determination is controlled by a pair of nascent sex chromosomes with two slightly different Y chromosomes, Y for male and Yh for hermaphrodite. The sex chromosome genotypes are XY (male), XYh (hermaphrodite), and XX (female). The papaya hermaphrodite-specific Yh chromosome region (HSY) is pericentromeric and heterochromatic. Physical mapping of HSY and its X counterpart is essential for sequencing these regions, uncovering the early events of sex chromosome evolution, and identifying the sex determination genes for crop improvement. RESULTS: A reiterative chromosome walking strategy was applied to construct the two physical maps with three bacterial artificial chromosome (BAC) libraries. The HSY physical map consists of 68 overlapping BACs on the minimum tiling path, and covers all four HSY-specific Knobs. One gap remained in the region of Knob 1, the only knob structure shared between HSY and X, due to the lack of HSY-specific sequences. This gap was filled on the physical map of the HSY corresponding region in the X chromosome. The X physical map consists of 44 BACs on the minimum tiling path with one gap remaining in the middle, due to the nature of highly repetitive sequences. This gap was filled on the HSY physical map. The borders of the non-recombining HSY were defined genetically by fine mapping using 1460 F2 individuals. The genetically defined HSY spanned approximately 8.5 Mb, whereas its X counterpart extended about 5.4 Mb, including a 900 kb region containing the Knob 1 shared by the HSY and X. The 8.5 Mb HSY corresponds to 4.5 Mb of its X counterpart, showing 4 Mb (89%) DNA sequence expansion. CONCLUSION: The 89% increase of DNA sequence in HSY indicates rapid expansion of the Yh chromosome after genetic recombination was suppressed 2-3 million years ago. The genetically defined borders coincide with the common BACs on the minimum tiling paths of HSY and X. The minimum tiling paths of HSY and its X counterpart are being used for sequencing these X and Yh-specific regions.
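
The expansion figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below uses only the two sizes stated in the text (8.5 Mb for the HSY and 4.5 Mb for the corresponding X region); the variable names are illustrative.

```python
hsy_mb = 8.5            # genetically defined HSY span, from the abstract
x_counterpart_mb = 4.5  # corresponding X region, from the abstract

# Absolute expansion of the HSY relative to its X counterpart.
expansion_mb = hsy_mb - x_counterpart_mb

# Relative expansion, expressed as a percentage of the X region.
expansion_pct = 100 * expansion_mb / x_counterpart_mb

print(f"{expansion_mb:.1f} Mb expansion, {expansion_pct:.0f}% relative to X")
```

The percentage is taken relative to the X region, which is why 4 Mb of extra sequence corresponds to roughly 89% rather than 47% (the share of the HSY itself).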


Subject(s)
Carica/genetics , Chromosomes, Plant/genetics , Physical Chromosome Mapping/methods , Base Sequence , Chromosomes, Artificial, Bacterial , Crosses, Genetic , Genetic Markers , Microsatellite Repeats/genetics , Recombination, Genetic/genetics
12.
New Phytol ; 193(1): 241-252, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21955124

ABSTRACT

• Whole genome duplication events provide a lineage with a large reservoir of genes that can be molded by evolutionary forces into phenotypes that fit alternative environments. A well-studied whole genome duplication, the α-event, occurred in an ancestor of the model plant Arabidopsis thaliana. Retained segments of the α-event have been defined in recent years in the form of duplicate protein coding sequences (α-pairs) and associated conserved noncoding DNA sequences (CNSs). Our aim was to identify any association between CNSs and α-pair co-functionality at the gene expression level. • Here, we tested for correlation between CNS counts and α-pair co-expression and expression intensity across nine expression datasets: aerial tissue, flowers, leaves, roots, rosettes, seedlings, seeds, shoots and whole plants. • We provide evidence for a putative regulatory role of the CNSs. The association of CNSs with α-pair co-expression and expression intensity varied by gene function, subgene position and the presence of transcription factor binding motifs. A range of possible CNS regulatory mechanisms, including intron-mediated enhancement, messenger RNA fold stability and transcriptional regulation, are discussed. • This study provides a framework to understand how CNS motifs are involved in the maintenance of gene expression after a whole genome duplication event.


Subject(s)
Arabidopsis/genetics , Conserved Sequence/genetics , DNA, Intergenic/genetics , 5' Untranslated Regions/genetics , Arabidopsis/drug effects , Arabidopsis/radiation effects , Base Sequence , Gene Expression Regulation, Plant/drug effects , Gene Expression Regulation, Plant/radiation effects , Genes, Duplicate/genetics , Introns/genetics , Oligonucleotide Array Sequence Analysis , Plant Growth Regulators/pharmacology , Thermodynamics , Transcription, Genetic/drug effects , Transcription, Genetic/radiation effects , Ultraviolet Rays
13.
Plant Physiol ; 156(3): 1244-56, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21606319

ABSTRACT

One major objective for plant biology is the discovery of molecular subsystems underlying complex traits. The use of genetic and genomic resources combined in a systems genetics approach offers a means for approaching this goal. This study describes a maize (Zea mays) gene coexpression network built from publicly available expression arrays. The maize network consisted of 2,071 loci that were divided into 34 distinct modules that contained 1,928 enriched functional annotation terms and 35 cofunctional gene clusters. Of note, 391 maize genes of unknown function were found to be coexpressed within modules along with genes of known function. A global network alignment was made between this maize network and a previously described rice (Oryza sativa) coexpression network. The IsoRankN tool was used, which incorporates both gene homology and network topology for the alignment. A total of 1,173 aligned loci were detected between the two grass networks, which condensed into 154 conserved subgraphs that preserved 4,758 coexpression edges in rice and 6,105 coexpression edges in maize. This study provides an early view into maize coexpression space and provides an initial network-based framework for the translation of functional genomic and genetic information between these two vital agricultural species.
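
To make the idea of coexpression edges "preserved" under a network alignment concrete, here is a minimal sketch. It assumes a simple one-to-one node mapping and toy edge lists with placeholder gene names; it does not reproduce IsoRankN or the alignment reported in the abstract.

```python
def conserved_edges(edges_a, edges_b, alignment):
    """Return edges of network A whose aligned endpoints are also linked in B.

    edges_a, edges_b: iterables of (node, node) pairs, treated as undirected.
    alignment: dict mapping nodes of A to nodes of B (one-to-one assumed).
    """
    b_set = {frozenset(e) for e in edges_b}
    kept = []
    for u, v in edges_a:
        # Both endpoints must be aligned before the edge can be compared.
        if u in alignment and v in alignment:
            if frozenset((alignment[u], alignment[v])) in b_set:
                kept.append((u, v))
    return kept

# Hypothetical toy networks; gene names are placeholders.
maize_edges = [("zm1", "zm2"), ("zm2", "zm3"), ("zm3", "zm4")]
rice_edges = [("os2", "os1"), ("os3", "os4")]
alignment = {"zm1": "os1", "zm2": "os2", "zm3": "os3", "zm4": "os4"}

kept = conserved_edges(maize_edges, rice_edges, alignment)
print(kept)  # [('zm1', 'zm2'), ('zm3', 'zm4')]
```

With a one-to-one mapping the number of preserved edges is the same in both networks. The abstract reports different counts for rice and maize, so the published alignment presumably allows more than one gene per aligned locus.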


Subject(s)
Conserved Sequence/genetics , Gene Expression Regulation, Plant , Gene Regulatory Networks/genetics , Oryza/genetics , Zea mays/genetics , Cluster Analysis , Phenotype
14.
Nat Commun ; 13(1): 3555, 2022 Jun 21.
Article in English | MEDLINE | ID: mdl-35729113

ABSTRACT

Mechanistic models of how single cells respond to different perturbations can help integrate disparate big data sets or predict response to varied drug combinations. However, the construction and simulation of such models have proved challenging. Here, we developed a Python-based model creation and simulation pipeline that converts a few structured text files into an SBML standard and is high-performance- and cloud-computing ready. We applied this pipeline to our large-scale, mechanistic pan-cancer signaling model (named SPARCED) and demonstrate it by adding an IFNγ pathway submodel. We then investigated whether a putative crosstalk mechanism could be consistent with experimental observations from the LINCS MCF10A Data Cube that IFNγ acts as an anti-proliferative factor. The analyses suggested this observation can be explained by IFNγ-induced SOCS1 sequestering activated EGF receptors. This work forms a foundational recipe for increased mechanistic model-based data integration on a single-cell level, an important building block for clinically predictive mechanistic models.


Subject(s)
Cloud Computing , Software , Cell Proliferation , Computer Simulation , Signal Transduction
15.
BMC Genomics ; 12: 194, 2011 Apr 15.
Article in English | MEDLINE | ID: mdl-21496274

ABSTRACT

BACKGROUND: We investigate whether pooling BAC clones and sequencing the pools can provide more accurate assembly of genome sequences than the "whole genome shotgun" (WGS) approach. Furthermore, we quantify this accuracy increase. We compare the pooled BAC and WGS approaches using in silico simulations. Standard measures of assembly quality focus on assembly size and fragmentation, which are desirable for large whole genome assemblies. We propose additional measures enabling easy and visual comparison of assembly quality, such as rearrangements and redundant sequence content, relative to the known target sequence. RESULTS: The best assembly quality scores were obtained using 454 coverage of 15× linear and 5× paired (3 kb insert size) reads (15L-5P) on Arabidopsis. This regime gave similarly good results on four additional plant genomes of very different GC and repeat contents. BAC pooling improved assembly scores over WGS assembly, with coverage and redundancy scores improving the most. CONCLUSIONS: BAC pooling works better than WGS; however, both require a physical map to order the scaffolds. Pool sizes up to 12 Mbp work well, suggesting this pooling density to be effective in medium-scale re-sequencing applications such as targeted sequencing of QTL intervals for candidate gene discovery. Assuming the current Roche/454 Titanium sequencing limitations, a 12 Mbp region could be re-sequenced with a full plate of linear reads and a half plate of paired-end reads, yielding 15L-5P coverage after read pre-processing. Our simulation suggests that massively over-sequencing may not improve accuracy. Our scoring measures can be used generally to evaluate and compare results of simulated genome assemblies.
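
The 15L-5P regime translates directly into a required sequence yield for a region of a given size. The short sketch below uses only the figures given in the abstract (a 12 Mbp pool, 15× linear and 5× paired coverage); the function name is illustrative, and the yields are usable bases after read pre-processing, not raw instrument output.

```python
def required_yield_mbp(region_mbp, coverage):
    """Total usable sequence (Mbp) needed to cover a region at a given depth."""
    return region_mbp * coverage

region_mbp = 12                                # pool size cited in the abstract
linear = required_yield_mbp(region_mbp, 15)    # 15x linear reads
paired = required_yield_mbp(region_mbp, 5)     # 5x paired reads

print(f"linear: {linear} Mbp, paired: {paired} Mbp, total: {linear + paired} Mbp")
```

Under these figures the linear reads must supply three times as much usable sequence as the paired reads.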


Subject(s)
Chromosomes, Artificial, Bacterial/genetics , Genomics/methods , Arabidopsis/genetics , Base Pairing , Genome, Plant/genetics , Genomic Library , Genomics/standards , Reference Standards , Sequence Analysis, DNA
16.
Plant Physiol ; 154(1): 13-24, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20668062

ABSTRACT

Discovering gene sets underlying the expression of a given phenotype is of great importance, as many phenotypes are the result of complex gene-gene interactions. Gene coexpression networks, built using a set of microarray samples as input, can help elucidate tightly coexpressed gene sets (modules) that are mixed with genes of known and unknown function. Functional enrichment analysis of modules further subdivides the coexpressed gene set into cofunctional gene clusters that may coexist in the module with other functionally related gene clusters. In this study, 45 coexpressed gene modules and 76 cofunctional gene clusters were discovered for rice (Oryza sativa) using a global, knowledge-independent paradigm and the combination of two network construction methodologies. Some clusters were enriched for previously characterized mutant phenotypes, providing evidence for specific gene sets (and their annotated molecular functions) that underlie specific phenotypes.


Subject(s)
Gene Expression Regulation, Plant , Gene Regulatory Networks/genetics , Genes, Plant/genetics , Oryza/genetics , Cluster Analysis , DNA Probes/metabolism , Genetic Loci/genetics , Internet , Mutation/genetics , Oligonucleotide Array Sequence Analysis , Phenotype
17.
Front Big Data ; 4: 582468, 2021.
Article in English | MEDLINE | ID: mdl-33748749

ABSTRACT

Advanced imaging and DNA sequencing technologies now enable the diverse biology community to routinely generate and analyze terabytes of high-resolution biological data. The community is rapidly heading toward the petascale in single-investigator laboratory settings. As evidence, the NCBI SRA central DNA sequence repository alone contains over 45 petabytes of biological data. Given the geometric growth of this and other genomics repositories, an exabyte of mineable biological data is imminent. The challenges of effectively utilizing these datasets are enormous, as they are not only large in size but also stored in geographically distributed repositories such as the National Center for Biotechnology Information (NCBI), the DNA Data Bank of Japan (DDBJ), the European Bioinformatics Institute (EBI), and NASA's GeneLab. In this work, we first systematically point out the data-management challenges of the genomics community. We then introduce Named Data Networking (NDN), a novel but well-researched Internet architecture that is capable of solving these challenges at the network layer. NDN performs all operations, such as forwarding requests to data sources, content discovery, access, and retrieval, using content names (which are similar to traditional filenames or filepaths) and eliminates the need for a location layer (the IP address) for data management. Utilizing NDN for genomics workflows simplifies data discovery, speeds up data retrieval using in-network caching of popular datasets, and allows the community to create infrastructure that supports operations such as creating federations of content repositories, retrieval from multiple sources, remote data subsetting, and others. Name-based operations also streamline deployment and integration of workflows with various cloud platforms. Our contributions in this work are as follows: 1) we enumerate the cyberinfrastructure challenges of the genomics community that NDN can alleviate; 2) we describe our efforts in applying NDN to a contemporary genomics workflow (GEMmaker) and quantify the improvements, with a preliminary evaluation showing a sixfold speedup in data insertion into the workflow; and 3) as a pilot, we have used an NDN naming scheme (agreed upon by the community and discussed in Section 4) to publish data from broadly used data repositories, including the NCBI SRA. We have loaded the NDN testbed with these pre-processed genomes, which can be accessed over NDN and used by anyone interested in those datasets. Finally, we discuss our continued effort in integrating NDN with cloud computing platforms, such as the Pacific Research Platform (PRP). The reader should note that the goal of this paper is to introduce NDN to the genomics community and discuss NDN's properties that can benefit the genomics community. We do not present an extensive performance evaluation of NDN; we are working on extending and evaluating our pilot deployment and will present systematic results in future work.
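
The two NDN properties highlighted in this abstract, requesting data by content name rather than by host address and caching popular data inside the network, can be sketched with a toy model. The classes and the hierarchical name below are hypothetical; this is neither the NDN protocol nor the naming scheme described in the paper.

```python
class Repository:
    """Origin data source that serves content by name."""

    def __init__(self, contents):
        self.contents = dict(contents)
        self.requests_served = 0

    def fetch(self, name):
        self.requests_served += 1
        return self.contents[name]


class CachingForwarder:
    """Forwards requests by content name and caches the returned data."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.cache = {}

    def fetch(self, name):
        # Only contact the upstream source on a cache miss.
        if name not in self.cache:
            self.cache[name] = self.upstream.fetch(name)
        return self.cache[name]


# Hypothetical hierarchical content name; an assumption for illustration.
name = "/genomics/example-repo/sample-0001/reads"
repo = Repository({name: b"ACGT"})
router = CachingForwarder(repo)

first = router.fetch(name)    # retrieved from the repository
second = router.fetch(name)   # served from the forwarder's cache

print(first == second, repo.requests_served)  # True 1
```

The consumer never names a host: it asks for the content name, and any node that holds a copy can answer, which is what allows repeated requests for popular datasets to be served close to the requester.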

18.
G3 (Bethesda) ; 10(9): 2953-2963, 2020 09 02.
Article in English | MEDLINE | ID: mdl-32665353

ABSTRACT

Bigenic expression relationships are conventionally defined based on metrics such as Pearson or Spearman correlation that cannot typically detect latent, non-linear dependencies or require the relationship to be monotonic. Further, the combination of intrinsic and extrinsic noise as well as embedded relationships between sample sub-populations reduces the probability of extracting biologically relevant edges during the construction of gene co-expression networks (GCNs). In this report, we address these problems via our NetExtractor algorithm. NetExtractor examines all pairwise gene expression profiles, first with Gaussian mixture models (GMMs) to identify sample sub-populations, followed by mutual information (MI) analysis that is capable of detecting non-linear differential bigenic expression relationships. We applied NetExtractor to brain tissue RNA profiles from the Genotype-Tissue Expression (GTEx) project to obtain a brain tissue-specific gene expression relationship network centered on cerebellar and cerebellar hemisphere-enriched edges. We leveraged the PsychENCODE pre-frontal cortex (PFC) gene regulatory network (GRN) to construct a cerebellar cortex (cerebellar) GRN associated with transcriptionally active regions in cerebellar tissue. Thus, we demonstrate the utility of our NetExtractor approach to detect biologically relevant and novel non-linear binary gene relationships.
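
The contrast between correlation and mutual information drawn in this abstract can be shown with a small example. This is a minimal sketch, not the NetExtractor implementation: it omits the GMM sub-population step entirely, and the equal-width histogram estimator, the bin count, and the synthetic data are assumptions made for illustration. It compares Pearson correlation with a binned MI estimate on a non-monotonic (quadratic) relationship.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def bin_index(value, lo, hi, bins):
    """Map a value to an equal-width bin index in [0, bins - 1]."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)


def mutual_information(x, y, bins=8):
    """Histogram-based MI estimate in bits (assumed estimator, not the paper's)."""
    n = len(x)
    bx = [bin_index(v, min(x), max(x), bins) for v in x]
    by = [bin_index(v, min(y), max(y), bins) for v in y]
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # Sum p(i, j) * log2(p(i, j) / (p(i) * p(j))) over occupied cells.
    return sum(
        (c / n) * math.log2(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


# Synthetic "expression" pair: y depends on x, but not monotonically.
x = list(range(-50, 51))
y = [v * v for v in x]

r = pearson(x, y)
mi = mutual_information(x, y)
print(f"Pearson r = {r:.3f}, MI = {mi:.3f} bits")
```

Because the data are symmetric around zero, the linear correlation vanishes while the MI estimate stays clearly positive, which is the kind of dependency a correlation-only network would miss.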


Subject(s)
Gene Regulatory Networks , RNA , Algorithms , Brain , Cerebellum , Computational Biology , Gene Expression Profiling
19.
Patterns (N Y) ; 1(6): 100087, 2020 Sep 11.
Article in English | MEDLINE | ID: mdl-33205131

ABSTRACT

We introduce the Transcriptome State Perturbation Generator (TSPG) as a novel deep-learning method to identify changes in genomic expression that occur between tissue states using generative adversarial networks. TSPG learns the transcriptome perturbations from RNA-sequencing data required to shift from a source to a target class. We apply TSPG as an effective method of detecting biologically relevant alternate expression patterns between normal and tumor human tissue samples. We demonstrate that the application of TSPG to expression data obtained from a biopsy sample of a patient's kidney cancer can identify patient-specific differentially expressed genes between their individual tumor sample and a target class of healthy kidney gene expression. By utilizing TSPG in a precision medicine application in which the patient sample is not replicated (i.e., n = 1), we present a novel technique of determining significant transcriptional aberrations that can be used to help identify potential targeted therapies.

20.
Sci Rep ; 10(1): 17089, 2020 10 13.
Article in English | MEDLINE | ID: mdl-33051491

ABSTRACT

The human brain is a complex organ that consists of several regions, each with a unique gene expression pattern. Our intent in this study was to construct a gene co-expression network (GCN) for the normal brain using RNA expression profiles from the Genotype-Tissue Expression (GTEx) project. The brain GCN contains gene correlation relationships that are broadly present in the brain or specific to thirteen brain regions, which we later combined into six overarching brain mini-GCNs based on the brain's structure. Using the expression profiles of brain region-specific GCN edges, we determined how well the brain region samples could be discriminated from each other, visually with t-SNE plots or quantitatively with the Gene Oracle deep learning classifier. Next, we tested these gene sets for their relevance to human tumors of brain and non-brain origin. Interestingly, we found that genes in the six brain mini-GCNs showed markedly higher mutation rates in tumors relative to matched sets of random genes. Further, we found that cortex genes subdivided Head and Neck Squamous Cell Carcinoma (HNSC) tumors and Pheochromocytoma and Paraganglioma (PCPG) tumors into distinct groups. The brain GCN and mini-GCNs are useful resources for the classification of brain regions and identification of biomarker genes for brain-related phenotypes.


Subject(s)
Biomarkers/metabolism , Brain/metabolism , Gene Regulatory Networks , Tumor Biomarkers/genetics , Tumor Biomarkers/metabolism , Brain Neoplasms/genetics , Brain Neoplasms/metabolism , Genetic Databases , Gene Expression Profiling , Genetic Markers , Humans , Genetic Models , Neurological Models , Mutation , Computer Neural Networks , Tissue Distribution