Results 1 - 20 of 106
1.
Development ; 148(6), 2021 03 21.
Article in English | MEDLINE | ID: mdl-33653874

ABSTRACT

To gain a deeper understanding of pancreatic β-cell development, we used iterative weighted gene correlation network analysis to calculate a gene co-expression network (GCN) from 11 temporally and genetically defined murine cell populations. The GCN, which contained 91 distinct modules, was then used to gain three new biological insights. First, we found that the clustered protocadherin genes are differentially expressed during pancreas development. Pcdhγ genes are preferentially expressed in pancreatic endoderm, Pcdhβ genes in nascent islets, and Pcdhα genes in mature β-cells. Second, after extracting sub-networks of transcriptional regulators for each developmental stage, we identified 81 zinc finger protein (ZFP) genes that are preferentially expressed during endocrine specification and β-cell maturation. Third, we used the GCN to select three ZFPs for further analysis by CRISPR mutagenesis of mice. Zfp800 null mice exhibited early postnatal lethality, and at E18.5 their pancreata exhibited a reduced number of pancreatic endocrine cells, alterations in exocrine cell morphology, and marked changes in expression of genes involved in protein translation, hormone secretion and developmental pathways in the pancreas. Together, our results suggest that developmentally oriented GCNs have utility for gaining new insights into gene regulation during organogenesis.


Subject(s)
Cell Differentiation/genetics , Homeodomain Proteins/genetics , Organogenesis/genetics , Pancreas/growth & development , Animals , Cadherins/genetics , Cell Lineage/genetics , Gene Expression Regulation, Developmental/genetics , Insulin/metabolism , Islets of Langerhans/cytology , Islets of Langerhans/metabolism , Mice , Pancreas/metabolism
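
As an illustration of the co-expression idea in the abstract above, the following is a minimal sketch of the basic unsigned soft-threshold adjacency used in weighted correlation network analysis, a_ij = |cor(i, j)|^power. It is not the authors' iterative pipeline. The gene names, expression values, and the power of 6 are assumptions chosen for the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def adjacency(expr, power=6):
    """Unsigned soft-threshold adjacency for every gene pair: |cor| ** power."""
    genes = sorted(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** power
        for g in genes
        for h in genes
        if g < h
    }

# Toy expression values across four samples (made-up numbers, hypothetical genes).
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with geneA
    "geneC": [4.0, 1.0, 3.0, 2.0],  # weakly anti-correlated with geneA
}
adj = adjacency(expr)
for pair, weight in sorted(adj.items()):
    print(pair, round(weight, 4))
```

Raising the absolute correlation to a power suppresses weak correlations while keeping strong ones near 1, so modules can later be found by clustering genes on these weights.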
2.
Alzheimers Dement ; 20(2): 1123-1136, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37881831

ABSTRACT

INTRODUCTION: The National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site Alzheimer's Genomics Database (GenomicsDB) is a public knowledge base of Alzheimer's disease (AD) genetic datasets and genomic annotations. METHODS: GenomicsDB uses a custom systems architecture to adopt and enforce rigorous standards that facilitate harmonization of AD-relevant genome-wide association study summary statistics datasets with functional annotations, including over 230 million annotated variants from the AD Sequencing Project. RESULTS: GenomicsDB generates interactive reports compiled from the harmonized datasets and annotations. These reports contextualize AD-risk associations in a broader functional genomic setting and summarize them in the context of functionally annotated genes and variants. DISCUSSION: Created to make AD-genetics knowledge more accessible to AD researchers, the GenomicsDB is designed to guide users unfamiliar with genetic data in not only exploring but also interpreting this ever-growing volume of data. Scalable and interoperable with other genomics resources using data technology standards, the GenomicsDB can serve as a central hub for research and data analysis on AD and related dementias. HIGHLIGHTS: The National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) offers to the public a unique, disease-centric collection of AD-relevant GWAS summary statistics datasets. Interpreting these data is challenging and requires significant bioinformatics expertise to standardize datasets and harmonize them with functional annotations on genome-wide scales. The NIAGADS Alzheimer's GenomicsDB helps overcome these challenges by providing a user-friendly public knowledge base for AD-relevant genetics that shares harmonized, annotated summary statistics datasets from the NIAGADS repository in an interpretable, easily searchable format.


Subject(s)
Alzheimer Disease , United States , Humans , Alzheimer Disease/genetics , Genome-Wide Association Study , National Institute on Aging (U.S.) , Genomics , Databases, Factual , Genetic Predisposition to Disease/genetics
3.
J Biomed Inform ; 112S: 100086, 2020.
Article in English | MEDLINE | ID: mdl-34417005

ABSTRACT

Standardizing clinical information in a semantically rich data model is useful for promoting interoperability and facilitating high quality research. Semantic Web technologies such as Resource Description Framework can be utilized to their full potential when a model accurately reflects the semantics of the clinical situation it describes. To this end, ontologies that abide by sound organizational principles can be used as the building blocks of a semantically rich model for the storage of clinical data. However, it is a challenge to programmatically define such a model and load data from disparate sources. The PennTURBO Semantic Engine is a tool developed at the University of Pennsylvania that transforms concise RDF data into a source-independent, semantically rich model. This system sources classes from an application ontology and specifically defines how instances of those classes may relate to each other. Additionally, the system defines and executes RDF data transformations by launching dynamically generated SPARQL update statements. The Semantic Engine was designed as a generalizable data standardization tool, and is able to work with various data models and incoming data sources. Its human-readable configuration files can easily be shared between institutions, providing the basis for collaboration on a standard data model.
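
The idea of dynamically generated SPARQL update statements can be sketched as plain string templating. This is an assumption-laden illustration, not the PennTURBO implementation: the function name `build_update` and every IRI below are hypothetical placeholders. The generated text follows the SPARQL 1.1 Update INSERT/WHERE form.

```python
def build_update(source_graph, target_graph, shortcut_predicate, target_class):
    """Build a SPARQL 1.1 INSERT/WHERE statement as a string.

    For every subject that carries the shortcut predicate in the source graph,
    the generated statement asserts membership of the target class in the
    target graph.
    """
    return (
        f"INSERT {{ GRAPH <{target_graph}> {{ ?s a <{target_class}> . }} }}\n"
        f"WHERE  {{ GRAPH <{source_graph}> {{ ?s <{shortcut_predicate}> ?value . }} }}"
    )

# All IRIs are placeholders, not identifiers from the PennTURBO ontology.
update = build_update(
    source_graph="http://example.org/graph/shortcuts",
    target_graph="http://example.org/graph/expanded",
    shortcut_predicate="http://example.org/shortcut/hasBirthDate",
    target_class="http://example.org/ontology/Patient",
)
print(update)
```

Keeping the graph names, predicate, and class as parameters is what would let a configuration file, rather than code, decide which transformations are generated.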

4.
Nucleic Acids Res ; 46(D1): D684-D691, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29106667

ABSTRACT

MicrobiomeDB (http://microbiomeDB.org) is a data discovery and analysis platform that empowers researchers to fully leverage experimental variables to interrogate microbiome datasets. MicrobiomeDB was developed in collaboration with the Eukaryotic Pathogens Bioinformatics Resource Center (http://EuPathDB.org) and leverages the infrastructure and user interface of EuPathDB, which allows users to construct in silico experiments using an intuitive graphical 'strategy' approach. The current release of the database integrates microbial census data with sample details for nearly 14 000 samples originating from human, animal and environmental sources, including over 9000 samples from healthy human subjects in the Human Microbiome Project (http://portal.ihmpdcc.org/). Query results can be statistically analyzed and graphically visualized via interactive web applications launched directly in the browser, providing insight into microbial community diversity and allowing users to identify taxa associated with any experimental covariate.


Subject(s)
Data Mining/methods , Databases, Genetic , Microbiota , Systems Biology , Animals , Computer Simulation , Datasets as Topic , Environmental Microbiology , Genetic Variation , Humans , Internet , Mobile Applications , User-Computer Interface , Workflow
5.
Nucleic Acids Res ; 45(D1): D581-D591, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27903906

ABSTRACT

The Eukaryotic Pathogen Genomics Database Resource (EuPathDB, http://eupathdb.org) is a collection of databases covering 170+ eukaryotic pathogens (protists & fungi), along with relevant free-living and non-pathogenic species, and select pathogen hosts. To facilitate the discovery of meaningful biological relationships, the databases couple preconfigured searches with visualization and analysis tools for comprehensive data mining via intuitive graphical interfaces and APIs. All data are analyzed with the same workflows, including creation of gene orthology profiles, so data are easily compared across data sets, data types and organisms. EuPathDB is updated with numerous new analysis tools, features, data sets and data types. New tools include GO, metabolic pathway and word enrichment analyses plus an online workspace for analysis of personal, non-public, large-scale data. Expanded data content is mostly genomic and functional genomic data while new data types include protein microarray, metabolic pathways, compounds, quantitative proteomics, copy number variation, and polysomal transcriptomics. New features include consistent categorization of searches, data sets and genome browser tracks; redesigned gene pages; effective integration of alternative transcripts; and a EuPathDB Galaxy instance for private analyses of a user's data. Forthcoming upgrades include user workspaces for private integration of data with existing EuPathDB data and improved integration and presentation of host-pathogen interactions.


Subject(s)
Databases, Genetic , Eukaryota , Genomics/methods , Host-Pathogen Interactions/genetics , Metagenome , Metagenomics/methods , Software , Computational Biology/methods , DNA Copy Number Variations , Gene Expression Profiling , Proteomics , Web Browser
6.
Genes Dev ; 24(10): 1035-44, 2010 May 15.
Article in English | MEDLINE | ID: mdl-20478996

ABSTRACT

The transcriptional mechanisms by which temporary exposure to developmental signals instigates adipocyte differentiation are unknown. During early adipogenesis, we find transient enrichment of the glucocorticoid receptor (GR), CCAAT/enhancer-binding protein beta (CEBPbeta), p300, mediator subunit 1, and histone H3 acetylation near genes involved in cell proliferation, development, and differentiation, including the gene encoding the master regulator of adipocyte differentiation, peroxisome proliferator-activated receptor gamma2 (PPARgamma2). Occupancy and enhancer function are triggered by adipogenic signals, and diminish upon their removal. GR, which is important for adipogenesis but need not be active in the mature adipocyte, functions transiently with other enhancer proteins to propagate a new program of gene expression that includes induction of PPARgamma2, thereby providing a memory of the earlier adipogenic signal. Thus, the conversion of preadipocyte to adipocyte involves the formation of an epigenomic transition state that is not observed in cells at the beginning or end of the differentiation process.


Subject(s)
Adipogenesis/physiology , Epigenesis, Genetic , Signal Transduction , Acetylation , Animals , CCAAT-Enhancer-Binding Protein-beta/metabolism , Cell Line , Histones/metabolism , Mice , Peroxisome Proliferator-Activated Receptors/metabolism , Receptors, Glucocorticoid/metabolism
7.
Development ; 141(15): 2939-49, 2014 Aug.
Article in English | MEDLINE | ID: mdl-25053427

ABSTRACT

Insulinoma associated 1 (Insm1) plays an important role in regulating the development of cells in the central and peripheral nervous systems, olfactory epithelium and endocrine pancreas. To better define the role of Insm1 in pancreatic endocrine cell development we generated mice with an Insm1(GFPCre) reporter allele and used them to study Insm1-expressing and null populations. Endocrine progenitor cells lacking Insm1 were less differentiated and exhibited broad defects in hormone production, cell proliferation and cell migration. Embryos lacking Insm1 contained greater amounts of a non-coding Neurog3 mRNA splice variant and had fewer Neurog3/Insm1 co-expressing progenitor cells, suggesting that Insm1 positively regulates Neurog3. Moreover, endocrine progenitor cells that express either high or low levels of Pdx1, and thus may be biased towards the formation of specific cell lineages, exhibited cell type-specific differences in the genes regulated by Insm1. Analysis of the function of Ripply3, an Insm1-regulated gene enriched in the Pdx1-high cell population, revealed that it negatively regulates the proliferation of early endocrine cells. Taken together, these findings indicate that in developing pancreatic endocrine cells Insm1 promotes the transition from a ductal progenitor to a committed endocrine cell by repressing a progenitor cell program and activating genes essential for RNA splicing, cell migration, controlled cellular proliferation, vasculogenesis, extracellular matrix and hormone secretion.


Subject(s)
Basic Helix-Loop-Helix Transcription Factors/metabolism , DNA-Binding Proteins/physiology , Endocrine Cells/cytology , Gene Expression Regulation, Developmental , Nerve Tissue Proteins/metabolism , Repressor Proteins/metabolism , Transcription Factors/physiology , Alleles , Alternative Splicing , Animals , Cell Differentiation , Cell Lineage , Cell Movement , Cell Proliferation , Cell Separation , Extracellular Matrix/metabolism , Flow Cytometry , Gene Regulatory Networks , Genes, Reporter , Green Fluorescent Proteins/metabolism , Mice , Mice, Knockout , Pancreas/embryology , RNA/metabolism , RNA Splicing , Stem Cells/cytology , Time Factors , Transcription, Genetic
8.
BMC Genomics ; 16: 506, 2015 Jul 07.
Article in English | MEDLINE | ID: mdl-26148682

ABSTRACT

BACKGROUND: Atherosclerosis is a heterogeneously distributed disease of arteries in which the endothelium plays an important central role. Spatial transcriptome profiling of endothelium in pre-lesional arteries has demonstrated differential phenotypes primed for athero-susceptibility at hemodynamic sites associated with disturbed blood flow. DNA methylation is a powerful epigenetic regulator of endothelial transcription recently associated with flow characteristics. We investigated differential DNA methylation in flow region-specific aortic endothelial cells in vivo in adult domestic male and female swine. RESULTS: Genome-wide DNA methylation was profiled in endothelial cells (EC) isolated from two robust locations of differing patho-susceptibility: an athero-susceptible site located at the inner curvature of the aortic arch (AA) and an athero-protected region in the descending thoracic (DT) aorta. Complete methylated DNA immunoprecipitation sequencing (MeDIP-seq) identified over 5500 endothelial differentially methylated regions (DMRs). DMR density was significantly enriched in exons and 5'UTR sequences of annotated genes, 60 of which are linked to cardiovascular disease. The set of DMR-associated genes was enriched in transcriptional regulation, pattern specification HOX loci, oxidative stress and the ER stress adaptive pathway, all categories linked to athero-susceptible endothelium. Examination of the relationship between DMR and mRNA in HOXA genes demonstrated a significant inverse relationship between CpG island promoter methylation and gene expression. Methylation-specific PCR (MSP) confirmed differential CpG methylation of HOXA genes, the ER stress gene ATF4, inflammatory regulator microRNA-10a and ARHGAP25 that encodes a negative regulator of Rho GTPases involved in cytoskeleton remodeling. Gender-specific DMRs associated with ciliogenesis that may be linked to defects in cilia development were also identified in AA DMRs. CONCLUSIONS: An endothelial methylome analysis identifies epigenetic DMR characteristics associated with transcriptional regulation in regions of atherosusceptibility in swine aorta in vivo. The data represent the first methylome blueprint for spatio-temporal analyses of lesion susceptibility predisposing to endothelial dysfunction in complex flow environments in vivo.


Subject(s)
Aorta/metabolism , DNA Methylation/genetics , Endothelium, Vascular/metabolism , Transcriptome/genetics , Animals , Atherosclerosis/genetics , CpG Islands/genetics , Endothelial Cells/metabolism , Female , Gene Expression Profiling/methods , Gene Expression Regulation/genetics , Male , Phenotype , Promoter Regions, Genetic/genetics , RNA, Messenger/genetics , Spatio-Temporal Analysis , Swine
9.
Bioinformatics ; 30(9): 1340-2, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24413522

ABSTRACT

Biomedical ontologies are often very large and complex. Only a subset of the ontology may be needed for a specified application or community. For ontology end users, it is desirable to have community-based labels rather than the labels generated by ontology developers. Ontodog is a web-based system that can generate an ontology subset based on Excel input, and support generation of an ontology community view, which is defined as the whole or a subset of the source ontology with user-specified annotations including user-preferred labels. Ontodog allows users to easily generate community views with minimal ontology knowledge and no programming skills or installation required. Currently >100 ontologies including all OBO Foundry ontologies are available to generate the views based on user needs. We demonstrate the application of Ontodog for the generation of community views using the Ontology for Biomedical Investigations as the source ontology.


Subject(s)
Biological Ontologies , Internet , Software
10.
Blood ; 121(6): e5-e13, 2013 Feb 07.
Article in English | MEDLINE | ID: mdl-23243273

ABSTRACT

Erythroid ontogeny is characterized by overlapping waves of primitive and definitive erythroid lineages that share many morphologic features during terminal maturation but have marked differences in cell size and globin expression. In the present study, we compared global gene expression in primitive, fetal definitive, and adult definitive erythroid cells at morphologically equivalent stages of maturation purified from embryonic, fetal, and adult mice. Surprisingly, most transcriptional complexity in erythroid precursors is already present by the proerythroblast stage. Transcript levels are markedly modulated during terminal erythroid maturation, but housekeeping genes are not preferentially lost. Although primitive and definitive erythroid lineages share a large set of nonhousekeeping genes, annotation of lineage-restricted genes shows that alternate gene usage occurs within shared functional categories, as exemplified by the selective expression of aquaporins 3 and 8 in primitive erythroblasts and aquaporins 1 and 9 in adult definitive erythroblasts. Consistent with the known functions of Aqp3 and Aqp8 as H2O2 transporters, primitive, but not definitive, erythroblasts preferentially accumulate reactive oxygen species after exogenous H2O2 exposure. We have created a user-friendly Web site (http://www.cbil.upenn.edu/ErythronDB) to make these global expression data readily accessible and amenable to complex search strategies by the scientific community.


Subject(s)
Erythroid Cells/metabolism , Erythropoiesis/genetics , Gene Expression Profiling , Gene Expression Regulation, Developmental , Animals , Aquaporin 1/genetics , Aquaporin 3/genetics , Aquaporins/genetics , Cell Lineage/genetics , Cells, Cultured , Erythroblasts/metabolism , Erythrocytes/metabolism , Female , Hematopoietic System/cytology , Hematopoietic System/embryology , Hematopoietic System/growth & development , Mice , Mice, Inbred ICR , Reactive Oxygen Species/metabolism , Reverse Transcriptase Polymerase Chain Reaction , Time Factors
11.
Nucleic Acids Res ; 41(Database issue): D684-91, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23175615

ABSTRACT

EuPathDB (http://eupathdb.org) resources include 11 databases supporting eukaryotic pathogen genomic and functional genomic data, isolate data and phylogenomics. EuPathDB resources are built using the same infrastructure and provide a sophisticated search strategy system enabling complex interrogations of underlying data. Recent advances in EuPathDB resources include the design and implementation of a new data loading workflow, a new database supporting Piroplasmida (i.e. Babesia and Theileria), the addition of large amounts of new data and data types and the incorporation of new analysis tools. New data include genome sequences and annotation, strand-specific RNA-seq data, splice junction predictions (based on RNA-seq), phosphoproteomic data, high-throughput phenotyping data, single nucleotide polymorphism data based on high-throughput sequencing (HTS) and expression quantitative trait loci data. New analysis tools enable users to search for DNA motifs and define genes based on their genomic colocation, view results from searches graphically (i.e. genes mapped to chromosomes or isolates displayed on a map) and analyze data from columns in result tables (word cloud and histogram summaries of column content). The manuscript herein describes updates to EuPathDB since the previous report published in NAR in 2010.


Subject(s)
Databases, Genetic , Parasites/genetics , Animals , Genomics , Internet , Molecular Sequence Annotation , Phenotype , Piroplasmida/genetics , Polymorphism, Single Nucleotide , Proteomics , Quantitative Trait Loci , RNA Splice Sites , Sequence Analysis, RNA , Software
12.
Nature ; 455(7214): 757-63, 2008 Oct 09.
Article in English | MEDLINE | ID: mdl-18843361

ABSTRACT

The human malaria parasite Plasmodium vivax is responsible for 25-40% of the approximately 515 million annual cases of malaria worldwide. Although seldom fatal, the parasite elicits severe and incapacitating clinical symptoms and often causes relapses months after a primary infection has cleared. Despite its importance as a major human pathogen, P. vivax is little studied because it cannot be propagated continuously in the laboratory except in non-human primates. We sequenced the genome of P. vivax to shed light on its distinctive biological features, and as a means to drive development of new drugs and vaccines. Here we describe the synteny and isochore structure of P. vivax chromosomes, and show that the parasite resembles other malaria parasites in gene content and metabolic potential, but possesses novel gene families and potential alternative invasion pathways not recognized previously. Completion of the P. vivax genome provides the scientific community with a valuable resource that can be used to advance investigation into this neglected species.


Subject(s)
Genome, Protozoan/genetics , Genomics , Malaria, Vivax/parasitology , Plasmodium vivax/genetics , Amino Acid Motifs , Animals , Artemisinins/metabolism , Artemisinins/pharmacology , Atovaquone/metabolism , Atovaquone/pharmacology , Cell Nucleus/genetics , Chromosomes/genetics , Conserved Sequence/genetics , Erythrocytes/parasitology , Evolution, Molecular , Haplorhini/parasitology , Humans , Isochores/genetics , Ligands , Malaria, Vivax/metabolism , Multigene Family , Plasmodium vivax/drug effects , Plasmodium vivax/pathogenicity , Plasmodium vivax/physiology , Sequence Analysis, DNA , Species Specificity , Synteny/genetics
13.
Stem Cells ; 30(10): 2297-308, 2012 Oct.
Article in English | MEDLINE | ID: mdl-22865702

ABSTRACT

Sox17 is essential for both endoderm development and fetal hematopoietic stem cell (HSC) maintenance. While endoderm-derived organs are well known to originate from Sox17-expressing cells, it is less certain whether fetal HSCs also originate from Sox17-expressing cells. By generating a Sox17(GFPCre) allele and using it to assess the fate of Sox17-expressing cells during embryogenesis, we confirmed that both endodermal and a part of definitive hematopoietic cells are derived from Sox17-positive cells. Prior to E9.5, the expression of Sox17 is restricted to the endoderm lineage. However, at E9.5 Sox17 is expressed in the endothelial cells (ECs) at the para-aortic splanchnopleural region that contribute to the formation of HSCs at a later stage. The identification of two distinct progenitor cell populations that express Sox17 at E9.5 was confirmed using fluorescence-activated cell sorting together with RNA-Seq to determine the gene expression profiles of the two cell populations. Interestingly, this analysis revealed differences in the RNA processing of the Sox17 mRNA during embryogenesis. Taken together, these results indicate that Sox17 is expressed in progenitor cells derived from two different germ layers, further demonstrating the complex expression pattern of this gene and suggesting caution when using Sox17 as a lineage-specific marker.


Subject(s)
Fetal Stem Cells/metabolism , Gene Expression Regulation, Developmental , HMGB Proteins/genetics , Hematopoietic Stem Cells/metabolism , SOXF Transcription Factors/genetics , Animals , Cell Differentiation , Cell Lineage , Embryo, Mammalian , Embryonic Development , Endoderm/cytology , Endoderm/metabolism , Fetal Stem Cells/cytology , Flow Cytometry , Green Fluorescent Proteins/genetics , HMGB Proteins/metabolism , Hematopoietic Stem Cells/cytology , Mice , Mice, Transgenic , RNA, Messenger/biosynthesis , SOXF Transcription Factors/metabolism
14.
Nucleic Acids Res ; 39(Database issue): D612-9, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20974635

ABSTRACT

AmoebaDB (http://AmoebaDB.org) and MicrosporidiaDB (http://MicrosporidiaDB.org) are new functional genomic databases serving the amoebozoa and microsporidia research communities, respectively. AmoebaDB contains the genomes of three Entamoeba species (E. dispar, E. invadens and E. histolytica) and microarray expression data for E. histolytica. MicrosporidiaDB contains the genomes of Encephalitozoon cuniculi, E. intestinalis and E. bieneusi. The databases belong to the National Institute of Allergy and Infectious Diseases (NIAID) funded EuPathDB (http://EuPathDB.org) Bioinformatics Resource Center family of integrated databases and assume the same architectural and graphical design as other EuPathDB resources such as PlasmoDB and TriTrypDB. Importantly they utilize the graphical strategy builder that affords a database user the ability to ask complex multi-data-type questions with relative ease and versatility. Genomic scale data can be queried based on BLAST searches, annotation keywords and gene ID searches, GO terms, sequence motifs, protein characteristics, phylogenetic relationships and functional data such as transcript (microarray and EST evidence) and protein expression data. Search strategies can be saved within a user's profile for future retrieval and may also be shared with other researchers using a unique strategy web address.


Subject(s)
Databases, Genetic , Encephalitozoon/genetics , Entamoeba/genetics , Genome, Fungal , Genome, Protozoan , Genomics
15.
Nat Genet ; 32 Suppl: 469-73, 2002 Dec.
Article in English | MEDLINE | ID: mdl-12454640

ABSTRACT

A single microarray can provide information on the expression of tens of thousands of genes. The amount of information generated by a microarray-based experiment is sufficiently large that no single study can be expected to mine each nugget of scientific information. As a consequence, the scale and complexity of microarray experiments require that computer software programs do much of the data processing, storage, visualization, analysis and transfer. The adoption of common standards and ontologies for the management and sharing of microarray data is essential and will provide immediate benefit to the research community.


Subject(s)
Database Management Systems , Databases, Genetic/standards , Gene Expression Profiling/standards , Oligonucleotide Array Sequence Analysis/standards , Electronic Data Processing , Gene Expression Profiling/methods , Humans , Information Storage and Retrieval , Internet , Models, Biological , Oligonucleotide Array Sequence Analysis/methods , Programming Languages , Quality Control , Sequence Analysis, DNA , Software
16.
Bioinformatics ; 27(18): 2518-28, 2011 Sep 15.
Article in English | MEDLINE | ID: mdl-21775302

ABSTRACT

MOTIVATION: A critical task in high-throughput sequencing is aligning millions of short reads to a reference genome. Alignment is especially complicated for RNA sequencing (RNA-Seq) because of RNA splicing. A number of RNA-Seq algorithms are available, and claim to align reads with high accuracy and efficiency while detecting splice junctions. RNA-Seq data are discrete in nature; therefore, with reasonable gene models and comparative metrics RNA-Seq data can be simulated to sufficient accuracy to enable meaningful benchmarking of alignment algorithms. The exercise to rigorously compare all viable published RNA-Seq algorithms has not been performed previously. RESULTS: We developed an RNA-Seq simulator that models the main impediments to RNA alignment, including alternative splicing, insertions, deletions, substitutions, sequencing errors and intron signal. We used this simulator to measure the accuracy and robustness of available algorithms at the base and junction levels. Additionally, we used reverse transcription-polymerase chain reaction (RT-PCR) and Sanger sequencing to validate the ability of the algorithms to detect novel transcript features such as novel exons and alternative splicing in RNA-Seq data from mouse retina. A pipeline based on BLAT was developed to explore the performance of established tools for this problem, and to compare it to the recently developed methods. This pipeline, the RNA-Seq Unified Mapper (RUM), performs comparably to the best current aligners and provides an advantageous combination of accuracy, speed and usability. AVAILABILITY: The RUM pipeline is distributed via the Amazon Cloud and for computing clusters using the Sun Grid Engine (http://cbil.upenn.edu/RUM). CONTACT: ggrant@pcbi.upenn.edu; epierce@mail.med.upenn.edu SUPPLEMENTARY INFORMATION: The RNA-Seq sequence reads described in the article are deposited at GEO, accession GSE26248.


Subject(s)
Sequence Analysis, RNA/methods , Algorithms , Animals , Base Sequence , Benchmarking , Cluster Analysis , Exons , Gene Library , Genome , High-Throughput Nucleotide Sequencing , Mice , Models, Genetic , Molecular Sequence Data , RNA/genetics , RNA Splicing , Sequence Alignment , Software
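
To make the simulation idea in the abstract above concrete, here is a toy sketch that draws one read from a spliced transcript and adds substitution errors. It is not the authors' simulator and omits indels, alternative splicing, and intron signal. The genome, the two-exon gene model, the read length, and the error rate are all made-up assumptions.

```python
import random

def spliced_transcript(genome, exons):
    """Concatenate exon intervals (0-based, end-exclusive) into a transcript."""
    return "".join(genome[start:end] for start, end in exons)

def simulate_read(transcript, read_len, error_rate, rng):
    """Sample one read from the transcript and add substitution errors."""
    start = rng.randrange(len(transcript) - read_len + 1)
    read = []
    for base in transcript[start:start + read_len]:
        if rng.random() < error_rate:
            # Replace the base with one of the three other nucleotides.
            base = rng.choice([b for b in "ACGT" if b != base])
        read.append(base)
    return start, "".join(read)

rng = random.Random(0)  # fixed seed so the run is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(200))  # random toy genome
exons = [(10, 60), (120, 170)]  # hypothetical two-exon gene model
transcript = spliced_transcript(genome, exons)
start, read = simulate_read(transcript, read_len=30, error_rate=0.01, rng=rng)

# The first exon contributes 50 bases, so the junction sits at transcript offset 50.
spans_junction = start < 50 < start + 30
print(start, read, spans_junction)
```

Because the true origin of every simulated read is known, an aligner's output can be scored against it at both the base level and the junction level, which is the benchmarking approach the abstract describes.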
17.
Nucleic Acids Res ; 38(Database issue): D415-9, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19914931

ABSTRACT

EuPathDB (http://EuPathDB.org; formerly ApiDB) is an integrated database covering the eukaryotic pathogens of the genera Cryptosporidium, Giardia, Leishmania, Neospora, Plasmodium, Toxoplasma, Trichomonas and Trypanosoma. While each of these groups is supported by a taxon-specific database built upon the same infrastructure, the EuPathDB portal offers an entry point to all these resources, and the opportunity to leverage orthology for searches across genera. The most recent release of EuPathDB includes updates and changes affecting data content, infrastructure and the user interface, improving data access and enhancing the user experience. EuPathDB currently supports more than 80 searches and the recently-implemented 'search strategy' system enables users to construct complex multi-step searches via a graphical interface. Search results are dynamically displayed as the strategy is constructed or modified, and can be downloaded, saved, revised, or shared with other database users.


Subject(s)
Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Protozoan Infections/parasitology , Protozoan Proteins/genetics , Animals , Computational Biology/trends , Databases, Protein , Genome, Protozoan , Humans , Information Storage and Retrieval/methods , Internet , Protein Structure, Tertiary , Protozoan Infections/genetics , Software
18.
Nucleic Acids Res ; 38(Database issue): D457-62, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19843604

ABSTRACT

TriTrypDB (http://tritrypdb.org) is an integrated database providing access to genome-scale datasets for kinetoplastid parasites, and supporting a variety of complex queries driven by research and development needs. TriTrypDB is a collaborative project, utilizing the GUS/WDK computational infrastructure developed by the Eukaryotic Pathogen Bioinformatics Resource Center (EuPathDB.org) to integrate genome annotation and analyses from GeneDB and elsewhere with a wide variety of functional genomics datasets made available by members of the global research community, often pre-publication. Currently, TriTrypDB integrates datasets from Leishmania braziliensis, L. infantum, L. major, L. tarentolae, Trypanosoma brucei and T. cruzi. Users may examine individual genes or chromosomal spans in their genomic context, including syntenic alignments with other kinetoplastid organisms. Data within TriTrypDB can be interrogated utilizing a sophisticated search strategy system that enables a user to construct complex queries combining multiple data types. All search strategies are stored, allowing future access and integrated searches. 'User Comments' may be added to any gene page, enhancing available annotation; such comments become immediately searchable via the text search, and are forwarded to curators for incorporation into the reference annotation when appropriate.


Subject(s)
Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Leishmania/genetics , Trypanosoma/genetics , Animals , Computational Biology/trends , Databases, Protein , Genome, Protozoan , Information Storage and Retrieval/methods , Internet , Protein Structure, Tertiary , Protozoan Proteins/genetics , Software , User-Computer Interface
19.
Bioinformatics ; 26(19): 2470-1, 2010 Oct 01.
Article in English | MEDLINE | ID: mdl-20733062

ABSTRACT

Computational methods in molecular biology will increasingly depend on standards-based annotations that describe biological experiments in an unambiguous manner. Annotare is a software tool that enables biologists to easily annotate their high-throughput experiments, biomaterials and data in a standards-compliant way that facilitates meaningful search and analysis. AVAILABILITY AND IMPLEMENTATION: Annotare is available from http://code.google.com/p/annotare/ under the terms of the open-source MIT License (http://www.opensource.org/licenses/mit-license.php). It has been tested on both Mac and Windows.


Subject(s)
Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , Software , Computational Biology/methods , Databases, Factual , Molecular Sequence Annotation , User-Computer Interface
20.
Circ Res ; 105(5): 453-61, 2009 Aug 28.
Article in English | MEDLINE | ID: mdl-19661457

ABSTRACT

RATIONALE: Endothelial function and dysfunction are central to the focal origin and regional development of atherosclerosis; however, an in vivo endothelial phenotypic footprint of susceptibility to atherosclerosis preceding pathological change remains elusive. OBJECTIVE: To conduct a comparative multi-site genomics study of arterial endothelial phenotype in atherosusceptible and atheroprotected regions. METHODS AND RESULTS: Transcript profiles of freshly isolated endothelial cells from 7 discrete arterial regions in normal swine were analyzed to determine the steady state in vivo endothelial phenotypes in regions of varying susceptibilities to atherosclerosis. The most abundant common feature of the endothelium of all atherosusceptible regions was the upregulation of genes associated with endoplasmic reticulum (ER) stress. The unfolded protein response pathway, induced by ER stress, was therefore investigated in detail in endothelium of the atherosusceptible aortic arch and was found to be partially activated. ER transmembrane signal transducers IRE1alpha and ATF6alpha and their downstream effectors, but not PERK, were activated concomitant with a higher transcript expression of protein folding enzymes and chaperones, indicative of ER stress in vivo. CONCLUSIONS: The findings demonstrate the prevalence of chronic endothelial ER stress and activated unfolded protein response in vivo at atherosusceptible arterial sites. We propose that chronic localized biological stress is linked to spatial susceptibility of the endothelium to the initiation of atherosclerosis.


Subject(s)
Atherosclerosis/genetics , Endoplasmic Reticulum/chemistry , Endothelium, Vascular/chemistry , Stress, Physiological/genetics , Animals , Aorta/chemistry , Atherosclerosis/metabolism , Carotid Arteries/chemistry , Gene Expression Profiling/methods , Gene Regulatory Networks , Genetic Predisposition to Disease , Oligonucleotide Array Sequence Analysis , Phenotype , Protein Biosynthesis/genetics , Protein Folding , RNA, Messenger/analysis , Signal Transduction/genetics , Swine