Results 1 - 11 of 11
1.
Nucleic Acids Res ; 38(Web Server issue): W109-17, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20494976

ABSTRACT

High-throughput gene-expression studies result in lists of differentially expressed genes. Most current meta-analyses of these gene lists include searching for significant membership of the translated proteins in various signaling pathways. However, such membership enrichment algorithms do not provide insight into which pathways caused the genes to be differentially expressed in the first place. Here, we present an intuitive approach for discovering upstream signaling pathways responsible for regulating these differentially expressed genes. We identify consistently regulated signature genes specific for signal transduction pathways from a panel of single-pathway perturbation experiments. An algorithm that detects overrepresentation of these signature genes in a gene group of interest is used to infer the signaling pathway responsible for regulation. We expose our novel resource and algorithm through a web server called SPEED: Signaling Pathway Enrichment using Experimental Data sets. SPEED can be freely accessed at http://speed.sys-bio.net/.
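
The overrepresentation step described in this abstract can be illustrated with a one-sided hypergeometric test. This is a minimal sketch, not SPEED's actual implementation: the abstract does not name the statistic, so the choice of test, the function names, and the toy gene identifiers are all assumptions made for illustration.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K are signature genes."""
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enrichment_p_value(gene_list, signature, background):
    """Return (overlap, p-value) for signature genes within a gene list.

    Hypothetical helper: genes outside the background are ignored.
    """
    background = set(background)
    genes = set(gene_list) & background
    sig = set(signature) & background
    overlap = len(genes & sig)
    return overlap, hypergeom_sf(overlap, len(background), len(sig), len(genes))


# Toy data (invented): 20 background genes, 5 of them pathway signature genes.
background = [f"g{i}" for i in range(20)]
signature = ["g0", "g1", "g2", "g3", "g4"]
gene_list = ["g0", "g1", "g2", "g10"]

overlap, p = enrichment_p_value(gene_list, signature, background)
print(overlap, round(p, 4))  # 3 0.032
```

A small p-value suggests that the signature genes of that pathway occur in the list more often than chance would predict. In practice one such test would be run per pathway, followed by a multiple-testing correction.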


Subjects
Gene Expression Profiling; Gene Expression Regulation; Signal Transduction; Software; Algorithms; CCAAT-Enhancer-Binding Proteins/genetics; Cell Line, Tumor; Databases, Genetic; Humans; Internet; Leukemia, Myeloid, Acute/genetics; Leukemia, Myeloid, Acute/metabolism; Mutation; Transcription Factors/metabolism
2.
BMC Genomics ; 12: 585, 2011 Nov 29.
Article in English | MEDLINE | ID: mdl-22126435

ABSTRACT

BACKGROUND: Motivated by the precarious state of the world's coral reefs, there is currently a keen interest in coral transcriptomics. By identifying changes in coral gene expression that are triggered by particular environmental stressors, we can begin to characterize coral stress responses at the molecular level, which should lead to the development of more powerful diagnostic tools for evaluating the health of corals in the field. Furthermore, the identification of genetic variants that are more or less resilient in the face of particular stressors will help us to develop more reliable prognoses for particular coral populations. Toward this end, we performed deep mRNA sequencing of the cauliflower coral, Pocillopora damicornis, a geographically widespread Indo-Pacific species that exhibits a great diversity of colony forms and is able to thrive in habitats subject to a wide range of human impacts. Importantly, P. damicornis is particularly amenable to laboratory culture. We collected specimens from three geographically isolated Hawaiian populations subjected to qualitatively different levels of human impact. We isolated RNA from colony fragments ("nubbins") exposed to four environmental stressors (heat, desiccation, peroxide, and hypo-saline conditions) or control conditions. The RNA was pooled and sequenced using the 454 platform. DESCRIPTION: Both the raw reads (n=1,116,551) and the assembled contigs (n=70,786; mean length=836 nucleotides) were deposited in a new publicly available relational database called PocilloporaBase (http://www.PocilloporaBase.org). Using BLASTX, 47.2% of the contigs were found to match a sequence in the NCBI database at an E-value threshold of ≤0.001; 93.6% of those contigs with matches in the NCBI database appear to be of metazoan origin and 2.3% bacterial origin, while most of the remaining 4.1% match to other eukaryotes, including algae and amoebae. CONCLUSIONS: P. damicornis now joins the handful of coral species for which extensive transcriptomic data are publicly available. Through PocilloporaBase (http://www.PocilloporaBase.org), one can obtain assembled contigs and raw reads and query the data according to a wide assortment of attributes including taxonomic origin, PFAM motif, KEGG pathway, and GO annotation.


Subjects
Anthozoa/genetics; Databases, Genetic; Transcriptome; Animals; Anthozoa/classification; Phylogeny
3.
HGG Adv ; 2(3), 2021 Jul.
Article in English | MEDLINE | ID: mdl-34514437

ABSTRACT

Effective genetic diagnosis requires the correlation of genetic variant data with detailed phenotypic information. However, manual encoding of clinical data into machine-readable forms is laborious and subject to observer bias. Natural language processing (NLP) of electronic health records has great potential to enhance reproducibility at scale but suffers from idiosyncrasies in physician notes and other medical records. We developed methods to optimize NLP outputs for automated diagnosis. We filtered NLP-extracted Human Phenotype Ontology (HPO) terms to more closely resemble manually extracted terms and identified filter parameters across a three-dimensional space for optimal gene prioritization. We then developed a tiered pipeline that reduces manual effort by prioritizing smaller subsets of genes to consider for genetic diagnosis. Our filtering pipeline enabled NLP-based extraction of HPO terms to serve as a sufficient replacement for manual extraction in 92% of prospectively evaluated cases. In 75% of cases, the correct causal gene was ranked higher with our applied filters than without any filters. We describe a framework that can maximize the utility of NLP-based phenotype extraction for gene prioritization and diagnosis. The framework is implemented within a cloud-based modular architecture that can be deployed across health and research institutions.

4.
BMC Bioinformatics ; 10: 364, 2009 Oct 29.
Article in English | MEDLINE | ID: mdl-19874609

ABSTRACT

BACKGROUND: Efficient analysis of results from mass spectrometry-based proteomics experiments requires access to disparate data types, including native mass spectrometry files, output from algorithms that assign peptide sequence to MS/MS spectra, and annotation for proteins and pathways from various database sources. Moreover, proteomics technologies and experimental methods are not yet standardized; hence a high degree of flexibility is necessary for efficient support of high- and low-throughput data analytic tasks. Development of a desktop environment that is sufficiently robust for deployment in data analytic pipelines, and simultaneously supports customization for programmers and non-programmers alike, has proven to be a significant challenge. RESULTS: We describe multiplierz, a flexible and open-source desktop environment for comprehensive proteomics data analysis. We use this framework to expose a prototype version of our recently proposed common API (mzAPI) designed for direct access to proprietary mass spectrometry files. In addition to routine data analytic tasks, multiplierz supports generation of information rich, portable spreadsheet-based reports. Moreover, multiplierz is designed around a "zero infrastructure" philosophy, meaning that it can be deployed by end users with little or no system administration support. Finally, access to multiplierz functionality is provided via high-level Python scripts, resulting in a fully extensible data analytic environment for rapid development of custom algorithms and deployment of high-throughput data pipelines. CONCLUSION: Collectively, mzAPI and multiplierz facilitate a wide range of data analysis tasks, spanning technology development to biological annotation, for mass spectrometry-based proteomics research.


Subjects
Computational Biology/methods; Mass Spectrometry/methods; Proteome/analysis; Proteomics/methods; Software; Database Management Systems; Databases, Factual; Internet; User-Computer Interface
5.
Anal Chem ; 80(12): 4606-13, 2008 Jun 15.
Article in English | MEDLINE | ID: mdl-18491922

ABSTRACT

Proteomics-based analysis of signaling cascades relies on a growing suite of affinity resins and methods aimed at efficient enrichment of phosphorylated peptides from complex biological mixtures. Given the heterogeneity of phosphopeptides and the overlap in chemical properties between phospho- and unmodified peptides, it is likely that the use of multiple resins will provide the best combination of specificity, yield, and coverage for large-scale proteomics studies. Recently titanium and zirconium dioxides have been used successfully for enrichment of phosphopeptides. Here we report the first demonstration that niobium pentoxide (Nb2O5) provides for efficient enrichment and recovery (approximately 50-100%) of phosphopeptides from simple mixtures and facilitates identification of several hundred putative sites of phosphorylation from cell lysate. Comparison of phosphorylated peptides identified from Nb2O5 and TiO2 with sequences in the PhosphoELM database suggests a useful degree of divergence in the selectivity of these metal oxide resins. Collectively our data indicate that Nb2O5 provides efficient enrichment for phosphopeptides and offers a complementary approach for large-scale phosphoproteomics studies.


Subjects
Niobium; Oxides; Phosphopeptides/analysis; Proteomics/methods; Amino Acid Sequence; Caseins/analysis; Caseins/chemistry; Cell Line, Tumor; Humans; Molecular Sequence Data; Phosphopeptides/chemistry; Phosphorylation; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization
7.
J Diabetes Sci Technol ; 10(1): 6-18, 2015 Dec 20.
Article in English | MEDLINE | ID: mdl-26685993

ABSTRACT

BACKGROUND: Application of novel machine learning approaches to electronic health record (EHR) data could provide valuable insights into disease processes. We utilized this approach to build predictive models for progression to prediabetes and type 2 diabetes (T2D). METHODS: Using a novel analytical platform (Reverse Engineering and Forward Simulation [REFS]), we built prediction model ensembles for progression to prediabetes or T2D from an aggregated EHR data sample. REFS relies on a Bayesian scoring algorithm to explore a wide model space, and outputs a distribution of risk estimates from an ensemble of prediction models. We retrospectively followed 24 331 adults for transitions to prediabetes or T2D, 2007-2012. Accuracy of prediction models was assessed using an area under the curve (AUC) statistic, and validated in an independent data set. RESULTS: Our primary ensemble of models accurately predicted progression to T2D (AUC = 0.76), and was validated out of sample (AUC = 0.78). Models of progression to T2D consisted primarily of established risk factors (blood glucose, blood pressure, triglycerides, hypertension, lipid disorders, socioeconomic factors), whereas models of progression to prediabetes included novel factors (high-density lipoprotein, alanine aminotransferase, C-reactive protein, body temperature; AUC = 0.70). CONCLUSIONS: We constructed accurate prediction models from EHR data using a hypothesis-free machine learning approach. Identification of established risk factors for T2D serves as proof of concept for this analytical approach, while novel factors selected by REFS represent emerging areas of T2D research. This methodology has potentially valuable downstream applications to personalized medicine and clinical research.
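
The AUC statistic used above to assess the prediction models can be computed directly from its rank interpretation: the probability that a randomly chosen case that progressed receives a higher risk score than a randomly chosen case that did not. The sketch below is a generic illustration with invented toy data. It is not part of the REFS platform, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy example (invented): 1 = progressed to T2D, 0 = did not.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

The pairwise form is quadratic in the number of cases, which is fine for illustration. For cohorts of the size reported here, a rank-based or library implementation would be the practical choice.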


Subjects
Diabetes Mellitus, Type 2; Disease Progression; Electronic Health Records; Machine Learning; Prediabetic State; Adult; Area Under Curve; Female; Humans; Male; Medical Informatics/methods; ROC Curve; Retrospective Studies; Risk Factors
8.
Otolaryngol Head Neck Surg ; 150(3): 460-3, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24367049

ABSTRACT

Etiologies for many inner ear disorders, including autoimmune inner ear disease, sudden sensorineural hearing loss, and Meniere's disease, remain unknown. Indirect evidence suggests an immune-mediated process involving an allergic or autoimmune mechanism. We examined whether known immunogenic proteins share sequence similarity with inner ear proteins, which may lead to cross-reactivity and detrimental immune activation. In comprehensive bioinformatic analyses, primary sequences of intact and mutated proteins associated with human hearing loss, and of all proteins known to be expressed in the human inner ear, were compared with all immune epitopes in the Immune Epitope Database. The exact-match and Basic Local Alignment Search Tool (BLAST) algorithms identified 3036 and 106 unique epitope matches, respectively, the majority of which were infectious epitopes. If validated in future clinical trials, these candidate immune epitopes in the inner ear would be potential novel targets for diagnosis and treatment of some inner ear disorders and the resulting hearing loss.
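
The exact-match comparison mentioned in this abstract amounts to looking for each epitope peptide as a substring of each protein sequence. The following is a minimal sketch of that idea only. The function name, identifiers, and sequences are invented for illustration, and the study's actual pipeline and its Immune Epitope Database inputs are not reproduced here.

```python
def exact_epitope_matches(proteins, epitopes):
    """Return (protein_id, epitope_id, start) for every exact occurrence.

    proteins and epitopes map identifiers to amino-acid strings.
    Overlapping occurrences are reported; empty peptides are skipped.
    """
    matches = []
    for prot_id, seq in proteins.items():
        for epi_id, pep in epitopes.items():
            if not pep:
                continue
            start = seq.find(pep)
            while start != -1:
                matches.append((prot_id, epi_id, start))
                start = seq.find(pep, start + 1)
    return matches


# Toy sequences (invented, not real inner ear proteins or epitopes).
proteins = {"P1": "MKTAYIAKQR", "P2": "AKQAKQ"}
epitopes = {"E1": "AKQ", "E2": "WWW"}
print(exact_epitope_matches(proteins, epitopes))
# [('P1', 'E1', 6), ('P2', 'E1', 0), ('P2', 'E1', 3)]
```

Exact matching only finds identical peptides. Similar but non-identical sequences need an alignment-based search, which is the role BLAST plays in the abstract.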


Subjects
Autoimmune Diseases/diagnosis; Ear, Inner/immunology; Immunodominant Epitopes/immunology; Labyrinth Diseases/immunology; Algorithms; Autoimmune Diseases/immunology; Blotting, Western; Ear, Inner/metabolism; Humans; Labyrinth Diseases/diagnosis
9.
Cell Stem Cell ; 15(1): 92-101, 2014 Jul 03.
Article in English | MEDLINE | ID: mdl-24813856

ABSTRACT

Alternative RNA splicing (AS) regulates proteome diversity, including isoform-specific expression of several pluripotency genes. Here, we integrated global gene expression and proteomic analyses and identified a molecular signature suggesting a central role for AS in maintaining human pluripotent stem cell (hPSC) self-renewal. We demonstrate that the splicing factor SFRS2 is an OCT4 target gene required for pluripotency. SFRS2 regulates AS of the methyl-CpG binding protein MBD2, whose isoforms play opposing roles in maintenance of and reprogramming to pluripotency. Although both MBD2a and MBD2c are enriched at the OCT4 and NANOG promoters, MBD2a preferentially interacts with repressive NuRD chromatin remodeling factors and promotes hPSC differentiation, whereas overexpression of MBD2c enhances reprogramming of fibroblasts to pluripotency. The miR-301 and miR-302 families provide additional regulation by targeting SFRS2 and MBD2a. These data suggest that OCT4, SFRS2, and MBD2 participate in a positive feedback loop, regulating proteome diversity in support of hPSC self-renewal and reprogramming.


Subjects
Alternative Splicing/physiology; DNA-Binding Proteins/metabolism; Fibroblasts/physiology; Nuclear Proteins/metabolism; Pluripotent Stem Cells/physiology; Ribonucleoproteins/metabolism; Cell Differentiation; Cell Survival; Cells, Cultured; Cellular Reprogramming; DNA-Binding Proteins/genetics; Feedback, Physiological; Gene Expression Profiling; Homeodomain Proteins/metabolism; Humans; Mi-2 Nucleosome Remodeling and Deacetylase Complex/metabolism; MicroRNAs/genetics; MicroRNAs/metabolism; Nanog Homeobox Protein; Octamer Transcription Factor-3/metabolism; Protein Binding; Protein Isoforms/genetics; Proteomics; Serine-Arginine Splicing Factors
10.
PLoS One ; 7(9): e45211, 2012.
Article in English | MEDLINE | ID: mdl-23028852

ABSTRACT

Curated gene sets from databases such as KEGG Pathway and Gene Ontology are often used to systematically organize lists of genes or proteins derived from high-throughput data. However, the information content inherent to some relationships between the interrogated gene sets, such as pathway crosstalk, is often underutilized. A gene set network, where nodes representing individual gene sets such as KEGG pathways are connected to indicate a functional dependency, is well suited to visualize and analyze global gene set relationships. Here we introduce a novel gene set network construction algorithm that integrates gene lists derived from high-throughput experiments with curated gene sets to construct co-enrichment gene set networks. Along with previously described co-membership and linkage algorithms, we apply the co-enrichment algorithm to eight gene set collections to construct integrated multi-evidence gene set networks with multiple edge types connecting gene sets. We demonstrate the utility of this approach through examples of novel gene set networks such as the chromosome map co-differential expression gene set network. A total of twenty-four gene set networks are exposed via a web tool called MetaNet, where context-specific multi-edge gene set networks are constructed from enriched gene sets within user-defined gene lists. MetaNet is freely available at http://blaispathways.dfci.harvard.edu/metanet/.
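
To make the gene set network idea concrete, the sketch below builds co-membership edges, connecting two gene sets when they share enough genes. It illustrates only the general concept. The Jaccard index, the threshold value, and the function name are assumptions chosen for illustration, not MetaNet's published algorithm, and the gene sets are toy data.

```python
from itertools import combinations


def co_membership_edges(gene_sets, min_jaccard=0.2):
    """Return (set_a, set_b, jaccard) edges for gene sets with shared members.

    gene_sets maps a gene set name to a set of gene symbols. The Jaccard
    index and the threshold are illustrative choices (assumptions).
    """
    edges = []
    for a, b in combinations(sorted(gene_sets), 2):
        union = gene_sets[a] | gene_sets[b]
        if not union:
            continue  # two empty sets have no defined similarity
        jaccard = len(gene_sets[a] & gene_sets[b]) / len(union)
        if jaccard >= min_jaccard:
            edges.append((a, b, jaccard))
    return edges


# Toy gene sets (invented groupings for illustration).
gene_sets = {
    "A": {"TP53", "MDM2", "ATM"},
    "B": {"MDM2", "ATM", "CHEK2"},
    "C": {"INS"},
}
print(co_membership_edges(gene_sets))  # [('A', 'B', 0.5)]
```

A co-enrichment network of the kind the abstract introduces would need experimental gene lists as an additional input, so it cannot be derived from set overlap alone.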


Subjects
Gene Expression; Gene Regulatory Networks; Software; Algorithms; Chromosome Mapping; Databases, Genetic; Gene Expression Profiling; Humans; Internet
11.
Article in English | MEDLINE | ID: mdl-22231900

ABSTRACT

Mass spectrometry has become the method of choice for proteome characterization, including multicomponent protein complexes (typically tens to hundreds of proteins) and total protein expression (up to tens of thousands of proteins), in biological samples. Qualitative sequence assignment based on MS/MS spectra is relatively well-defined, while statistical metrics for relative quantification have not completely stabilized. Nonetheless, proteomics studies have progressed to the point whereby various gene-, pathway-, or network-oriented computational frameworks may be used to place mass spectrometry data into biological context. Despite this progress, the dynamic range of protein expression remains a significant hurdle, and impedes comprehensive proteome analysis. Methods designed to enrich specific protein classes have emerged as an effective means to characterize enzymes or other catalytically active proteins that are otherwise difficult to detect in typical discovery mode proteomics experiments. Collectively, these approaches will facilitate identification of biomarkers and pathways relevant to diagnosis and treatment of human disease.


Subjects
Proteins/chemistry; Proteomics; Chromatography, High Pressure Liquid; Humans; Phosphopeptides/analysis; Protein Interaction Maps; Proteins/metabolism; Proteome/analysis; Tandem Mass Spectrometry