1.
Pac Symp Biocomput ; 29: 81-95, 2024.
Article in English | MEDLINE | ID: mdl-38160271

ABSTRACT

In the intricate landscape of healthcare analytics, effective feature selection is a prerequisite for generating robust predictive models, especially given the common challenges of limited sample sizes and potential biases. Zoish uniquely addresses these issues by employing Shapley additive values, an idea rooted in cooperative game theory, to enable both transparent and automated feature selection. Unlike existing tools, Zoish is versatile, designed to integrate seamlessly with an array of machine learning libraries including scikit-learn, XGBoost, CatBoost, and imbalanced-learn. The distinct advantage of Zoish lies in its dual algorithmic approach for calculating Shapley values, allowing it to efficiently manage both large and small datasets. This adaptability renders it exceptionally suitable for a wide spectrum of healthcare-related tasks. The tool also places a strong emphasis on interpretability, providing comprehensive visualizations for analyzed features. Its customizable settings offer users fine-grained control over feature selection, thus optimizing for specific predictive objectives. This manuscript elucidates the mathematical framework underpinning Zoish and how it uniquely combines local and global feature selection into a single, streamlined process. To validate Zoish's efficiency and adaptability, we present case studies in breast cancer prediction and Montreal Cognitive Assessment (MoCA) prediction in Parkinson's disease, along with evaluations on 300 synthetic datasets. These applications underscore Zoish's unparalleled performance in diverse healthcare contexts and against its counterparts.


Subject(s)
Breast Neoplasms , Computational Biology , Humans , Female , Game Theory , Machine Learning , Delivery of Health Care
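Zoish's own API is not reproduced here. As a library-independent sketch of the underlying idea only, the code below computes exact Shapley values by enumerating every coalition of features, assuming a hypothetical value function `toy_value` that stands in for a model's score on a feature subset; the feature names and the top-k selection rule are likewise illustrative assumptions, not Zoish's method. Exact enumeration grows exponentially with the number of features, which is why practical tools use faster algorithms on larger datasets.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(features: List[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley value of each feature for a coalition value function.

    Enumerates all 2**(n-1) coalitions per feature, so it is only practical
    for a handful of features.
    """
    n = len(features)
    phi: Dict[str, float] = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # k = size of the coalition that excludes f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Weighted marginal contribution of f to coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def select_top_k(phi: Dict[str, float], k: int) -> List[str]:
    """Hypothetical selection rule: keep the k largest |Shapley value| features."""
    return sorted(phi, key=lambda f: abs(phi[f]), reverse=True)[:k]


def toy_value(subset: FrozenSet[str]) -> float:
    """Hypothetical stand-in for a model score computed on a feature subset."""
    score = 0.0
    if "age" in subset:
        score += 0.5
    if "bmi" in subset:
        score += 0.3
    if {"age", "bmi"} <= subset:  # interaction term shared by both features
        score += 0.2
    return score


phi = shapley_values(["age", "bmi", "noise"], toy_value)
print(select_top_k(phi, 2))  # ['age', 'bmi']
```

Here the 0.2 interaction term is split evenly between `age` and `bmi` (0.6 and 0.4), `noise` receives 0, and the values sum to the score of the full feature set minus the score of the empty set (1.0).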
2.
Patterns (N Y) ; 4(8): 100800, 2023 Aug 11.
Article in English | MEDLINE | ID: mdl-37602209

ABSTRACT

We have developed a machine learning (ML) approach using Gaussian process (GP)-based spatial covariance (SCV) to track the impact of spatial-temporal mutational events driving host-pathogen balance in biology. We show how SCV can be applied to understanding the response of evolving covariant relationships linking the variant pattern of virus spread to pathology for the entire SARS-CoV-2 genome on a daily basis. We show that GP-based SCV relationships, in conjunction with genome-wide co-occurrence analysis, provide an early warning anomaly detection (EWAD) system for the emergence of variants of concern (VOCs). EWAD can anticipate changes in the pattern of spread and pathology weeks in advance, identifying signatures destined to become VOCs. GP-based analyses of variation across entire viral genomes can be used to monitor micro and macro features responsible for host-pathogen balance. The versatility of GP-based SCV defines a starting point for understanding nature's evolutionary path to complexity through natural selection.
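The SCV pipeline itself is not specified in this abstract. To illustrate only the general idea of Gaussian process regression, which infers a smooth function together with its uncertainty from sparse observations, here is a minimal NumPy sketch with a squared-exponential (RBF) kernel. The positions, scores, kernel choice, and hyperparameters are hypothetical assumptions, not the authors' data or settings.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, length_scale: float = 1.0,
               variance: float = 1.0) -> np.ndarray:
    """Squared-exponential covariance between two 1-D arrays of positions."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_predict(x_train, y_train, x_test, length_scale=1.0, variance=1.0, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP evaluated at x_test."""
    K = rbf_kernel(x_train, x_train, length_scale, variance) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, length_scale, variance)
    K_ss = rbf_kernel(x_test, x_test, length_scale, variance)
    L = np.linalg.cholesky(K)                                  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)               # posterior variance
    return mean, var


# Hypothetical toy data: a score observed at four positions.
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.array([0.1, 0.5, 0.9, 0.4])
mean, var = gp_predict(x_train, y_train, np.array([1.0, 1.5, 10.0]))
```

Near an observed position the predictive variance shrinks toward zero, while far from every observation (position 10.0 here) the mean returns to the prior mean of 0 and the variance to the prior variance of 1, which is how a GP signals where its estimates can be trusted.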

3.
Elife ; 11, 2022 09 23.
Article in English | MEDLINE | ID: mdl-36148981

ABSTRACT

Genotype imputation is a foundational tool for population genetics. Standard statistical imputation approaches rely on the co-location of large whole-genome sequencing-based reference panels, powerful computing environments, and potentially sensitive genetic study data. This results in computational resource and privacy-risk barriers to access to cutting-edge imputation techniques. Moreover, the accuracy of current statistical approaches is known to degrade in regions of low and complex linkage disequilibrium. Artificial neural network-based imputation approaches may overcome these limitations by encoding complex genotype relationships in easily portable inference models. Here, we demonstrate an autoencoder-based approach for genotype imputation, using a large, commonly used reference panel, and spanning the entirety of human chromosome 22. Our autoencoder-based genotype imputation strategy achieved superior imputation accuracy across the allele-frequency spectrum and across genomes of diverse ancestry, while delivering at least fourfold faster inference run time relative to standard imputation tools.


Subject(s)
Population Genetics , Single Nucleotide Polymorphism , Gene Frequency , Genome-Wide Association Study/methods , Genotype , Humans , Linkage Disequilibrium
4.
J Clin Endocrinol Metab ; 107(11): 3100-3110, 2022 11 23.
Article in English | MEDLINE | ID: mdl-36017587

ABSTRACT

CONTEXT: Aberrant biosynthesis and secretion of the insulin precursor proinsulin occur in both type I and type II diabetes. Inflammatory cytokines are implicated in pancreatic islet stress and dysfunction in both forms of diabetes, but the mechanisms remain unclear. OBJECTIVE: We sought to determine the effect of the diabetes-associated cytokines on proinsulin folding, trafficking, secretion, and β-cell function. METHODS: Human islets were treated with interleukin-1β and interferon-γ for 48 hours, followed by analysis of interleukin-6, nitrite, proinsulin and insulin release, RNA sequencing, and unbiased profiling of the proinsulin interactome by affinity purification-mass spectrometry. RESULTS: Cytokine treatment induced secretion of interleukin-6, nitrites, and insulin, as well as aberrant release of proinsulin. RNA sequencing showed that cytokines upregulated genes involved in endoplasmic reticulum stress, and, consistent with this, affinity purification-mass spectrometry revealed cytokine-induced proinsulin binding to multiple endoplasmic reticulum chaperones and oxidoreductases. Moreover, increased binding to the chaperone immunoglobulin binding protein was required to maintain proper proinsulin folding in the inflammatory environment. Cytokines also regulated novel interactions between proinsulin and candidate proteins from type 1 and type 2 diabetes genome-wide association studies that were not previously known to interact with proinsulin (eg, Ataxin-2). Finally, cytokines induced proinsulin interactions with a cluster of microtubule motor proteins, and chemical destabilization of microtubules with nocodazole exacerbated cytokine-induced proinsulin secretion. CONCLUSION: Together, the data shed new light on mechanisms by which diabetes-associated cytokines dysregulate β-cell function. For the first time, we show that even short-term exposure to an inflammatory environment reshapes proinsulin interactions with critical chaperones and regulators of the secretory pathway.


Subject(s)
Type 2 Diabetes Mellitus , Insulin-Secreting Cells , Pancreatic Islets , Humans , Proinsulin/metabolism , Type 2 Diabetes Mellitus/metabolism , Cytokines/metabolism , Interleukin-6/metabolism , Genome-Wide Association Study , Insulin/metabolism , Pancreatic Islets/metabolism , Insulin-Secreting Cells/metabolism
5.
Diabetes ; 69(8): 1723-1734, 2020 08.
Article in English | MEDLINE | ID: mdl-32457219

ABSTRACT

The β-cell protein synthetic machinery is dedicated to the production of mature insulin, which requires the proper folding and trafficking of its precursor, proinsulin. The complete network of proteins that mediate proinsulin folding and advancement through the secretory pathway, however, remains poorly defined. Here we used affinity purification and mass spectrometry to identify, for the first time, the proinsulin biosynthetic interaction network in human islets. Stringent analysis established a central node of proinsulin interactions with endoplasmic reticulum (ER) folding factors, including chaperones and oxidoreductases, that is remarkably conserved in both sexes and across three ethnicities. The ER-localized peroxiredoxin PRDX4 was identified as a prominent proinsulin-interacting protein. In β-cells, gene silencing of PRDX4 rendered proinsulin susceptible to misfolding, particularly in response to oxidative stress, while exogenous PRDX4 improved proinsulin folding. Moreover, proinsulin misfolding induced by oxidative stress or high glucose was accompanied by sulfonylation of PRDX4, a modification known to inactivate peroxiredoxins. Notably, islets from patients with type 2 diabetes (T2D) exhibited significantly higher levels of sulfonylated PRDX4 than islets from healthy individuals. In conclusion, we have generated the first reference map of the human proinsulin interactome to identify critical factors controlling insulin biosynthesis, β-cell function, and T2D.


Subject(s)
Type 2 Diabetes Mellitus/metabolism , Insulin/metabolism , Peroxiredoxins/metabolism , Proinsulin/chemistry , Proinsulin/metabolism , Western Blotting , Type 2 Diabetes Mellitus/genetics , Endoplasmic Reticulum/genetics , Endoplasmic Reticulum/metabolism , Female , Humans , Immunoprecipitation , Insulin/chemistry , Male , Peroxiredoxins/genetics , Protein Binding , Protein Folding , Tandem Mass Spectrometry
6.
Nat Commun ; 10(1): 5052, 2019 11 07.
Article in English | MEDLINE | ID: mdl-31699992

ABSTRACT

To understand the impact of epigenetics on human misfolding disease, we apply Gaussian-process regression (GPR)-based machine learning (ML) (GPR-ML) through variation spatial profiling (VSP). VSP generates population-based matrices describing the spatial covariance (SCV) relationships that link genetic diversity to fitness of the individual in response to histone deacetylase inhibitors (HDACi). Niemann-Pick C1 (NPC1) is a Mendelian disorder caused by >300 variants in the NPC1 gene that disrupt cholesterol homeostasis, leading to the rapid onset and progression of neurodegenerative disease. We determine the sequence-to-function-to-structure relationships of the NPC1 polypeptide fold required for membrane trafficking and generation of a tunnel that mediates cholesterol flux in late endosomal/lysosomal (LE/Ly) compartments. HDACi treatment reveals unanticipated epigenomic plasticity in SCV relationships that restore NPC1 functionality. GPR-ML-based matrices capture the epigenetic processes impacting information flow through the central dogma, providing a framework for quantifying the effect of the environment on the healthspan of the individual.


Subject(s)
Cholesterol/metabolism , Fibroblasts/metabolism , Lipid Metabolism/genetics , Niemann-Pick C1 Protein/genetics , Niemann-Pick Disease Type C/genetics , Tumor Cell Line , Endosomes/drug effects , Endosomes/metabolism , Genetic Epigenesis , Epigenomics , Fibroblasts/drug effects , HeLa Cells , Histone Deacetylase Inhibitors/pharmacology , Homeostasis/drug effects , Homeostasis/genetics , Humans , Lipid Metabolism/drug effects , Lysosomes/drug effects , Lysosomes/metabolism , Machine Learning , Niemann-Pick C1 Protein/metabolism , Niemann-Pick Disease Type C/metabolism , Normal Distribution , Proteostasis Deficiencies/genetics , Proteostasis Deficiencies/metabolism , Regression Analysis , Structure-Activity Relationship , Vorinostat/pharmacology
7.
Front Immunol ; 9: 2074, 2018.
Article in English | MEDLINE | ID: mdl-30271408

ABSTRACT

To date, there has not been a study directly comparing relative Igκ rearrangement frequencies obtained from genomic DNA (gDNA) and cDNA, and since each approach has potential biases, this is an important issue to clarify. Here we used deep sequencing to compare the unbiased gDNA and RNA Igκ repertoire from the same pre-B cell pool. We find that ~20% of Vκ genes have rearrangement frequencies ≥2-fold up or down in RNA vs. DNA libraries, including many members of the Vκ3, Vκ4, and Vκ6 families. Regression analysis indicates Ikaros and E2A binding are associated with strong promoters. Within the pre-B cell repertoire, we observed that individual Vκ genes rearranged at very different frequencies, and also displayed very different Jκ usage. Regression analysis revealed that the greatly unequal Vκ gene rearrangement frequencies are best predicted by epigenetic marks of enhancers. In particular, the levels of newly arising H3K4me1 peaks associated with many Vκ genes in pre-B cells are most predictive of rearrangement levels. Since H3K4me1 is associated with long-range chromatin interactions that are created during locus contraction, our data provide mechanistic insight into unequal rearrangement levels. Comparison of Igκ rearrangements occurring in pro-B cells and pre-B cells from the same mice reveals a pro-B cell bias toward usage of Jκ-distal Vκ genes, particularly Vκ10-96 and Vκ1-135. Regression analysis indicates that PU.1 binding is the highest predictor of Vκ gene rearrangement frequency in pro-B cells. Lastly, the repertoires of iEκ-/- pre-B cells reveal that iEκ actively influences Vκ gene usage, particularly Vκ3 family genes, overlapping with a zone of iEκ-regulated germline transcription. These represent new roles for iEκ in addition to its critical function in promoting overall Igκ rearrangement. Together, this study provides insight into many aspects of Igκ repertoire formation.


Subject(s)
B-Lymphocytes/physiology , Immunoglobulin Light Chains/genetics , B-Lymphoid Precursor Cells/physiology , Animals , Basic Helix-Loop-Helix Transcription Factors/genetics , Basic Helix-Loop-Helix Transcription Factors/metabolism , Complementary DNA/genetics , Genetic Epigenesis , B-Lymphocyte Light Chain Gene Rearrangement , Genome , Ikaros Transcription Factor/genetics , Ikaros Transcription Factor/metabolism , Immunoglobulin Variable Region/genetics , Immunoglobulin kappa-Chains/genetics , Mice , Inbred C57BL Mice , Genetic Promoter Regions/genetics , Protein Binding , Proto-Oncogene Proteins/metabolism , Trans-Activators/metabolism
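The abstract reports that regression analysis identifies which features best predict Vκ rearrangement frequency, but does not specify the model. As an illustration only, the sketch below fits ordinary least squares with NumPy and ranks features by the R² of single-feature fits; the feature names (`mark_a`, `mark_b`) and the data are hypothetical toy values, not results from the study, and the ranking rule is an assumption.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray):
    """Least-squares fit of y ~ intercept + X @ coef; returns (intercept, coef)."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta[0], beta[1:]


def r_squared(X, y, intercept, coef) -> float:
    """Fraction of the variance in y explained by the fitted model."""
    pred = intercept + X @ coef
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rank_single_features(X, y, names):
    """Rank features by the R^2 of a one-feature regression (illustrative only)."""
    scores = {}
    for j, name in enumerate(names):
        xj = X[:, [j]]  # keep a 2-D column
        b0, b = fit_ols(xj, y)
        scores[name] = r_squared(xj, y, b0, b)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: 50 gene segments, two standardized epigenetic marks.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1]  # noise-free toy response
intercept, coef = fit_ols(X, y)
print(rank_single_features(X, y, ["mark_a", "mark_b"]))  # ['mark_a', 'mark_b']
```

Single-feature R² is only a crude screen; when predictors are correlated, a joint model with feature selection (as the study's phrasing "best predicted by" may imply) can rank them differently.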
8.
J Biol Chem ; 293(35): 13477-13495, 2018 08 31.
Article in English | MEDLINE | ID: mdl-30006345

ABSTRACT

Inherited and somatic rare diseases result from >200,000 genetic variants leading to loss- or gain-of-toxic function, often caused by protein misfolding. Many of these misfolded variants fail to properly interact with other proteins. Understanding the link between factors mediating the transcription, translation, and protein folding of these disease-associated variants remains a major challenge in cell biology. Herein, we utilized the cystic fibrosis transmembrane conductance regulator (CFTR) protein as a model and performed a proteomics-based high-throughput screen (HTS) to identify pathways and components affecting the folding and function of the most common cystic fibrosis-associated mutation, the F508del variant of CFTR. Using a shortest-path algorithm we developed, we mapped HTS hits to the CFTR interactome to provide functional context to the targets and identified the eukaryotic translation initiation factor 3a (eIF3a) as a central hub for the biogenesis of CFTR. Of note, siRNA-mediated silencing of eIF3a reduced the polysome-to-monosome ratio in F508del-expressing cells, which, in turn, decreased the translation of CFTR variants, leading to increased CFTR stability, trafficking, and function at the cell surface. This finding suggested that eIF3a is involved in mediating the impact of genetic variations in CFTR on the folding of this protein. We posit that the number of ribosomes on a CFTR mRNA transcript is inversely correlated with the stability of the translated polypeptide. Polysome-based translation challenges the capacity of the proteostasis environment to balance message fidelity with protein folding, leading to disease. We suggest that this deficit can be corrected through control of translation initiation.


Subject(s)
Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Cystic Fibrosis Transmembrane Conductance Regulator/metabolism , Eukaryotic Initiation Factor-3/metabolism , Translational Peptide Chain Initiation , Cell Line , Cystic Fibrosis Transmembrane Conductance Regulator/chemistry , Eukaryotic Initiation Factor-3/genetics , Humans , Mutation , Phenylalanine/chemistry , Phenylalanine/genetics , Phenylalanine/metabolism , Protein Folding , Protein Interaction Maps , Protein Transport , RNA Interference , Small Interfering RNA/genetics
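The authors' shortest-path algorithm is their own and is not described in detail here. The sketch below substitutes plain breadth-first search on an unweighted, undirected graph to show the general idea: trace a shortest path from each screen hit to a target protein and count how often each intermediate node appears, so that frequently traversed nodes stand out as candidate hubs. All node names and edges are hypothetical placeholders, not real interactions.

```python
from collections import Counter, deque
from typing import Dict, Iterable, List, Optional, Tuple

Graph = Dict[str, List[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Undirected adjacency list from (a, b) interaction pairs."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def shortest_path(graph: Graph, start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns one shortest path, or None if unreachable."""
    prev: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            cur: Optional[str] = node
            while cur is not None:  # walk predecessors back to the start
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for nb in graph.get(node, []):
            if nb not in prev:
                prev[nb] = node
                queue.append(nb)
    return None


def hub_counts(graph: Graph, hits: Iterable[str], target: str) -> Counter:
    """Count how often each intermediate node lies on a hit-to-target shortest path."""
    counts: Counter = Counter()
    for hit in hits:
        path = shortest_path(graph, hit, target)
        if path:
            counts.update(path[1:-1])  # exclude the hit and the target themselves
    return counts


# Hypothetical toy interactome; node names are placeholders, not real proteins.
edges = [("HIT_A", "HUB"), ("HIT_B", "HUB"), ("HIT_C", "NODE_X"),
         ("NODE_X", "HUB"), ("HUB", "TARGET")]
graph = build_graph(edges)
print(hub_counts(graph, ["HIT_A", "HIT_B", "HIT_C"], "TARGET").most_common(1))  # [('HUB', 3)]
```

BFS returns only one of possibly several equally short paths; a real analysis would likely need to handle ties and interaction confidence weights, which this sketch ignores.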
9.
J Mol Biol ; 430(18 Pt A): 2951-2973, 2018 09 14.
Article in English | MEDLINE | ID: mdl-29924966

ABSTRACT

The advent of precision medicine for genetic diseases has been hampered by the large number of variants that cause familial and somatic disease, a complexity that is further confounded by the impact of genetic modifiers. To begin to understand differences in onset, progression and therapeutic response that exist among disease-causing variants, we present the proteomic variant approach (ProVarA), a proteomic method that integrates mass spectrometry with genomic tools to dissect the etiology of disease. To illustrate its value, we examined the impact of variation in cystic fibrosis (CF), where 2,025 disease-associated mutations in the CF transmembrane conductance regulator (CFTR) gene have been annotated and where individual genotypes exhibit heterogeneity both in phenotype and in response to therapeutic intervention. A comparative analysis of variant-specific proteomics allows us to identify a number of protein interactions contributing to the basic defects associated with F508del- and G551D-CFTR, two of the most common disease-associated variants in the patient population. We demonstrate that a number of these causal interactions are significantly altered in response to treatment with Vx809 and Vx770, small-molecule therapeutics that respectively target the F508del and G551D variants. ProVarA represents the first comparative proteomic analysis among multiple disease-causing mutations, thereby providing a significant methodological advance over existing proteomic efforts to understand the impact of variation in CF disease. We posit that the implementation of ProVarA for any familial or somatic mutation will provide a substantial increase in the knowledge base needed to implement a precision medicine-based approach for clinical management of disease.


Subject(s)
Genetic Association Studies , Genetic Predisposition to Disease , Precision Medicine , Proteomics , Biomarkers , Cystic Fibrosis/diagnosis , Cystic Fibrosis/genetics , Cystic Fibrosis/metabolism , Gene Expression Profiling , Genetic Association Studies/methods , Genetic Variation , Genotype , Humans , Mutation , Precision Medicine/methods , Proteome , Proteomics/methods
10.
Front Immunol ; 9: 425, 2018.
Article in English | MEDLINE | ID: mdl-29593713

ABSTRACT

CCCTC-binding factor (CTCF) is largely responsible for the 3D architecture of the genome, in concert with the action of cohesin, through the creation of long-range chromatin loops. Cohesin is hypothesized to be the main driver of these long-range chromatin interactions by the process of loop extrusion. Here, we performed ChIP-seq for CTCF and cohesin in two stages each of T and B cell differentiation and examined the binding pattern in all six antigen receptor (AgR) loci in these lymphocyte progenitors and in mature T and B cells, ES cells, and fibroblasts. The four large AgR loci have many bound CTCF sites, most of which are only occupied in lymphocytes, while only the CTCF sites at the end of each locus near the enhancers or J genes tend to be bound in non-lymphoid cells as well. However, despite the generalized lymphocyte restriction of CTCF binding in AgR loci, the Igκ locus is the only locus that also shows significant lineage-specificity (T vs. B cells) and developmental stage-specificity (pre-B vs. pro-B) in CTCF binding. We show that cohesin binding is more lineage- and stage-specific than CTCF binding at most AgR loci, providing more specificity to the loops. We also show that the culture of pro-B cells in IL7, a common practice to expand the number of cells before ChIP-seq, results in a CTCF-binding pattern resembling pre-B cells, as well as other epigenetic and transcriptional characteristics of pre-B cells. Analysis of the orientation of the CTCF sites shows that all sites within the large V portions of the Igh and TCRβ loci have the same orientation. This suggests either that convergent CTCF sites are not required to create loops, or that there are no loops between CTCF sites within the V region portion of those loci, only loops to the convergent sites at the D-J-enhancer end of each locus.
The V region portions of the Igκ and TCRα/δ loci, by contrast, have CTCF sites in both orientations, providing many options for creating CTCF-mediated convergent loops throughout the loci. CTCF/cohesin loops, along with transcription factors, drive contraction of AgR loci to facilitate the creation of a diverse repertoire of antibodies and T cell receptors.


Subject(s)
B-Lymphocytes/physiology , CCCTC-Binding Factor/metabolism , Cell Cycle Proteins/metabolism , Non-Histone Chromosomal Proteins/metabolism , Immunoglobulins/genetics , T-Cell Antigen Receptors/genetics , T-Lymphocytes/physiology , Animals , CCCTC-Binding Factor/genetics , Cell Differentiation , Cell Lineage , Cultured Cells , Chromatin/metabolism , DNA-Binding Proteins/genetics , Developmental Gene Expression Regulation , RAG-1 Genes , Genetic Loci/genetics , Mice , Inbred C57BL Mice , Knockout Mice , Organ Specificity , Protein Binding , Cohesins
11.
Proc Natl Acad Sci U S A ; 113(27): E3911-20, 2016 07 05.
Article in English | MEDLINE | ID: mdl-27335461

ABSTRACT

Ying Yang 1 (YY1) is a ubiquitously expressed transcription factor shown to be essential for pro-B-cell development. However, the role of YY1 in other B-cell populations has never been investigated. Recent bioinformatics analyses have implicated YY1 in the germinal center (GC) B-cell transcriptional program. In accord with this prediction, we demonstrated that deletion of YY1 by Cγ1-Cre completely prevented differentiation of GC B cells and plasma cells. To determine if YY1 was also required for the differentiation of other B-cell populations, we deleted YY1 with CD19-Cre and found that all peripheral B-cell subsets, including B1 B cells, require YY1 for their differentiation. Transitional 1 (T1) B cells were the most dependent upon YY1, being sensitive to even a half-dosage of YY1 and also to short-term YY1 deletion by tamoxifen-induced Cre. We show that YY1 exerts its effects, in part, by promoting B-cell survival and proliferation. ChIP-sequencing shows that YY1 predominantly binds to promoters, and pathway analysis of YY1-bound genes shows enrichment in ribosomal functions, mitochondrial functions such as bioenergetics, and functions related to transcription such as mRNA splicing. By RNA-sequencing analysis of differentially expressed genes, we demonstrated that YY1 normally activates genes involved in mitochondrial bioenergetics, whereas it normally down-regulates genes involved in transcription, mRNA splicing, NF-κB signaling pathways, the AP-1 transcription factor network, chromatin remodeling, cytokine signaling pathways, cell adhesion, and cell proliferation. Our results show the crucial role that YY1 plays in regulating broad general processes throughout all stages of B-cell differentiation.


Subject(s)
B-Lymphocytes/physiology , Cell Differentiation , Gene Expression Regulation , Germinal Center/physiology , YY1 Transcription Factor/physiology , Animals , Cell Lineage , DNA Helicases/metabolism , Female , Germinal Center/cytology , Jumonji Domain-Containing Histone Demethylases/metabolism , Male , Inbred C57BL Mice
12.
Oncotarget ; 6(26): 22060-71, 2015 Sep 08.
Article in English | MEDLINE | ID: mdl-26091350

ABSTRACT

SRC kinase is activated in castration resistant prostate cancer (CRPC), phosphorylates the androgen receptor (AR), and causes its ligand-independent activation as a transcription factor. However, activating SRC mutations are exceedingly rare in human tumors, and mechanisms of ectopic SRC activation therefore remain largely unknown. Performing a functional genomics screen, we found that downregulation of SRC inhibitory kinase CSK is sufficient to overcome growth arrest induced by depriving human prostate cancer cells of androgen. CSK knockdown led to ectopic SRC activation, increased AR signaling, and resistance to anti-androgens. Consistent with the in vitro observations, stable knockdown of CSK conferred castration resistance in mouse xenograft models, while sensitivity to the tyrosine kinase inhibitor dasatinib was retained. Finally, CSK was found downregulated in a distinct subset of CRPCs marked by AR amplification and ETS2 deletion but lacking PTEN and RB1 mutations. These results identify CSK downregulation as a principal driver of SRC activation and castration resistance and validate SRC as a drug target in a molecularly defined subclass of CRPCs.


Subject(s)
Castration-Resistant Prostatic Neoplasms/enzymology , src-Family Kinases/metabolism , Animals , CSK Tyrosine-Protein Kinase , Tumor Cell Line , Cell Proliferation/physiology , Down-Regulation , HEK293 Cells , Heterografts , Humans , Male , Mice , Inbred NOD Mice , SCID Mice , Castration-Resistant Prostatic Neoplasms/genetics , Castration-Resistant Prostatic Neoplasms/pathology , Signal Transduction , Transfection , Xenograft Model Antitumor Assays , src-Family Kinases/genetics
13.
Bioinformatics ; 31(11): 1724-8, 2015 Jun 01.
Article in English | MEDLINE | ID: mdl-25637560

ABSTRACT

MOTIVATION: Omics Pipe (http://sulab.scripps.edu/omicspipe) is a computational framework that automates multi-omics data analysis pipelines on high performance compute clusters and in the cloud. It supports best practice published pipelines for RNA-seq, miRNA-seq, Exome-seq, Whole-Genome sequencing, ChIP-seq analyses and automatic processing of data from The Cancer Genome Atlas (TCGA). Omics Pipe provides researchers with a tool for reproducible, open source and extensible next generation sequencing analysis. The goal of Omics Pipe is to democratize next-generation sequencing analysis by dramatically increasing the accessibility and reproducibility of best practice computational pipelines, which will enable researchers to generate biologically meaningful and interpretable results. RESULTS: Using Omics Pipe, we analyzed 100 TCGA breast invasive carcinoma paired tumor-normal datasets based on the latest UCSC hg19 RefSeq annotation. Omics Pipe automatically downloaded and processed the desired TCGA samples on a high throughput compute cluster to produce a results report for each sample. We aggregated the individual sample results and compared them to the analysis in the original publications. This comparison revealed high overlap between the analyses, as well as novel findings due to the use of updated annotations and methods. AVAILABILITY AND IMPLEMENTATION: Source code for Omics Pipe is freely available on the web (https://bitbucket.org/sulab/omics_pipe). Omics Pipe is distributed as a standalone Python package for installation (https://pypi.python.org/pypi/omics_pipe) and as an Amazon Machine Image in Amazon Web Services Elastic Compute Cloud that contains all necessary third-party software dependencies and databases (https://pythonhosted.org/omics_pipe/AWS_installation.html).


Subject(s)
Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Software , Breast Neoplasms/genetics , Cluster Analysis , Factual Databases , Exome , Female , Humans , Reproducibility of Results , RNA Sequence Analysis
14.
J Proteome Res ; 14(1): 164-82, 2015 Jan 02.
Article in English | MEDLINE | ID: mdl-25362887

ABSTRACT

Benzo[a]pyrene (B[a]P) is an environmental contaminant mainly studied for its toxic/carcinogenic effects. For a comprehensive, pathway-oriented mechanistic understanding of the effects triggered directly by a toxic (5 µM) or a subtoxic (50 nM) concentration of B[a]P or indirectly by its metabolites, we conducted time series experiments for up to 24 h to study the effects in murine hepatocytes. These cells rapidly take up and actively metabolize B[a]P, which we followed by quantitative analysis of the concentration of intracellular B[a]P and seven representative degradation products. Exposure to 5 µM B[a]P led to a maximal intracellular concentration of 1604 pmol/5 × 10⁴ cells, leveling at 55 pmol/5 × 10⁴ cells by the end of the time course. Changes in the global proteome (>1000 protein profiles) and metabolome (163 metabolites) were assessed in combination with B[a]P degradation. Abundance profiles of 236 (both concentrations), 190 (only 5 µM), and 150 (only 50 nM) proteins were found to be regulated in response to B[a]P in a time-dependent manner. At the endogenous metabolite level, amino acids, acylcarnitines, and glycerophospholipids were particularly affected by B[a]P. The comprehensive chemical, proteome, and metabolome data enabled time-resolved identification of effects at the pathway level. In addition to known alterations, changes in protein synthesis and lipid metabolism, as well as membrane dysfunction, were identified as B[a]P-specific effects.


Subject(s)
Benzo(a)pyrene/toxicity , Environmental Pollutants/toxicity , Amino Acids/metabolism , Animals , Benzo(a)pyrene/metabolism , Carbohydrate Metabolism/drug effects , Tumor Cell Line , Environmental Pollutants/metabolism , Gene Expression/drug effects , Lipid Metabolism/drug effects , Metabolic Networks and Pathways , Metabolome , Mice , Proteome/genetics , Proteome/metabolism
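Dividing the reported intracellular B[a]P amounts by the number of cells gives per-cell quantities (a plain unit conversion of the values in the abstract; 1 pmol = 1000 fmol):

```latex
\frac{1604\ \text{pmol}}{5 \times 10^{4}\ \text{cells}} \approx 0.0321\ \text{pmol/cell} = 32.1\ \text{fmol/cell},
\qquad
\frac{55\ \text{pmol}}{5 \times 10^{4}\ \text{cells}} = 0.0011\ \text{pmol/cell} = 1.1\ \text{fmol/cell}
```

The level at the end of the time course is therefore roughly 3.4% (55/1604) of the peak intracellular amount.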
15.
JMIR Serious Games ; 2(2): e7, 2014 Jul 29.
Article in English | MEDLINE | ID: mdl-25654473

ABSTRACT

BACKGROUND: Molecular signatures for predicting breast cancer prognosis could greatly improve care through personalization of treatment. Computational analyses of genome-wide expression datasets have identified such signatures, but these signatures leave much to be desired in terms of accuracy, reproducibility, and biological interpretability. Methods that take advantage of structured prior knowledge (eg, protein interaction networks) show promise in helping to define better signatures, but most knowledge remains unstructured. Crowdsourcing via scientific discovery games is an emerging methodology that has the potential to tap into human intelligence at scales and in modes unheard of before. OBJECTIVE: The main objective of this study was to test the hypothesis that knowledge linking expression patterns of specific genes to breast cancer outcomes could be captured from players of an open, Web-based game. We envisioned capturing knowledge both from the players' prior experience and from their ability to interpret text related to candidate genes presented to them in the context of the game. METHODS: We developed and evaluated an online game called The Cure that captured information from players regarding genes for use as predictors of breast cancer survival. Information gathered from game play was aggregated using a voting approach and used to create rankings of genes. The top genes from these rankings were evaluated using annotation enrichment analysis, by comparison to prior predictor gene sets, and by using them to train and test machine learning systems for predicting 10-year survival. RESULTS: Between its launch in September 2012 and September 2013, The Cure attracted more than 1000 registered players, who collectively played nearly 10,000 games. Gene sets assembled through aggregation of the collected data showed significant enrichment for genes known to be related to key concepts such as cancer, disease progression, and recurrence.
In terms of the predictive accuracy of models trained using this information, these gene sets provided comparable performance to gene sets generated using other methods, including those used in commercial tests. The Cure is available on the Internet. CONCLUSIONS: The principal contribution of this work is to show that crowdsourcing games can be developed as a means to address problems involving domain knowledge. While most prior work on scientific discovery games and crowdsourcing in general takes as a premise that contributors have little or no expertise, here we demonstrated a crowdsourcing system that succeeded in capturing expert knowledge.
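The abstract says only that game-play data were "aggregated using a voting approach" to rank genes. A minimal Python sketch of one such scheme follows, assuming (hypothetically) that each game yields the set of genes a player selected and that each selection counts as one vote. The function names, the one-vote-per-game cap, and the alphabetical tie-break are illustrative assumptions, not The Cure's actual implementation.

```python
from collections import Counter


def rank_genes_by_votes(games):
    """Aggregate per-game gene selections into a ranked list of (gene, votes).

    Hypothetical scheme: every gene a player selected in a game earns one vote,
    counted at most once per game.
    """
    votes = Counter()
    for selected in games:
        votes.update(set(selected))  # deduplicate within a single game
    # Most votes first; ties broken alphabetically so the ranking is deterministic.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


def top_genes(games, k):
    """Return the k highest-ranked gene symbols."""
    return [gene for gene, _ in rank_genes_by_votes(games)[:k]]


if __name__ == "__main__":
    # Illustrative gene symbols only; not data from the study.
    example_games = [["ESR1", "ERBB2"], ["ESR1", "MKI67"], ["ESR1", "ERBB2", "ERBB2"]]
    print(rank_genes_by_votes(example_games))
    print(top_genes(example_games, 2))
```

Capping each gene at one vote per game keeps a single game from inflating a gene's rank; the real system may weight votes differently.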

16.
Proc Natl Acad Sci U S A ; 110(34): E3206-15, 2013 Aug 20.
Article in English | MEDLINE | ID: mdl-23918392

ABSTRACT

The primary antigen receptor repertoire is sculpted by the process of V(D)J recombination, which must strike a balance between diversification and favoring gene segments with specialized functions. The precise determinants of how often gene segments are chosen to complete variable region coding exons remain elusive. We quantified Vβ use in the preselection Tcrb repertoire and report relative contributions of 13 distinct features that may shape their recombination efficiencies, including transcription, chromatin environment, spatial proximity to their DβJβ targets, and predicted quality of recombination signal sequences (RSSs). We show that, in contrast to functional Vβ gene segments, all pseudo-Vβ segments are sequestered in transcriptionally silent chromatin, which effectively suppresses wasteful recombination. Importantly, computational analyses provide a unifying model, revealing a minimum set of five parameters that are predictive of Vβ use, dominated by chromatin modifications associated with transcription, but largely independent of precise spatial proximity to DβJβ clusters. This learned model-building strategy may be useful in predicting the relative contributions of epigenetic, spatial, and RSS features in shaping preselection V repertoires at other antigen receptor loci. Ultimately, such models may also predict how designed or naturally occurring alterations of these loci perturb the preselection use of variable gene segments.


Subject(s)
Gene Expression Regulation/immunology , Genes, T-Cell Receptor beta/genetics , Genes, T-Cell Receptor beta/immunology , Immunoglobulin Variable Region/genetics , Models, Immunological , V(D)J Recombination/immunology , Animals , Chromatin/immunology , Chromatin Immunoprecipitation , Computational Biology/methods , DNA Primers/genetics , High-Throughput Nucleotide Sequencing , Luciferases , Mice , Mice, Inbred C57BL , Regression Analysis , V(D)J Recombination/genetics
17.
PLoS One ; 8(8): e71171, 2013.
Article in English | MEDLINE | ID: mdl-23951102

ABSTRACT

Structured gene annotations are a foundation upon which many bioinformatics and statistical analyses are built. However, the structured annotations available in public databases are a sparse representation of biological knowledge as a whole. The rate of biomedical data generation is such that centralized biocuration efforts struggle to keep up. New models for gene annotation are needed to accelerate the pace at which we are able to structure biomedical knowledge. Recently, online games have emerged as an effective way to recruit, engage and organize large numbers of volunteers to help address difficult biological challenges. For example, games have been successfully developed for protein folding (Foldit), multiple sequence alignment (Phylo) and RNA structure design (EteRNA). Here we present Dizeez, a simple online game built with the purpose of structuring knowledge of gene-disease associations. Preliminary results from game play online and at scientific conferences suggest that Dizeez is producing valid gene-disease annotations not yet present in any public database. These early results provide a basic proof of principle that online games can be successfully applied to the challenge of gene annotation. Dizeez is available at http://genegames.org.


Subject(s)
Computational Biology/methods , Information Storage and Retrieval/methods , Molecular Sequence Annotation/methods , Video Games , Disease/genetics , Genetic Predisposition to Disease/genetics , Humans , Internet , User-Computer Interface
18.
J Immunol ; 191(5): 2393-402, 2013 Sep 01.
Article in English | MEDLINE | ID: mdl-23898036

ABSTRACT

A diverse Ab repertoire is formed through the rearrangement of V, D, and J segments at the IgH (Igh) loci. The C57BL/6 murine Igh locus has >100 functional VH gene segments that can recombine to a rearranged DJH. Although the nonrandom usage of VH genes is well documented, it is not clear what elements determine recombination frequency. To answer this question, we conducted deep sequencing of 5'-RACE products of the Igh repertoire in pro-B cells, amplified in an unbiased manner. Chromatin immunoprecipitation-sequencing results for several histone modifications and RNA polymerase II binding, RNA-sequencing for sense and antisense noncoding germline transcripts, and proximity to CCCTC-binding factor (CTCF) and Rad21 sites were compared with the usage of individual V genes. Computational analyses assessed the relative importance of these various accessibility elements. These elements divide the Igh locus into four epigenetically and transcriptionally distinct domains, and our computational analyses reveal different regulatory mechanisms for each region. Proximal V genes are relatively devoid of active histone marks and noncoding RNA in general, but having a CTCF site near their recombination signal sequence is critical, suggesting that being positioned near the base of the chromatin loops is important for rearrangement. In contrast, distal V genes have higher levels of histone marks and noncoding RNA, which may compensate for their poorer recombination signal sequences and for being distant from CTCF sites. Thus, the Igh locus has evolved a complex system for the regulation of V(D)J rearrangement that is different for each of the four domains that comprise this locus.


Subject(s)
Gene Rearrangement, B-Lymphocyte, Heavy Chain/genetics , Genes, Immunoglobulin Heavy Chain/genetics , Immunoglobulin Variable Region/genetics , Animals , Chromatin Immunoprecipitation , High-Throughput Nucleotide Sequencing , Mice , Mice, Inbred C57BL , Mice, Knockout , Sequence Analysis, DNA
19.
J Biomed Semantics ; 4 Suppl 1: S4, 2013 Apr 15.
Article in English | MEDLINE | ID: mdl-23734599

ABSTRACT

BACKGROUND: The Gene Ontology and its associated annotations are critical tools for interpreting lists of genes. Here, we introduce a method for evaluating the Gene Ontology annotations and structure based on the impact they have on gene set enrichment analysis, along with an example implementation. This task-based approach yields quantitative assessments grounded in experimental data and anchored tightly to the primary use of the annotations. RESULTS: Applied to specific areas of biological interest, our framework allowed us to understand the progress of annotation and structural ontology changes from 2004 to 2012. Our framework was also able to determine that the quality of annotations and structure in the area under test has been improving in its ability to recall underlying biological traits. Furthermore, we were able to distinguish between the impact of changes to the annotation sets and ontology structure. CONCLUSION: Our framework and implementation lay the groundwork for a powerful tool in evaluating the usefulness of the Gene Ontology. We demonstrate both the flexibility and the power of this approach in evaluating the current and past state of the Gene Ontology as well as its applicability in developing new methods for creating gene annotations.
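The framework scores annotations by their effect on enrichment analysis. For orientation, one widely used form of enrichment test, over-representation analysis with a one-sided hypergeometric test, is sketched below in Python. The abstract does not say which enrichment statistic the authors' implementation uses, so this is a stand-in, and the function name and parameters are assumptions. With N background genes, K of them annotated to a term, a list of n genes, and k annotated genes in that list, the p-value is P(X >= k) = sum over i from k to min(K, n) of C(K, i) C(N - K, n - i) / C(N, n).

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """One-sided over-representation p-value P(X >= k) under the hypergeometric model.

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the query list, k: annotated genes observed in the list.
    """
    if not (0 <= K <= N and 0 <= n <= N and k >= 0):
        raise ValueError("invalid counts")
    upper = min(K, n)
    # comb() returns 0 when n - i exceeds N - K, so impossible overlaps add nothing;
    # if k > upper the range is empty and the p-value is 0.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


if __name__ == "__main__":
    # Toy numbers, not from the study: 3 of 3 listed genes carry a term held by 3 of 10 genes.
    print(hypergeom_enrichment_p(10, 3, 3, 3))
```

In the toy call, the only way to draw three annotated genes is to draw exactly the three that exist, so the p-value is 1/C(10, 3) = 1/120.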

20.
Mol Cell Proteomics ; 12(6): 1741-51, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23462206

ABSTRACT

We report a high-quality, system-wide proteome catalogue covering 71% (3,542 proteins) of the predicted genes of fission yeast, Schizosaccharomyces pombe, presenting the largest protein dataset to date for this important model organism. We obtained this high proteome and peptide (11.4 peptides/protein) coverage by a combination of extensive sample fractionation, high resolution Orbitrap mass spectrometry, and combined database searching using the iProphet software as part of the Trans-Proteomics Pipeline. All raw and processed data are made accessible in the S. pombe PeptideAtlas. The identified proteins showed no biases in functional properties and allowed global estimation of protein abundances. The high coverage of the PeptideAtlas allowed correlation with transcriptomic data in a system-wide manner, indicating that post-transcriptional processes control the levels of at least half of all identified proteins. Interestingly, the correlation was not equally tight for all functional categories, ranging from r(s) > 0.80 for proteins involved in translation to r(s) < 0.45 for signal transduction proteins. Moreover, many proteins involved in DNA damage repair could not be detected in the PeptideAtlas despite their high mRNA levels, strengthening the translation-on-demand hypothesis for members of this protein class. In summary, the extensive and publicly available S. pombe PeptideAtlas together with the generated proteotypic peptide spectral library will be a useful resource for future targeted, in-depth, and quantitative proteomic studies on this microorganism.
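The abstract reports per-category rank correlations between transcript and protein levels; r(s) conventionally denotes Spearman's coefficient. A minimal standard-library Python sketch of Spearman's coefficient, with tied values sharing their average rank, follows. Feeding it paired mRNA and protein abundance estimates for one functional category is an assumption about how such a comparison could be set up, not the authors' pipeline, and the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the rank positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rank correlation: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical paired abundances for one category (mRNA, protein); not study data.
    mrna = [1.0, 2.0, 3.0, 4.0]
    protein = [10.0, 20.0, 30.0, 40.0]
    print(spearman_rs(mrna, protein))
```

Because only ranks enter the calculation, the coefficient is unaffected by any monotonic rescaling of either measurement, which suits comparisons between mRNA and protein data with very different dynamic ranges.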


Subject(s)
Gene Expression Regulation, Fungal , Peptides/isolation & purification , Protein Processing, Post-Translational , Proteome/metabolism , RNA, Messenger/metabolism , Schizosaccharomyces pombe Proteins/metabolism , Schizosaccharomyces/metabolism , Databases, Protein , Mass Spectrometry , Multigene Family , Peptide Mapping , Proteome/chemistry , Proteome/genetics , RNA, Messenger/genetics , Schizosaccharomyces/chemistry , Schizosaccharomyces/genetics , Schizosaccharomyces pombe Proteins/chemistry , Schizosaccharomyces pombe Proteins/genetics , Signal Transduction
...