Results 1 - 20 of 138
1.
Int J Obes (Lond) ; 2024 Mar 12.
Article in English | MEDLINE | ID: mdl-38472354

ABSTRACT

BACKGROUND/OBJECTIVES: The effects of early life exposures on offspring life-course health are well established. This study assessed whether adding early socio-demographic and perinatal variables to a model based on polygenic risk score (PRS) improves prediction of obesity risk. METHODS: We used the Jerusalem Perinatal study (JPS) with data at birth and body mass index (BMI) and waist circumference (WC) measured at age 32. The PRS was constructed using over 2.1M common SNPs identified in genome-wide association study (GWAS) for BMI. Linear and logistic models were applied in a stepwise approach. We first examined the associations between genetic variables and obesity-related phenotypes (e.g., BMI and WC). Secondly, socio-demographic variables were added and finally perinatal exposures, such as maternal pre-pregnancy BMI (mppBMI) and gestational weight gain (GWG) were added to the model. Improvement in prediction of each step was assessed using measures of model discrimination (area under the curve, AUC), net reclassification improvement (NRI) and integrated discrimination improvement (IDI). RESULTS: One standard deviation (SD) change in PRS was associated with a significant increase in BMI (β = 1.40) and WC (β = 2.45). These associations were slightly attenuated (13.7-14.2%) with the addition of early life exposures to the model. Also, higher mppBMI was associated with increased offspring BMI (β = 0.39) and WC (β = 0.79) (p < 0.001). For obesity (BMI ≥ 30) prediction, the addition of early socio-demographic and perinatal exposures to the PRS model significantly increased AUC from 0.69 to 0.73. At an obesity risk threshold of 15%, the addition of early socio-demographic and perinatal exposures to the PRS model provided a significant improvement in reclassification of obesity (NRI, 0.147; 95% CI 0.068-0.225). CONCLUSIONS: Inclusion of early life exposures, such as mppBMI and maternal smoking, to a model based on PRS improves obesity risk prediction in an Israeli population-sample.
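
The net reclassification improvement reported above at a single risk threshold can be illustrated with a minimal sketch. This is not the study's code: the function name `categorical_nri`, its arguments, and the toy inputs are assumptions for illustration, and the sketch uses the standard two-category definition (net proportion of cases moved above the threshold plus net proportion of non-cases moved below it).

```python
from typing import Sequence


def categorical_nri(
    y_true: Sequence[int],
    p_old: Sequence[float],
    p_new: Sequence[float],
    threshold: float = 0.15,
) -> float:
    """Two-category NRI at one risk threshold (hypothetical helper).

    y_true: 1 for an event (e.g. obesity), 0 for a non-event.
    p_old:  predicted risks from the baseline model.
    p_new:  predicted risks from the extended model.
    """
    up_event = down_event = n_event = 0
    up_nonevent = down_nonevent = n_nonevent = 0

    for y, old, new in zip(y_true, p_old, p_new):
        old_high = old >= threshold
        new_high = new >= threshold
        moved_up = new_high and not old_high
        moved_down = old_high and not new_high
        if y == 1:
            n_event += 1
            up_event += moved_up
            down_event += moved_down
        else:
            n_nonevent += 1
            up_nonevent += moved_up
            down_nonevent += moved_down

    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one non-event")

    # Events should move up; non-events should move down.
    nri_events = (up_event - down_event) / n_event
    nri_nonevents = (down_nonevent - up_nonevent) / n_nonevent
    return nri_events + nri_nonevents
```

A confidence interval such as the one quoted in the abstract would typically be obtained separately, for example by bootstrapping; that step is not shown here.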

2.
J Biomed Inform ; 154: 104650, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38701887

ABSTRACT

BACKGROUND: Distinguishing diseases into distinct subtypes is crucial for study and effective treatment strategies. The Open Targets Platform (OT) integrates biomedical, genetic, and biochemical datasets to empower disease ontologies, classifications, and potential gene targets. Nevertheless, many disease annotations are incomplete, requiring laborious expert medical input. This challenge is especially pronounced for rare and orphan diseases, where resources are scarce. METHODS: We present a machine learning approach to identifying diseases with potential subtypes, using the approximately 23,000 diseases documented in OT. We derive novel features for predicting diseases with subtypes using direct evidence. Machine learning models were applied to analyze feature importance and evaluate predictive performance for discovering both known and novel disease subtypes. RESULTS: Our model achieves a high (89.4%) ROC AUC (Area Under the Receiver Operating Characteristic Curve) in identifying known disease subtypes. We integrated pre-trained deep-learning language models and showed their benefits. Moreover, we identify 515 disease candidates predicted to possess previously unannotated subtypes. CONCLUSIONS: Our models can partition diseases into distinct subtypes. This methodology enables a robust, scalable approach for improving knowledge-based annotations and a comprehensive assessment of disease ontology tiers. Our candidates are attractive targets for further study and personalized medicine, potentially aiding in the unveiling of new therapeutic indications for sought-after targets.


Subjects
Machine Learning , Humans , Disease/classification , ROC Curve , Computational Biology/methods , Algorithms , Deep Learning
3.
Proc Natl Acad Sci U S A ; 118(34)2021 08 24.
Article in English | MEDLINE | ID: mdl-34373319

ABSTRACT

Atomic structures of several proteins from the coronavirus family are still partial or unavailable. A possible reason for this gap is the instability of these proteins outside of the cellular context, thereby prompting the use of in-cell approaches. In situ cross-linking and mass spectrometry (in situ CLMS) can provide information on the structures of such proteins as they occur in the intact cell. Here, we applied targeted in situ CLMS to structurally probe Nsp1, Nsp2, and nucleocapsid (N) proteins from severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and obtained cross-link sets with an average density of one cross-link per 20 residues. We then employed integrative modeling that computationally combined the cross-linking data with domain structures to determine full-length atomic models. For the Nsp2, the cross-links report on a complex topology with long-range interactions. Integrative modeling with structural prediction of individual domains by the AlphaFold2 system allowed us to generate a single consistent all-atom model of the full-length Nsp2. The model reveals three putative metal binding sites and suggests a role for Nsp2 in zinc regulation within the replication-transcription complex. For the N protein, we identified multiple intra- and interdomain cross-links. Our integrative model of the N dimer demonstrates that it can accommodate three single RNA strands simultaneously, both stereochemically and electrostatically. For the Nsp1, cross-links with the 40S ribosome were highly consistent with recent cryogenic electron microscopy structures. These results highlight the importance of cellular context for the structural probing of recalcitrant proteins and demonstrate the effectiveness of targeted in situ CLMS and integrative modeling.


Subjects
Models, Molecular , SARS-CoV-2/chemistry , Viral Proteins/chemistry , Cross-Linking Reagents/chemistry , HEK293 Cells , Humans , Mass Spectrometry , Protein Domains
4.
Hum Genet ; 142(7): 863-878, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37133573

ABSTRACT

Hypertension is a polygenic disease that affects over 1.2 billion adults aged 30-79 worldwide. It is a major risk factor for renal, cerebrovascular, and cardiovascular diseases. The heritability of hypertension is estimated to be high; nevertheless, our understanding of its underlying mechanisms remains scarce and incomplete. This study covered the entries from European ancestry from the UK-Biobank (UKB), with 74,090 cases diagnosed with essential (primary) hypertension and 200,734 controls. We compared the findings from large-scale genome-wide association studies (GWAS) to the gene-based method of proteome-wide association studies (PWAS). We focused on 70 statistically significant associated genes, most of which failed to reach significance in variant-based GWAS. A total of 30% of the PWAS-associated genes were validated against independent cohorts, including the Finnish Biobank. Furthermore, gene-based analyses that were performed on both sexes revealed sex-dependent genetics with a stronger genetic component associated with females. Analysis of systolic and diastolic blood pressure measurements confirms a strong genetic effect associated with females. We demonstrated that gene-based approaches provide insight into the underlying biology of hypertension. Specifically, the expression profiles of the identified genes exposed the enrichment of endothelial cells from multiple organs. Furthermore, females' top-ranked significant genes are involved in cellular immunity. We conclude that studying hypertension and blood pressure via gene-based association methods improves interpretability and exposes sex-dependent genetic effects, which enhances clinical utility.


Subjects
Genome-Wide Association Study , Hypertension , Male , Adult , Humans , Female , Genetic Predisposition to Disease , Endothelial Cells , Hypertension/genetics , Proteome/genetics , Polymorphism, Single Nucleotide , Essential Hypertension
5.
Bioinformatics ; 38(8): 2102-2110, 2022 04 12.
Article in English | MEDLINE | ID: mdl-35020807

ABSTRACT

SUMMARY: Self-supervised deep language modeling has shown unprecedented success across natural language tasks, and has recently been repurposed to biological sequences. However, existing models and pretraining methods are designed and optimized for text analysis. We introduce ProteinBERT, a deep language model specifically designed for proteins. Our pretraining scheme combines language modeling with a novel task of Gene Ontology (GO) annotation prediction. We introduce novel architectural elements that make the model highly efficient and flexible to long sequences. The architecture of ProteinBERT consists of both local and global representations, allowing end-to-end processing of these types of inputs and outputs. ProteinBERT obtains near state-of-the-art performance, and sometimes exceeds it, on multiple benchmarks covering diverse protein properties (including protein structure, post-translational modifications and biophysical attributes), despite using a far smaller and faster model than competing deep-learning methods. Overall, ProteinBERT provides an efficient framework for rapidly training protein predictors, even with limited labeled data. AVAILABILITY AND IMPLEMENTATION: Code and pretrained model weights are available at https://github.com/nadavbra/protein_bert. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Deep Learning , Amino Acid Sequence , Proteins/chemistry , Language , Natural Language Processing
6.
Int J Mol Sci ; 24(13)2023 Jun 30.
Article in English | MEDLINE | ID: mdl-37446105

ABSTRACT

The primary role of microglia is to maintain homeostasis by effectively responding to various disturbances. Activation of transcriptional programs determines the microglia's response to external stimuli. In this study, we stimulated murine neonatal microglial cells with benzoyl ATP (bzATP) and lipopolysaccharide (LPS), and monitored their ability to release pro-inflammatory cytokines. When cells are exposed to bzATP, a purinergic receptor agonist, a short-lived wave of transcriptional changes occurs. However, only combining bzATP and LPS led to a sustainable and robust response. The transcriptional profile is dominated by induced cytokines (e.g., IL-1α and IL-1β), chemokines, and their membrane receptors. Several abundant long noncoding RNAs (lncRNAs) are induced by bzATP/LPS, including Ptgs2os2, Bc1, and Morrbid, that function in inflammation and cytokine production. Analyzing the observed changes through TNF (Tumor necrosis factor) and NF-κB (nuclear factor kappa light chain enhancer of activated B cells) pathways confirmed that neonatal glial cells exhibit a distinctive expression program in which inflammatory-related genes are upregulated by orders of magnitude. The observed capacity of the microglial culture to activate a robust inflammatory response is useful for studying neurons under stress, brain injury, and aging. We propose the use of a primary neonatal microglia culture as a responsive in vitro model for testing drugs that may interact with inflammatory signaling and the lncRNA regulatory network.


Subjects
Lipopolysaccharides , Microglia , Mice , Animals , Microglia/metabolism , Lipopolysaccharides/pharmacology , Lipopolysaccharides/metabolism , NF-kappa B/metabolism , Cytokines/metabolism , Neuroglia/metabolism , Inflammation/metabolism , Cells, Cultured
7.
Adv Exp Med Biol ; 1385: 133-160, 2022.
Article in English | MEDLINE | ID: mdl-36352213

ABSTRACT

MicroRNAs (miRNAs) provide a fundamental layer of regulation in cells. miRNAs act posttranscriptionally through complementary base-pairing with the 3'-UTR of a target mRNA, leading to mRNA degradation and translation arrest. The likelihood of forming a valid miRNA-target duplex within cells was computationally predicted and experimentally monitored. In human cells, the miRNA profiles determine their identity and physiology. Therefore, alterations in the composition of miRNAs signify many cancer types and chronic diseases. In this chapter, we introduce online functional tools and resources to facilitate miRNA research. We start by introducing currently available miRNA catalogs and miRNA-gateway portals for navigating among different miRNA-centric online resources. We then sketch several realistic challenges that may occur while investigating miRNA regulation in living cells. As a showcase, we demonstrate the utility of miRNA and mRNA expression databases that cover diverse human cells and tissues, including resources that report on genetic alterations affecting miRNA expression levels and alterations in binding capacity. Introducing tools linking miRNAs with transcription factor (TF) networks reveals miRNA regulation complexity within living cells. Finally, we concentrate on online resources that analyze miRNAs in human diseases and specifically in cancer. Altogether, we introduce contemporary, selected resources and online tools for studying miRNA regulation in cells and tissues and their utility in health and disease.


Subjects
MicroRNAs , Humans , Gene Expression Regulation , MicroRNAs/genetics , MicroRNAs/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Transcription Factors/metabolism , Databases, Factual
8.
Int J Mol Sci ; 23(24)2022 Dec 18.
Article in English | MEDLINE | ID: mdl-36555797

ABSTRACT

Mature microRNAs (miRNAs) are single-stranded non-coding RNA (ncRNA) molecules that act in post-transcriptional regulation in animals and plants. A mature miRNA is the end product of consecutive, highly regulated processing steps of the primary miRNA transcript. Following base-pairing of the mature miRNA with its mRNA target, translation is inhibited, and the targeted mRNA is degraded. There are hundreds of miRNAs in each cell that work together to regulate key cellular processes, including development, differentiation, cell cycle, apoptosis, inflammation, viral infection, and more. In this review, we present an overlooked layer of cellular regulation that addresses cell dynamics affecting miRNA accessibility. We discuss the regulation of miRNA local storage and translocation among cell compartments. The local amounts of the miRNAs and their targets dictate their actual availability, which determines the ability to fine-tune cell responses to abrupt or chronic changes. We emphasize that changes in miRNA storage and compactization occur under induced stress and changing conditions. Furthermore, we demonstrate shared principles on cell physiology, governed by miRNA under oxidative stress, tumorigenesis, viral infection, or synaptic plasticity. The evidence presented in this review article highlights the importance of spatial and temporal miRNA regulation for cell physiology. We argue that limiting the research to mature miRNAs within the cytosol undermines our understanding of the efficacy of miRNAs to regulate cell fate under stress conditions.


Subjects
MicroRNAs , Animals , MicroRNAs/genetics , MicroRNAs/metabolism , Gene Expression Regulation , RNA, Messenger/genetics , Cell Differentiation , Homeostasis/genetics
9.
Bioinformatics ; 36(Suppl_1): i251-i257, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32657402

ABSTRACT

SUMMARY: Current technologies for single-cell transcriptomics allow thousands of cells to be analyzed in a single experiment. The increased scale of these methods raises the risk of cell doublet contamination. Available tools and algorithms for identifying doublets and estimating their occurrence in single-cell experimental data focus on doublets of different species, cell types or individuals. In this study, we analyze transcriptomic data from single cells having an identical genetic background. We claim that the ratio of monoallelic to biallelic expression provides a discriminating power toward doublets' identification. We present a pipeline called BIallelic Ratio for Doublets (BIRD) that relies on heterologous genetic variations, from single-cell RNA sequencing. For each dataset, doublets were artificially created from the actual data and used to train a predictive model. BIRD was applied to Smart-seq data from 163 primary fibroblast single cells. The model achieved 100% accuracy in annotating the randomly simulated doublets. Bona fide doublets were verified based on a biallelic expression signal amongst X-chromosome of female fibroblasts. Data from 10X Genomics microfluidics of human peripheral blood cells achieved on average 83% (±3.7%) accuracy, and an area under the curve of 0.88 (±0.04) for a collection of ∼13 300 single cells. BIRD addresses instances of doublets, which were formed from cell mixtures of identical genetic background and cell identity. Maximal performance is achieved for high-coverage data from Smart-seq. Success in identifying doublets is data specific which varies according to the experimental methodology, genomic diversity between haplotypes, sequence coverage and depth. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
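
The idea that merged cells show more biallelic expression than single cells can be sketched as follows. This is a toy illustration, not the BIRD pipeline: the function names, the `min_reads` cutoff, and the representation of each cell as a list of (reference, alternate) read counts per heterozygous site are all assumptions, and the predictive model trained on the simulated doublets is not shown.

```python
from typing import List, Tuple

# Each cell: read counts (ref_allele, alt_allele) at the same ordered
# list of heterozygous sites. This layout is an assumption for the sketch.
AlleleCounts = List[Tuple[int, int]]


def biallelic_fraction(sites: AlleleCounts, min_reads: int = 1) -> float:
    """Fraction of expressed sites where both alleles are observed."""
    expressed = 0
    biallelic = 0
    for ref, alt in sites:
        if ref + alt == 0:
            continue  # site not expressed in this cell
        expressed += 1
        if ref >= min_reads and alt >= min_reads:
            biallelic += 1
    return biallelic / expressed if expressed else 0.0


def simulate_doublet(cell_a: AlleleCounts, cell_b: AlleleCounts) -> AlleleCounts:
    """Create an artificial doublet by summing the counts of two cells."""
    return [(ra + rb, aa + ab) for (ra, aa), (rb, ab) in zip(cell_a, cell_b)]


if __name__ == "__main__":
    cell_a = [(5, 0), (0, 3), (4, 0), (0, 0)]
    cell_b = [(0, 6), (0, 2), (3, 0), (2, 2)]
    doublet = simulate_doublet(cell_a, cell_b)
    print(biallelic_fraction(cell_a))   # 0.0
    print(biallelic_fraction(cell_b))   # 0.25
    print(biallelic_fraction(doublet))  # 0.5
```

In this toy input the simulated doublet has a higher biallelic fraction than either parent cell, which is the kind of signal a classifier could be trained on.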


Subjects
Single-Cell Analysis , Software , Algorithms , Computational Biology , Humans , Sequence Analysis, RNA
10.
Nucleic Acids Res ; 47(13): 6642-6655, 2019 07 26.
Article in English | MEDLINE | ID: mdl-31334812

ABSTRACT

Compiling the catalogue of genes actively involved in cancer is an ongoing endeavor, with profound implications for the understanding and treatment of the disease. An abundance of computational methods have been developed to screen the genome for candidate driver genes based on genomic data of somatic mutations in tumors. Existing methods make many implicit and explicit assumptions about the distribution of random mutations. We present FABRIC, a new framework for quantifying the selection of genes in cancer by assessing the effects of de-novo somatic mutations on protein-coding genes. Using a machine-learning model, we quantified the functional effects of ∼3M somatic mutations extracted from over 10 000 human cancerous samples, and compared them against the effects of all possible single-nucleotide mutations in the coding human genome. We detected 593 protein-coding genes showing statistically significant bias towards harmful mutations. These genes, discovered without any prior knowledge, show an overwhelming overlap with known cancer genes, but also include many overlooked genes. FABRIC is designed to avoid false discoveries by comparing each gene to its own background model using rigorous statistics, making minimal assumptions about the distribution of random somatic mutations. The framework is an open-source project with a simple command-line interface.


Subjects
Computational Biology/methods , Genes, Neoplasm , Mutation , Neoplasm Proteins/genetics , Neoplasms/genetics , Datasets as Topic , Humans , Models, Genetic , Mutation, Missense , Neoplasm Proteins/chemistry , Neoplasm Proteins/physiology
11.
PLoS Comput Biol ; 15(12): e1007204, 2019 12.
Article in English | MEDLINE | ID: mdl-31790387

ABSTRACT

Mature microRNAs (miRNAs) regulate most human genes through direct base-pairing with mRNAs. We investigate the underlying principles of miRNA regulation in living cells. To this end, we overexpressed miRNAs in different cell types and measured the mRNA decay rate under a paradigm of a transcriptional arrest. Based on an exhaustive matrix of mRNA-miRNA binding probabilities, and parameters extracted from our experiments, we developed a computational framework that captures the cooperative action of miRNAs in living cells. The framework, called COMICS, simulates the stochastic binding events between miRNAs and mRNAs in cells. The input of COMICS is cell-specific profiles of mRNAs and miRNAs, and the outcome is the retention level of each mRNA at the end of 100,000 iterations. The results of COMICS from thousands of miRNA manipulations reveal gene sets that exhibit coordinated behavior with respect to all miRNAs (total of 248 families). We identified a small set of genes that are highly responsive to changes in the expression of almost any of the miRNAs. In contrast, about 20% of the tested genes remain insensitive to a broad range of miRNA manipulations. The set of insensitive genes is strongly enriched with genes that belong to the translation machinery. These trends are shared by different cell types. We conclude that the stochastic nature of miRNAs reveals unexpected robustness of gene expression in living cells. By applying a systematic probabilistic approach some key design principles of cell states are revealed, emphasizing in particular, the immunity of the translational machinery vis-a-vis miRNA manipulations across cell types. We propose COMICS as a valuable platform for assessing the outcome of miRNA regulation of cells in health and disease.
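
The iterative, stochastic binding process described above can be illustrated with a deliberately simplified sketch. It is not the COMICS implementation: the sampling scheme, the rule that every successful binding removes one mRNA molecule, the function name `simulate_retention`, and the example identifiers (`miR-x`, `GENE_A`, `GENE_B`) are all assumptions made for illustration. Inputs mirror the abstract's description (cell-specific mRNA and miRNA profiles plus a matrix of binding probabilities), and the output is the retained fraction of each mRNA after a fixed number of iterations.

```python
import random
from typing import Dict, Tuple


def simulate_retention(
    mrna_counts: Dict[str, int],
    mirna_counts: Dict[str, int],
    binding_prob: Dict[Tuple[str, str], float],
    iterations: int = 10_000,
    seed: int = 0,
) -> Dict[str, float]:
    """Toy stochastic miRNA-mRNA binding simulation (hypothetical helper).

    Each iteration samples one miRNA and one mRNA in proportion to their
    current abundance; with the pair's binding probability, one copy of
    that mRNA is removed. Returns the retained fraction per mRNA.
    """
    rng = random.Random(seed)
    remaining = dict(mrna_counts)
    mirnas = list(mirna_counts)
    mirna_weights = [mirna_counts[m] for m in mirnas]

    for _ in range(iterations):
        candidates = [g for g, n in remaining.items() if n > 0]
        if not candidates:
            break  # every mRNA has been depleted
        mirna = rng.choices(mirnas, weights=mirna_weights)[0]
        gene = rng.choices(candidates, weights=[remaining[g] for g in candidates])[0]
        # Pairs missing from the matrix are treated as non-binding.
        if rng.random() < binding_prob.get((mirna, gene), 0.0):
            remaining[gene] -= 1

    return {g: remaining[g] / n for g, n in mrna_counts.items() if n > 0}


if __name__ == "__main__":
    retention = simulate_retention(
        mrna_counts={"GENE_A": 100, "GENE_B": 100},
        mirna_counts={"miR-x": 10},
        binding_prob={("miR-x", "GENE_A"): 0.5},
        iterations=1_000,
    )
    print(retention)
```

In this toy setup an mRNA with no entry in the binding matrix keeps a retention of 1.0, a crude analogue of the insensitive genes the abstract describes.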


Subjects
MicroRNAs/genetics , MicroRNAs/metabolism , Models, Genetic , RNA, Messenger/genetics , RNA, Messenger/metabolism , Computational Biology , Computer Simulation , Gene Expression Profiling , Gene Expression Regulation , HEK293 Cells , HeLa Cells , Humans , MCF-7 Cells , RNA Stability/genetics , Stochastic Processes
12.
Nucleic Acids Res ; 46(20): 11014-11029, 2018 11 16.
Article in English | MEDLINE | ID: mdl-30203035

ABSTRACT

MicroRNAs (miRNAs) are short non-coding RNAs that negatively regulate the expression and translation of genes in healthy and diseased tissues. Herein, we characterize short RNAs from human HeLa cells found in the supraspliceosome, a nuclear dynamic machine in which pre-mRNA processing occurs. We sequenced small RNAs (<200 nt) extracted from the supraspliceosome, and identified sequences that are derived from 200 miRNA genes. About three quarters of them are mature miRNAs, whereas the rest account for various defined regions of the pre-miRNA, and its hairpin-loop precursor. Out of these aligned sequences, 53 were undetected in cellular extract, and the abundance of an additional 48 strongly differed from that in cellular extract. Notably, we describe seven abundant miRNA-derived sequences that overlap non-coding exons of their host gene. The rich collection of sequences identical to pre-miRNAs at the supraspliceosome suggests overlooked nuclear functions. Specifically, the abundant hsa-mir-99b may affect splicing of LINC01129 primary transcript through base-pairing with its exon-intron junction. Using suppression and overexpression experiments, we show that hsa-mir-7704 negatively regulates the level of the lncRNA HAGLR. We claim that in cases of extended base-pairing complementarity, such supraspliceosomal pre-miRNA sequences might have a role in transcription attenuation, maturation and processing.


Subjects
MicroRNAs/genetics , RNA Precursors/genetics , Spliceosomes/genetics , Base Sequence , Cell Line , Gene Expression Regulation , HeLa Cells , High-Throughput Nucleotide Sequencing , Humans , MicroRNAs/metabolism , RNA Processing, Post-Transcriptional , RNA Splicing , Spliceosomes/metabolism
13.
Int J Mol Sci ; 21(21)2020 Oct 30.
Article in English | MEDLINE | ID: mdl-33143250

ABSTRACT

MicroRNAs (miRNAs) act as negative regulators of gene expression in the cytoplasm. Previous studies have identified the presence of miRNAs in the nucleus. Here we study human breast cancer-derived cell-lines (MCF-7 and MDA-MB-231) and a non-tumorigenic cell-line (MCF-10A) and compare their miRNA sequences at the spliceosome fraction (SF). We report that the levels of miRNAs found in the spliceosome, their identity, and pre-miRNA segmental composition are cell-line specific. One such miRNA is miR-7704 whose genomic position overlaps HAGLR, a cancer-related lncRNA. We detected an inverse expression of miR-7704 and HAGLR in the tested cell lines. Specifically, inhibition of miR-7704 caused an increase in HAGLR expression. Furthermore, elevated levels of miR-7704 slightly altered the cell-cycle in MDA-MB-231. Altogether, we show that SF-miR-7704 acts as a tumor-suppressor gene with HAGLR being its nuclear target. The relative levels of miRNAs found in the spliceosome fractions (e.g., miR-100, miR-30a, and let-7 family) in non-tumorigenic relative to cancer-derived cell-lines were monitored. We found that the expression trend of the abundant miRNAs in SF was different from that reported in the literature and from the observation of large cohorts of breast cancer patients, suggesting that many SF-miRNAs act on targets that are different from the cytoplasmic ones. Altogether, we report on the potential of SF-miRNAs as an unexplored route for cancerous cell state.


Subjects
Biomarkers, Tumor/genetics , Breast Neoplasms/pathology , Gene Expression Regulation, Neoplastic , MicroRNAs/genetics , RNA, Long Noncoding/genetics , Spliceosomes/genetics , Apoptosis , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Cell Proliferation , Female , Gene Expression Profiling , Humans , Tumor Cells, Cultured
14.
Am J Med Genet B Neuropsychiatr Genet ; 183(7): 412-422, 2020 10.
Article in English | MEDLINE | ID: mdl-32815282

ABSTRACT

STXBP1, also known as Munc-18, is a master regulator of neurotransmitter release and synaptic function in the human brain through its direct interaction with syntaxin 1A. STXBP1 binds syntaxin 1A in an inactive conformational state. STXBP1 decreases its binding affinity to syntaxin upon phosphorylation, enabling syntaxin 1A to engage in the SNARE complex, leading to neurotransmitter release. STXBP1-related disorders are well characterized by encephalopathy with epilepsy, and a diverse range of neurological and neurodevelopmental conditions. Through exome sequencing of a child with developmental delay, hypotonia, and spasticity, we found a novel de novo insertion mutation of three nucleotides in the STXBP1 coding region, resulting in an additional arginine after position 39 (R39dup). Inconclusive results from state-of-the-art variant prediction tools mandated a structure-based approach using molecular dynamics (MD) simulations of the STXBP1-syntaxin 1A complex. Comparison of the interaction interfaces of the wild-type and the R39dup complexes revealed a reduced interaction surface area in the mutant, leading to destabilization of the protein complex. Moreover, the decrease in affinity toward syntaxin 1A is similar for the phosphorylated STXBP1 and the R39dup. We applied the same MD methodology to seven additional previously reported STXBP1 mutations and reveal that the stability of the STXBP1-syntaxin 1A interface correlates with the reported clinical phenotypes. This study provides a direct link between the outcome of a novel variant in STXBP1 and protein structure and dynamics. The structural change upon mutation drives an alteration in synaptic function.


Subjects
Developmental Disabilities/genetics , Munc18 Proteins/genetics , Syntaxin 1/metabolism , Brain/metabolism , Brain Diseases/genetics , Child, Preschool , Developmental Disabilities/physiopathology , Electroencephalography/methods , Epilepsy/genetics , Female , Humans , Munc18 Proteins/metabolism , Mutagenesis, Insertional/genetics , Syntaxin 1/genetics , Exome Sequencing/methods
15.
BMC Genomics ; 20(1): 201, 2019 Mar 12.
Article in English | MEDLINE | ID: mdl-30871455

ABSTRACT

BACKGROUND: In mammals, sex chromosomes pose an inherent imbalance of gene expression between sexes. In each female somatic cell, random inactivation of one of the X-chromosomes restores this balance. While most genes from the inactivated X-chromosome are silenced, 15-25% are known to escape X-inactivation (termed escapees). The expression levels of these genes are attributed to sex-dependent phenotypic variability. RESULTS: We used single-cell RNA-Seq to detect escapees in somatic cells. As only one X-chromosome is inactivated in each cell, the origin of expression from the active or inactive chromosome can be determined from the variation of sequenced RNAs. We analyzed primary, healthy fibroblasts (n = 104), and clonal lymphoblasts with sequenced parental genomes (n = 25) by measuring the degree of allelic-specific expression (ASE) from heterozygous sites. We identified 24 and 49 candidate escapees, at varying degrees of confidence, from the fibroblast and lymphoblast transcriptomes, respectively. We critically test the validity of escapee annotations by comparing our findings with a large collection of independent studies. We find that most genes (66%) from the unified set were previously reported as escapees. Furthermore, out of the overlooked escapees, 11 are long noncoding RNAs (lncRNAs). CONCLUSIONS: X-chromosome inactivation and escaping from it are robust, permanent phenomena that are best studied at a single-cell resolution. The cumulative information from individual cells increases the potential of identifying escapees. Moreover, despite the use of a limited number of cells, clonal cells (i.e., same X-chromosomes are coordinately inhibited) with genomic phasing are valuable for detecting escapees at high confidence. Generalizing the method to uncharacterized genomic loci resulted in lncRNA escapees which account for 20% of the listed candidates. By confirming genes as escapees and proposing others as candidates from two different cell types, we contribute to the cumulative knowledge and reliability of human escapees.


Subjects
Chromosomes, Human, X , High-Throughput Nucleotide Sequencing/methods , Single-Cell Analysis/methods , Transcriptome , X Chromosome Inactivation , Alleles , Chromosome Mapping , Female , Fibroblasts/cytology , Fibroblasts/metabolism , Humans , Infant, Newborn , Lymphocytes/cytology , Lymphocytes/metabolism
16.
BMC Cancer ; 19(1): 783, 2019 Aug 07.
Article in English | MEDLINE | ID: mdl-31391007

ABSTRACT

BACKGROUND: In recent years, research on cancer predisposition germline variants has emerged as a prominent field. The identity of somatic mutations is based on a reliable mapping of the patient germline variants. In addition, the statistics of germline variant frequencies in healthy individuals and cancer patients are the basis for seeking candidates for cancer predisposition genes. The Cancer Genome Atlas (TCGA) is one of the main sources of such data, providing a diverse collection of molecular data including deep sequencing for more than 30 types of cancer from > 10,000 patients. METHODS: Our hypothesis in this study is that whole exome sequences from blood samples of cancer patients are not expected to show systematic differences among cancer types. To test this hypothesis, we analyzed common and rare germline variants across six cancer types, covering 2241 samples from TCGA. In our analysis we accounted for inherent variables in the data including the different variant calling protocols, sequencing platforms, and ethnicity. RESULTS: We report on substantial batch effects in germline variants associated with cancer types. We attribute the effect to the specific sequencing centers that produced the data. Specifically, we measured 30% variability in the number of reported germline variants per sample across sequencing centers. The batch effect is further expressed in nucleotide composition and variant frequencies. Importantly, the batch effect causes substantial differences in germline variant distribution patterns across numerous genes, including prominent cancer predisposition genes such as BRCA1, RET, MAX, and KRAS. For most known cancer predisposition genes, we found a distinct batch-dependent difference in germline variants. CONCLUSION: TCGA germline data is exposed to strong batch effects with substantial variabilities among TCGA sequencing centers. We claim that those batch effects are consequential for numerous TCGA pan-cancer studies. In particular, these effects may compromise the reliability and the potency to detect new cancer predisposition genes. Furthermore, interpretation of pan-cancer analyses should be revisited in view of the source of the genomic data after accounting for the reported batch effects.


Subjects
Exome, Human Genome, Genomics, Germ-Line Mutation, Neoplasms/genetics, Genetic Association Studies, Genetic Predisposition to Disease, Humans, Kaplan-Meier Estimate, Neoplasms/diagnosis, Neoplasms/mortality, Neoplasms/therapy, Precision Medicine/methods
17.
Am J Hematol ; 94(1): 62-73, 2019 01.
Article in English | MEDLINE | ID: mdl-30295334

ABSTRACT

Driver mutations in myeloproliferative neoplasms (MPNs) are usually found in the JAK2, MPL, and CALR genes; however, 10%-15% of cases are triple negative (TN). A previous study showed a lower rate of JAK2 V617F in primary myelofibrosis patients exposed to low doses of ionizing radiation (IR) from the Chernobyl accident. To examine distinct driver mutations, we enrolled 281 Ukrainian IR-exposed and unexposed MPN patients. Genomic DNA was obtained from peripheral blood leukocytes. JAK2 V617F, MPL W515, and type 1-like and type 2-like CALR mutations were identified by Sanger sequencing and real-time polymerase chain reaction. Chromosomal alterations were assessed using an oligo-SNP microarray platform. Additional genetic variants were identified by whole exome and targeted sequencing. Statistical significance was evaluated by Fisher's exact test and Wilcoxon's rank sum test (R, version 3.4.2). IR-exposed MPN patients exhibited a different genetic profile vs unexposed patients: a lower rate of JAK2 V617F (58.4% vs 75.4%, P = .0077), a higher rate of type 1-like CALR mutation (12.2% vs 3.1%, P = .0056), a higher rate of TN cases (27.8% vs 16.2%, P = .0366), and a higher rate of potentially pathogenic sequence variants (mean numbers: 4.8 vs 3.1, P = .0242). Furthermore, we identified several potential drivers specific to IR-exposed TN MPN patients: ATM p.S1691R with copy-neutral loss of heterozygosity at 11q; EZH2 p.D659G at 7q and SUZ12 p.V71M at 17q with copy number loss. Thus, IR-exposed MPN patients represent a group with distinct genomic characteristics worthy of further study.


Subjects
Chernobyl Nuclear Accident, Myeloproliferative Disorders/etiology, Radiation-Induced Neoplasms/etiology, Radioactive Pollutants/adverse effects, Adult, Aged, Calreticulin/genetics, Chromosome Aberrations, DNA/genetics, Female, Gene Dosage, Humans, Janus Kinase 2/genetics, Loss of Heterozygosity, Male, Middle Aged, Missense Mutation, Myeloproliferative Disorders/epidemiology, Myeloproliferative Disorders/genetics, Radiation-Induced Neoplasms/epidemiology, Radiation-Induced Neoplasms/genetics, Thrombopoietin Receptors/genetics, Ukraine/epidemiology, Exome Sequencing, Young Adult
18.
Nucleic Acids Res ; 45(9): 5048-5060, 2017 May 19.
Article in English | MEDLINE | ID: mdl-28379430

ABSTRACT

The primary function of microRNAs (miRNAs) is to maintain cell homeostasis. In cancerous tissues, miRNA expression undergoes drastic alterations. In this study, we use miRNA expression profiles from The Cancer Genome Atlas of 24 cancer types and 3 healthy tissues, collected from >8500 samples. We seek to classify the cancer's origin and identify the tissue using the expression of 1046 reported miRNAs. Despite an apparently uniform appearance of miRNAs among cancerous samples, we recover indispensable information from lowly expressed miRNAs regarding the cancer/tissue types. Multiclass support vector machine classification yields an average recall of 58% in identifying the correct tissue and tumor types. Data discretization led to substantial improvement, reaching an average recall of 91% (median 95%). We propose a straightforward protocol as a crucial step in classifying tumors of unknown primary origin. Our counter-intuitive conclusion is that in almost all cancer types, highly expressed miRNAs mask the significant signal that lowly expressed miRNAs provide.
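
The discretization step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state the binning scheme, so the bin edges, the function name `discretize`, and the toy profiles below are all assumptions made for illustration, not the authors' protocol. The point is only that once values are mapped to a few ordinal bins, a very highly expressed miRNA no longer dominates the numeric range, and differences among lowly expressed miRNAs remain visible.

```python
from bisect import bisect_right

# Hypothetical bin edges (e.g. normalized read counts); not taken from the paper.
BIN_EDGES = (1.0, 10.0, 100.0)


def discretize(profile, edges=BIN_EDGES):
    """Map each expression value to the index of the bin it falls into."""
    return [bisect_right(edges, value) for value in profile]


# Toy profiles: the first miRNA is highly expressed, the other two are low.
sample_a = [50000.0, 0.0, 12.0]
sample_b = [52000.0, 5.0, 0.5]

# On the raw scale the first miRNA dwarfs the rest; after binning, the two
# samples agree on it and differ only in the lowly expressed miRNAs.
print(discretize(sample_a))
print(discretize(sample_b))
```

The binned vectors could then be passed to any multiclass classifier in place of the raw expression values.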


Subjects
Tumor Biomarkers/analysis, MicroRNAs/analysis, Neoplasms/diagnosis, Tumor Biomarkers/genetics, Gene Expression Profiling, Humans, MicroRNAs/genetics, Neoplasms/classification, Neoplasms/genetics
19.
Bioinformatics ; 31(21): 3429-36, 2015 Nov 01.
Article in English | MEDLINE | ID: mdl-26130574

ABSTRACT

MOTIVATION: The number of sequenced genomes and proteins is growing at an unprecedented pace. Unfortunately, manual curation and functional knowledge lag behind. Homology-based inference often fails at labeling proteins with diverse functions and broad classes. Thus, identifying high-level protein functionality remains challenging. We hypothesize that a universal feature engineering approach can yield classification of high-level functions and unified properties when combined with machine learning approaches, without requiring external databases or alignment. RESULTS: In this study, we present a novel bioinformatics toolkit called ProFET (Protein Feature Engineering Toolkit). ProFET extracts hundreds of features covering the elementary biophysical and sequence-derived attributes. Most features capture statistically informative patterns. In addition, different representations of sequences and of the amino acid alphabet provide a compact, compressed set of features. The results from ProFET were incorporated in data analysis pipelines, implemented in Python and adapted for multi-genome scale analysis. ProFET was applied to 17 established and novel protein benchmark datasets involving classification for a variety of binary and multi-class tasks. The results show state-of-the-art performance. The extracted features show excellent biological interpretability. The success of ProFET applies to a wide range of high-level functions such as subcellular localization, structural classes and proteins with unique functional properties (e.g. neuropeptide precursors, thermophilic proteins and nucleic acid-binding proteins). ProFET allows easy, universal discovery of new target proteins, as well as understanding the features underlying different high-level protein functions. AVAILABILITY AND IMPLEMENTATION: ProFET source code and the datasets used are freely available at https://github.com/ddofer/ProFET. CONTACT: michall@cc.huji.ac.il SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
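
As a rough illustration of the kind of alignment-free, sequence-derived features this abstract describes, the sketch below computes amino acid composition and a composition over a reduced alphabet. It does not use or reproduce the ProFET code or its API; the function names and the three-group alphabet are assumptions chosen for the example.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical reduced alphabet (illustrative grouping, not ProFET's).
REDUCED_GROUPS = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGPH",
    "charged": "DEKR",
}


def aa_composition(sequence):
    """Return the fraction of each of the 20 standard residues in a sequence."""
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def reduced_composition(sequence):
    """Collapse the 20-letter composition into the reduced-alphabet groups."""
    composition = aa_composition(sequence)
    return {
        group: sum(composition[aa] for aa in letters)
        for group, letters in REDUCED_GROUPS.items()
    }


features = aa_composition("AAKK")
print(features["A"], features["K"])
print(reduced_composition("AAKK"))
```

Feature dictionaries like these can be flattened into fixed-length vectors and fed to a standard classifier, which is the general pattern the abstract outlines.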


Subjects
Algorithms, Amino Acids/chemistry, Computational Biology/methods, Proteins/metabolism, Protein Sequence Analysis/methods, Protein Databases, Human Genome, Humans, Machine Learning, Proteins/chemistry
20.
Bioinformatics ; 31(4): 616-7, 2015 Feb 15.
Article in English | MEDLINE | ID: mdl-25644272

ABSTRACT

Speed is of the essence in combating Ebola; thus, computational approaches should form a significant component of Ebola research. As for the development of any modern drug, computational biology is uniquely positioned to contribute through comparative analysis of the genome sequences of Ebola strains and three-dimensional protein modeling. Other computational approaches to Ebola may include large-scale docking studies of Ebola proteins with human proteins and with small-molecule libraries, computational modeling of the spread of the virus, computational mining of the Ebola literature and creation of a curated Ebola database. Taken together, such computational efforts could significantly accelerate traditional scientific approaches. In recognition of the need for important and immediate solutions from the field of computational biology against Ebola, the International Society for Computational Biology (ISCB) announces a prize for an important computational advance in fighting the Ebola virus. ISCB will confer the ISCB Fight against Ebola Award, along with a prize of US$2000, at its July 2016 annual meeting (ISCB Intelligent Systems for Molecular Biology 2016, Orlando, FL). CONTACT: dkovats@iscb.org or rost@in.tum.de.


Subjects
Awards and Prizes, Biomedical Research, Computational Biology, Ebola Virus Disease/virology, Scientific Societies, Factual Databases, Ebolavirus/genetics, Ebolavirus/pathogenicity, Humans, International Agencies