Results 1 - 20 of 93
1.
Genome Res ; 33(7): 1145-1153, 2023 07.
Article in English | MEDLINE | ID: mdl-37414576

ABSTRACT

Multiple sequence alignment (MSA) is a critical step in the study of protein sequence and function. Typically, MSA algorithms progressively align pairs of sequences and combine these alignments with the aid of a guide tree. These alignment algorithms use scoring systems based on substitution matrices to measure amino acid similarities. Although successful, standard methods struggle on sets of proteins with low sequence identity: the so-called twilight zone of protein alignment. For these difficult cases, another source of information is needed. Protein language models are a powerful new approach that leverages massive sequence data sets to produce high-dimensional contextual embeddings for each amino acid in a sequence. These embeddings have been shown to reflect physicochemical and higher-order structural and functional attributes of amino acids within proteins. Here, we present a novel approach to MSA, based on clustering and ordering amino acid contextual embeddings. Our method for aligning semantically consistent groups of proteins circumvents the need for many standard components of MSA algorithms, avoiding initial guide tree construction, intermediate pairwise alignments, gap penalties, and substitution matrices. The added information from contextual embeddings leads to higher accuracy alignments for structurally similar proteins with low amino-acid similarity. We anticipate that protein language models will become a fundamental component of the next generation of algorithms for generating MSAs.


Subjects
Algorithms , Proteins , Sequence Alignment , Proteins/genetics , Proteins/chemistry , Amino Acid Sequence , Amino Acids , Language
2.
Genome Res ; 2022 Sep 19.
Article in English | MEDLINE | ID: mdl-36123148

ABSTRACT

Knowledge of how proteins interact with DNA is essential for understanding gene regulation. Although DNA-binding specificities for thousands of transcription factors (TFs) have been determined, the specific amino acid-base interactions comprising their structural interfaces are largely unknown. This lack of resolution hampers attempts to leverage these data in order to predict specificities for uncharacterized TFs or TFs mutated in disease. Here we introduce recognition code learning via automated mapping of protein-DNA structural interfaces (rCLAMPS), a probabilistic approach that uses DNA-binding specificities for TFs from the same structural family to simultaneously infer both which nucleotide positions are contacted by particular amino acids within the TF as well as a recognition code that relates each base-contacting amino acid to nucleotide preferences at the DNA positions it contacts. We apply rCLAMPS to homeodomains, the second largest family of TFs in metazoans and show that it learns a highly effective recognition code that can predict de novo DNA-binding specificities for TFs. Furthermore, we show that the inferred amino acid-nucleotide contacts reveal whether and how nucleotide preferences at individual binding site positions are altered by mutations within TFs. Our approach is an important step toward automatically uncovering the determinants of protein-DNA specificity from large compendia of DNA-binding specificities and inferring the altered functionalities of TFs mutated in disease.

3.
Proc Natl Acad Sci U S A ; 119(4)2022 01 25.
Article in English | MEDLINE | ID: mdl-35042818

ABSTRACT

The protovertebrate Ciona intestinalis type A (sometimes called Ciona robusta) contains a series of sensory cell types distributed across the head-tail axis of swimming tadpoles. They arise from lateral regions of the neural plate that exhibit properties of vertebrate placodes and neural crest. The sensory determinant POU IV/Brn3 is known to work in concert with regional determinants, such as Foxg and Neurogenin, to produce palp sensory cells (PSCs) and bipolar tail neurons (BTNs), in head and tail regions, respectively. A combination of single-cell RNA-sequencing (scRNA-seq) assays, computational analysis, and experimental manipulations suggests that misexpression of POU IV results in variable transformations of epidermal cells into hybrid sensory cell types, including those exhibiting properties of both PSCs and BTNs. Hybrid properties are due to coexpression of Foxg and Neurogenin that is triggered by an unexpected POU IV feedback loop. Hybrid cells were also found to express a synthetic gene battery that is not coexpressed in any known cell type. We discuss these results with respect to the opportunities and challenges of reprogramming cell types through the targeted misexpression of cellular determinants.


Subjects
Ciona intestinalis/genetics , Neurons/metabolism , POU Domain Factors/metabolism , Animals , Biological Evolution , Cellular Reprogramming/genetics , Cellular Reprogramming/physiology , Ciona intestinalis/metabolism , Epidermis/innervation , Epidermis/metabolism , Gene Expression/genetics , Gene Expression Regulation, Developmental/genetics , Gene Regulatory Networks/genetics , Neural Crest/metabolism , Neural Plate/metabolism , POU Domain Factors/genetics , Single-Cell Analysis , Transcription Factors/metabolism , Vertebrates/genetics
4.
Nat Methods ; 18(11): 1377-1385, 2021 11.
Article in English | MEDLINE | ID: mdl-34711973

ABSTRACT

Liquid chromatography-high-resolution mass spectrometry (LC-MS)-based metabolomics aims to identify and quantify all metabolites, but most LC-MS peaks remain unidentified. Here we present a global network optimization approach, NetID, to annotate untargeted LC-MS metabolomics data. The approach aims to generate, for all experimentally observed ion peaks, annotations that match the measured masses, retention times and (when available) tandem mass spectrometry fragmentation patterns. Peaks are connected based on mass differences reflecting adduction, fragmentation, isotopes, or feasible biochemical transformations. Global optimization generates a single network linking most observed ion peaks, enhances peak assignment accuracy, and produces chemically informative peak-peak relationships, including for peaks lacking tandem mass spectrometry spectra. Applying this approach to yeast and mouse data, we identified five previously unrecognized metabolites (thiamine derivatives and N-glucosyl-taurine). Isotope tracer studies indicate active flux through these metabolites. Thus, NetID applies existing metabolomic knowledge and global optimization to substantially improve annotation coverage and accuracy in untargeted metabolomics datasets, facilitating metabolite discovery.


Subjects
Algorithms , Data Curation/standards , Liver/metabolism , Metabolome , Metabolomics/standards , Saccharomyces cerevisiae/metabolism , Animals , Chromatography, Liquid/methods , Data Curation/methods , Metabolomics/methods , Mice , Tandem Mass Spectrometry/methods
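
The peak-connection step described in the abstract above (linking peaks whose mass difference reflects an isotope or adduct) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NetID implementation: the function name `connect_peaks`, the two mass differences, the 5 ppm tolerance, and the peak list are all chosen for the example, and the global optimization step is left out entirely.

```python
from itertools import combinations

# Illustrative mass differences in Da (assumed for this sketch, not taken from NetID).
MASS_DIFFS = {
    "13C isotope": 1.003355,             # 13C minus 12C
    "Na adduct vs H adduct": 21.981943,  # [M+Na]+ minus [M+H]+
}

def connect_peaks(mz_values, ppm=5.0):
    """Return (i, j, label) edges for peak pairs whose m/z gap matches a known difference."""
    edges = []
    for i, j in combinations(range(len(mz_values)), 2):
        lo, hi = sorted((mz_values[i], mz_values[j]))
        tol = hi * ppm * 1e-6  # absolute tolerance in Da, evaluated at the heavier peak
        for label, delta in MASS_DIFFS.items():
            if abs((hi - lo) - delta) <= tol:
                edges.append((i, j, label))
    return edges

# Hypothetical peak list (m/z values invented for the example).
peaks = [181.0707, 182.0741, 203.0526, 250.0000]
edges = connect_peaks(peaks)
```

With this toy input, the first peak is linked to the second as an isotope and to the third as an adduct, while the fourth peak stays unconnected.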
5.
PLoS Comput Biol ; 19(3): e1010966, 2023 03.
Article in English | MEDLINE | ID: mdl-36952575

ABSTRACT

Despite the vast phenotypic differences observed across primates, their protein products are largely similar to each other at the sequence level. We hypothesized that, since proteins accomplish all their functions via interactions with other molecules, alterations in the sites that participate in these interactions may be of critical importance. To uncover the extent to which these sites evolve across primates, we built a structurally-derived dataset of ~4,200 one-to-one orthologous sequence groups across 18 primate species, consisting of ~68,000 ligand-binding sites that interact with DNA, RNA, small molecules, ions, or peptides. Using this dataset, we identify functionally important patterns of conservation and variation within the amino acid residues that facilitate protein-ligand interactions across the primate phylogeny. We uncover that interaction sites are significantly more conserved than other sites, and that sites binding DNA and RNA further exhibit the lowest levels of variation. We also show that the subset of ligand-binding sites that do vary are enriched in components of gene regulatory pathways and uncover several instances of human-specific ligand-binding site changes within transcription factors. Altogether, our results suggest that ligand-binding sites have experienced selective pressure in primates and propose that variation in these sites may have an outsized effect on phenotypic variation in primates through pleiotropic effects on gene regulation.


Subjects
Evolution, Molecular , Primates , Animals , Humans , Ligands , Primates/genetics , Phylogeny , DNA/genetics , Binding Sites/genetics , RNA
6.
Nucleic Acids Res ; 49(13): e78, 2021 07 21.
Article in English | MEDLINE | ID: mdl-33999210

ABSTRACT

Domains are instrumental in facilitating protein interactions with DNA, RNA, small molecules, ions and peptides. Identifying ligand-binding domains within sequences is a critical step in protein function annotation, and the ligand-binding properties of proteins are frequently analyzed based upon whether they contain one of these domains. To date, however, knowledge of whether and how protein domains interact with ligands has been limited to domains that have been observed in co-crystal structures; this leaves approximately two-thirds of human protein domain families uncharacterized with respect to whether and how they bind DNA, RNA, small molecules, ions and peptides. To fill this gap, we introduce dSPRINT, a novel ensemble machine learning method for predicting whether a domain binds DNA, RNA, small molecules, ions or peptides, along with the positions within it that participate in these types of interactions. In stringent cross-validation testing, we demonstrate that dSPRINT has an excellent performance in uncovering ligand-binding positions and domains. We also apply dSPRINT to newly characterize the molecular functions of domains of unknown function. dSPRINT's predictions can be transferred from domains to sequences, enabling predictions about the ligand-binding properties of 95% of human genes. The dSPRINT framework and its predictions for 6503 human protein domains are freely available at http://protdomain.princeton.edu/dsprint.


Subjects
Machine Learning , Protein Domains , Binding Sites , DNA/metabolism , Humans , Ions/metabolism , Ligands , Peptides/metabolism , RNA/metabolism
7.
Bioinformatics ; 36(22-23): 5322-5329, 2021 Apr 01.
Article in English | MEDLINE | ID: mdl-33325500

ABSTRACT

MOTIVATION: Accurately predicting the quantitative impact of a substitution on a protein's molecular function would be a great aid in understanding the effects of observed genetic variants across populations. While this remains a challenging task, new approaches can leverage data from the increasing numbers of comprehensive deep mutational scanning (DMS) studies that systematically mutate proteins and measure fitness. RESULTS: We introduce DeMaSk, an intuitive and interpretable method based only upon DMS datasets and sequence homologs that predicts the impact of missense mutations within any protein. DeMaSk first infers a directional amino acid substitution matrix from DMS datasets and then fits a linear model that combines these substitution scores with measures of per-position evolutionary conservation and variant frequency across homologs. Despite its simplicity, DeMaSk has state-of-the-art performance in predicting the impact of amino acid substitutions, and can easily and rapidly be applied to any protein sequence. AVAILABILITY AND IMPLEMENTATION: https://demask.princeton.edu generates fitness impact predictions and visualizations for any user-submitted protein sequence. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
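
The first step named in the abstract above, inferring a directional substitution matrix from DMS data, might look roughly like the following Python sketch. It is an assumption-laden illustration rather than the DeMaSk code: the function name `directional_matrix`, the record layout, and the scores are invented, and the subsequent linear model over conservation and variant frequency is not shown.

```python
from collections import defaultdict

def directional_matrix(dms_records):
    """Average DMS scores per ordered (wild-type, variant) amino acid pair."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for wt, var, score in dms_records:
        sums[(wt, var)] += score
        counts[(wt, var)] += 1
    return {pair: sums[pair] / counts[pair] for pair in sums}

# Synthetic records: (wild-type residue, variant residue, measured score).
records = [
    ("A", "V", -0.2),
    ("A", "V", -0.4),
    ("V", "A", -1.0),
    ("G", "P", -2.0),
]
matrix = directional_matrix(records)
```

Because the keys are ordered pairs, the entry for A to V is stored separately from V to A, which is what makes the matrix directional.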

8.
Bioinformatics ; 37(Suppl_1): i133-i141, 2021 07 12.
Article in English | MEDLINE | ID: mdl-34252920

ABSTRACT

MOTIVATION: Protein domain duplications are a major contributor to the functional diversification of protein families. These duplications can occur one at a time through single domain duplications, or as tandem duplications where several consecutive domains are duplicated together as part of a single evolutionary event. Existing methods for inferring domain-level evolutionary events are based on reconciling domain trees with gene trees. While some formulations consider multiple domain duplications, they do not explicitly model tandem duplications; this leads to inaccurate inference of which domains duplicated together over the course of evolution. RESULTS: Here, we introduce a reconciliation-based framework that considers the relative positions of domains within extant sequences. We use this information to uncover tandem domain duplications within the evolutionary history of these genes. We devise an integer linear programming approach that solves our problem exactly, and a heuristic approach that works well in practice. We perform extensive simulation studies to demonstrate that our approaches can accurately uncover single and tandem domain duplications, and additionally test our approach on a well-studied orthogroup where lineage-specific domain expansions exhibit varying and complex domain duplication patterns. AVAILABILITY AND IMPLEMENTATION: Code is available on github at https://github.com/Singh-Lab/TandemDuplications. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Programming, Linear , Evolution, Molecular , Gene Duplication , Humans , Phylogeny , Protein Domains
9.
PLoS Comput Biol ; 17(11): e1009560, 2021 11.
Article in English | MEDLINE | ID: mdl-34793437

ABSTRACT

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of COVID-19, is of zoonotic origin. Evolutionary analyses assessing whether coronaviruses similar to SARS-CoV-2 infected ancestral species of modern-day animal hosts could be useful in identifying additional reservoirs of potentially dangerous coronaviruses. We reasoned that if a clade of species has been repeatedly exposed to a virus, then their proteins relevant for viral entry may exhibit adaptations that affect host susceptibility or response. We perform comparative analyses across the mammalian phylogeny of angiotensin-converting enzyme 2 (ACE2), the cellular receptor for SARS-CoV-2, in order to uncover evidence for selection acting at its binding interface with the SARS-CoV-2 spike protein. We uncover that in rodents there is evidence for adaptive amino acid substitutions at positions comprising the ACE2-spike interaction interface, whereas the variation within ACE2 proteins in primates and some other mammalian clades is not consistent with evolutionary adaptations. We also analyze aminopeptidase N (APN), the receptor for the human coronavirus 229E, a virus that causes the common cold, and find evidence for adaptation in primates. Altogether, our results suggest that the rodent and primate lineages may have had ancient exposures to viruses similar to SARS-CoV-2 and HCoV-229E, respectively.


Subjects
COVID-19/genetics , COVID-19/virology , Coronavirus Infections/genetics , Coronavirus Infections/virology , SARS-CoV-2/genetics , Adaptation, Physiological/genetics , Amino Acid Substitution , Angiotensin-Converting Enzyme 2/genetics , Angiotensin-Converting Enzyme 2/physiology , Animals , CD13 Antigens/genetics , CD13 Antigens/physiology , Common Cold/genetics , Common Cold/virology , Computational Biology , Coronavirus 229E, Human/genetics , Coronavirus 229E, Human/physiology , Evolution, Molecular , Genomics , Host Microbial Interactions/genetics , Host Microbial Interactions/physiology , Host Specificity/genetics , Host Specificity/physiology , Humans , Mammals/genetics , Mammals/virology , Phylogeny , Protein Interaction Domains and Motifs/genetics , Receptors, Virus/genetics , Receptors, Virus/physiology , SARS-CoV-2/physiology , Selection, Genetic , Spike Glycoprotein, Coronavirus/genetics , Spike Glycoprotein, Coronavirus/physiology , Virus Internalization
10.
Nucleic Acids Res ; 48(2): e9, 2020 01 24.
Article in English | MEDLINE | ID: mdl-31777934

ABSTRACT

We are now in an era where protein-DNA interactions have been experimentally assayed for thousands of DNA-binding proteins. In order to infer DNA-binding specificities from these data, numerous sophisticated computational methods have been developed. These approaches typically infer DNA-binding specificities by considering interactions for each protein independently, ignoring related and potentially valuable interaction information across other proteins that bind DNA via the same structural domain. Here we introduce a framework for inferring DNA-binding specificities by considering protein-DNA interactions for entire groups of structurally similar proteins simultaneously. We devise both constrained optimization and label propagation algorithms for this task, each balancing observations at the individual protein level against dataset-wide consistency of interaction preferences. We test our approaches on two large, independent Cys2His2 zinc finger protein-DNA interaction datasets. We demonstrate that jointly inferring specificities within each dataset individually dramatically improves accuracy, leading to increased agreement both between these two datasets and with a fixed external standard. Overall, our results suggest that sharing protein-DNA interaction information across structurally similar proteins is a powerful means to enable accurate inference of DNA-binding specificities.


Subjects
CYS2-HIS2 Zinc Fingers/genetics , DNA-Binding Proteins/genetics , Structural Homology, Protein , Binding Sites , Biochemical Phenomena , Biophysical Phenomena , DNA-Binding Proteins/chemistry , Protein Binding/genetics
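
The idea in the abstract above of balancing each protein's own observations against dataset-wide consistency can be illustrated with a generic label propagation loop. The sketch below is a textbook-style scheme, not the authors' algorithm: the function name `propagate`, the mixing parameter `alpha`, the similarity weights, and the scores are all assumptions made for the example.

```python
def propagate(observed, weights, alpha=0.5, iterations=50):
    """Iteratively blend each node's own observation with its neighbors' current values.

    observed: one score per protein (for instance, a preference for one base).
    weights:  weights[i][j] is the similarity between proteins i and j (0 on the diagonal).
    alpha:    how strongly neighbors pull on a node (0 means neighbors are ignored).
    """
    values = list(observed)
    n = len(observed)
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total = sum(weights[i])
            if total == 0:
                updated.append(observed[i])  # isolated node keeps its own observation
                continue
            neighbor_mean = sum(weights[i][j] * values[j] for j in range(n)) / total
            updated.append((1 - alpha) * observed[i] + alpha * neighbor_mean)
        values = updated
    return values

# Three mutually similar hypothetical proteins; the middle one has an outlying observation.
observed = [1.0, 0.0, 1.0]
weights = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
smoothed = propagate(observed, weights)
```

In this toy case the outlying score is pulled toward its neighbors while every node still retains part of its own observation, which is the trade-off the abstract describes.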
11.
Nucleic Acids Res ; 47(2): 582-593, 2019 01 25.
Article in English | MEDLINE | ID: mdl-30535108

ABSTRACT

Domains are fundamental subunits of proteins, and while they play major roles in facilitating protein-DNA, protein-RNA and other protein-ligand interactions, a systematic assessment of their various interaction modes is still lacking. A comprehensive resource identifying positions within domains that tend to interact with nucleic acids, small molecules and other ligands would expand our knowledge of domain functionality as well as aid in detecting ligand-binding sites within structurally uncharacterized proteins. Here, we introduce an approach to identify per-domain-position interaction 'frequencies' by aggregating protein co-complex structures by domain and ascertaining how often residues mapping to each domain position interact with ligands. We perform this domain-based analysis on ∼91000 co-complex structures, and infer positions involved in binding DNA, RNA, peptides, ions or small molecules across 4128 domains, which we refer to collectively as the InteracDome. Cross-validation testing reveals that ligand-binding positions for 2152 domains are highly consistent and can be used to identify residues facilitating interactions in ∼63-69% of human genes. Our resource of domain-inferred ligand-binding sites should be a great aid in understanding disease etiology: whereas these sites are enriched in Mendelian-associated and cancer somatic mutations, they are depleted in polymorphisms observed across healthy populations. The InteracDome is available at http://interacdome.princeton.edu.


Subjects
DNA-Binding Proteins/chemistry , DNA/metabolism , Protein Domains , RNA-Binding Proteins/chemistry , RNA/metabolism , Binding Sites , DNA/chemistry , DNA-Binding Proteins/metabolism , Disease/genetics , Genes , Humans , Ligands , Models, Molecular , Mutation , Protein Binding , RNA/chemistry , RNA-Binding Proteins/metabolism
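
The per-domain-position interaction "frequencies" described in the abstract above amount to pooling structural instances of a domain and asking how often each position contacts a ligand. A minimal Python sketch follows, assuming a simplified input format; the function name `position_frequencies` and the toy instances are hypothetical, and the actual InteracDome pipeline is not reproduced here.

```python
from collections import Counter

def position_frequencies(instances):
    """Fraction of domain instances in which each domain position contacts the ligand.

    instances: list of (covered_positions, contacting_positions) pairs, one per
    structural instance of the domain, both given as sets of domain positions.
    """
    covered = Counter()
    contacts = Counter()
    for covered_positions, contacting_positions in instances:
        covered.update(covered_positions)
        # Only count contacts at positions this instance actually covers.
        contacts.update(contacting_positions & covered_positions)
    return {pos: contacts[pos] / covered[pos] for pos in covered}

# Four hypothetical instances of a three-position domain.
instances = [
    ({1, 2, 3}, {2}),
    ({1, 2, 3}, {2, 3}),
    ({2, 3}, set()),
    ({1, 2, 3}, {2}),
]
freqs = position_frequencies(instances)
```

Dividing by the number of instances that cover a position, rather than by the total number of instances, keeps partially resolved structures from deflating the frequency.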
12.
Indian J Microbiol ; 60(1): 12-25, 2020 Mar.
Article in English | MEDLINE | ID: mdl-32089570

ABSTRACT

A healthy gut is predominantly occupied by bacteria, which play a vital role in nutrition and health. Any change in normal gut homeostasis leads to gut dysbiosis. So far, efforts have been made to mitigate gastrointestinal symptoms using modern-day probiotics. The majority of the probiotic strains currently in use belong to the genera Lactobacillus, Clostridium, Bifidobacterium and Streptococcus. Recent advances in culturomics, which implement newer techniques coupled with gnotobiotic animal models, provide a basis for developing novel host-specific probiotic therapies. This review article discusses recent advances in the development of microbe-based therapies that can now be used to treat a wide spectrum of diseases. However, these probiotics are not classified as drugs, and there is a lack of stringent law enforcement to protect end users against pseudo-probiotic products. While modern probiotics hold strong promise for the future, more rigorous regulations are needed to develop genuine probiotic products and characterize novel probiotics using the latest research and technology. This article also highlights the possibility of reducing antibiotic usage by utilizing probiotics developed using the latest concepts of synbiotics and ecobiotics.

13.
RNA ; 23(7): 1097-1109, 2017 07.
Article in English | MEDLINE | ID: mdl-28420675

ABSTRACT

Piwi-interacting RNAs (piRNAs) are central components of the piRNA pathway, which directs transposon silencing and guarantees genome integrity in the germ cells of several metazoans. In Drosophila, piRNAs are produced from discrete regions of the genome termed piRNA clusters, whose expression relies on the RDC complex composed of the core proteins Rhino, Deadlock, and Cutoff. To date, the RDC complex has been exclusively implicated in the regulation of the piRNA loci. Here we further elucidate the function of Cutoff and the RDC complex by performing genome-wide ChIP-seq and RNA-seq assays in Drosophila ovaries and analyzing these data together with other publicly available data sets. In agreement with previous studies, we confirm that Cutoff is involved in the transcriptional regulation of piRNA clusters and in the repression of transposable elements in germ cells. Surprisingly, however, we find that Cutoff is enriched at and affects the expression of other noncoding RNAs, including spliceosomal RNAs (snRNAs) and small nucleolar RNAs (snoRNAs). At least in some instances, Cutoff appears to act at a transcriptional level in concert with Rhino and perhaps Deadlock. Finally, we show that mutations in Cutoff result in the deregulation of hundreds of protein-coding genes in germ cells. Our study uncovers a broader function for the RDC complex in Drosophila germline development.


Subjects
Drosophila Proteins/metabolism , Drosophila melanogaster/genetics , Ovary/growth & development , RNA, Small Interfering/metabolism , RNA, Untranslated/metabolism , RNA-Binding Proteins/metabolism , Animals , Chromatin Immunoprecipitation , DNA Transposable Elements , Drosophila Proteins/genetics , Drosophila melanogaster/metabolism , Female , Gene Expression Regulation , Gene Expression Regulation, Developmental , Mutation , Ovary/chemistry , RNA-Binding Proteins/genetics , Sequence Analysis, RNA/methods
14.
PLoS Comput Biol ; 14(6): e1006290, 2018 06.
Article in English | MEDLINE | ID: mdl-29953437

ABSTRACT

A major goal of cancer genomics is to identify somatic mutations that play a role in tumor initiation or progression. Somatic mutations within transcription factors are of particular interest, as gene expression dysregulation is widespread in cancers. The substantial gene expression variation evident across tumors suggests that numerous regulatory factors are likely to be involved and that somatic mutations within them may not occur at high frequencies across patient cohorts, thereby complicating efforts to uncover which ones are cancer-relevant. Here we analyze somatic mutations within the largest family of human transcription factors, namely those that bind DNA via Cys2His2 zinc finger domains. Specifically, to home in on important mutations within these genes, we aggregated somatic mutations across all of them by their positions within Cys2His2 zinc finger domains. Remarkably, we found that for three classes of cancers profiled by The Cancer Genome Atlas (TCGA), namely Uterine Corpus Endometrial Carcinoma, Colon and Rectal Adenocarcinomas, and Skin Cutaneous Melanoma, two specific, functionally important positions within zinc finger domains are mutated significantly more often than expected by chance, with alterations in 18%, 10% and 43% of tumors, respectively. Numerous zinc finger genes are affected, with those containing Krüppel-associated box (KRAB) repressor domains preferentially targeted by these mutations. Further, the genes with these mutations also have high overall missense mutation rates, are expressed at levels comparable to those of known cancer genes, and together have biological process annotations that are consistent with roles in cancers. Altogether, we introduce evidence broadly implicating mutations within a diverse set of zinc finger proteins as relevant for cancer, and propose that they contribute to the widespread transcriptional dysregulation observed in cancer cells.


Subjects
CYS2-HIS2 Zinc Fingers/genetics , Zinc Fingers/genetics , Adenomatous Polyposis Coli/genetics , Amino Acid Sequence/genetics , Binding Sites/genetics , CYS2-HIS2 Zinc Fingers/physiology , DNA/metabolism , DNA-Binding Proteins/metabolism , Female , Humans , Male , Neoplasms/genetics , Repressor Proteins/genetics , Sequence Homology, Amino Acid , Skin Neoplasms/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Uterine Neoplasms/genetics , Zinc Fingers/physiology
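
The aggregation strategy in the abstract above (pooling somatic mutations from many genes by their position within the zinc finger domain, then asking whether a position is hit more often than chance) can be sketched as follows. This is a simplified illustration, not the authors' statistical procedure: the uniform null model, the assumed domain length of 23, the gene labels, and the function names are all hypothetical.

```python
from collections import Counter
from math import comb

def aggregate_by_domain_position(mutations):
    """Count mutations per domain position, pooling across all genes and domain copies."""
    return Counter(pos for _gene, pos in mutations)

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Synthetic input: (gene label, position within the zinc finger domain).
mutations = [
    ("ZNF_A", 9), ("ZNF_B", 9), ("ZNF_C", 9),
    ("ZNF_A", 2), ("ZNF_D", 9), ("ZNF_E", 14),
]
counts = aggregate_by_domain_position(mutations)

domain_length = 23  # assumed length, used only for the uniform null in this sketch
p_value = binomial_tail(counts[9], len(mutations), 1 / domain_length)
```

Pooling by domain position is what gives the test its power here: no single gene in the toy input carries more than two mutations, yet position 9 stands out once the genes are combined.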
15.
Alcohol Clin Exp Res ; 43(12): 2547-2558, 2019 12.
Article in English | MEDLINE | ID: mdl-31589333

ABSTRACT

BACKGROUND: Adolescence is a critical period for neural development, and alcohol exposure during adolescence can lead to an elevated risk for health consequences as well as alcohol use disorders. Clinical and experimental data suggest that chronic alcohol exposure may produce immunomodulatory effects that can lead to the activation of pro-inflammatory cytokine pathways as well as microglial markers. The present study evaluated, in brain and blood, the effects of adolescent alcohol exposure and withdrawal on microglia and on the most representative pro- and anti-inflammatory cytokines and major chemokines that can contribute to the establishing of a neuroinflammatory environment. METHODS: Wistar rats (males, n = 96) were exposed to ethanol (EtOH) vapors, or air control, for 5 weeks over adolescence (PD22-PD58). Brains and blood samples were collected at 3 time points: (i) after 35 days of vapor/air exposure (PD58); (ii) after 1 day of withdrawal (PD59), and (iii) 28 days after withdrawal (PD86). The ionized calcium-binding adapter molecule 1 (Iba-1) was used to index microglial activation, and cytokine/chemokine responses were analyzed using magnetic bead panels. RESULTS: After 35 days of adolescent vapor exposure, a significant increase in Iba-1 immunoreactivity was seen in amygdala, frontal cortex, hippocampus, and substantia nigra. However, Iba-1 density returned to control levels at both 1 day and 28 days of withdrawal except in the hippocampus where Iba-1 density was significantly lower than controls. In serum, adolescent EtOH exposure induced a reduction in IL-13 and an increase in fractalkine at day 35. After 1 day of withdrawal, IL-18 was reduced, and IP-10 was elevated, whereas both IP-10 and IL-10 were elevated at 28 days following withdrawal. In the frontal cortex, adolescent EtOH exposure induced an increase in IL-1β at day 35, and 28 days of withdrawal, and IL-10 was increased after 28 days of withdrawal.
CONCLUSION: These data demonstrate that EtOH exposure during adolescence produces significant microglial activation; however, inflammatory markers seen in the blood appear to differ from those observed in the brain.


Subjects
Brain/metabolism , Cytokines/metabolism , Ethanol/adverse effects , Substance Withdrawal Syndrome/metabolism , Age Factors , Animals , Calcium-Binding Proteins/metabolism , Cytokines/blood , Male , Microfilament Proteins/metabolism , Microglia/metabolism , Rats , Substance Withdrawal Syndrome/blood , Time Factors
16.
Nat Rev Genet ; 14(5): 333-46, 2013 May.
Article in English | MEDLINE | ID: mdl-23594911

ABSTRACT

High-throughput experimental technologies are generating increasingly massive and complex genomic data sets. The sheer enormity and heterogeneity of these data threaten to make the arising problems computationally infeasible. Fortunately, powerful algorithmic techniques lead to software that can answer important biomedical questions in practice. In this Review, we sample the algorithmic landscape, focusing on state-of-the-art techniques, the understanding of which will aid the bench biologist in analysing omics data. We spotlight specific examples that have facilitated and enriched analyses of sequence, transcriptomic and network data sets.


Subjects
Computational Biology/methods , Genomics/methods , Algorithms , Data Mining , Databases, Genetic , Gene Expression Profiling , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Analysis, DNA/methods , Software
17.
Bioinformatics ; 33(16): 2471-2478, 2017 Aug 15.
Article in English | MEDLINE | ID: mdl-28407137

ABSTRACT

MOTIVATION: Protein domain prediction is one of the most powerful approaches for sequence-based function prediction. Although domain instances are typically predicted independently of each other, newer approaches have demonstrated improved performance by rewarding domain pairs that frequently co-occur within sequences. However, most of these approaches have ignored the order in which domains preferentially co-occur and have also not modeled domain co-occurrence probabilistically. RESULTS: We introduce a probabilistic approach for domain prediction that models 'directional' domain context. Our method is the first to score all domain pairs within a sequence while taking their order into account, even for non-sequential domains. We show that our approach extends a previous Markov model-based approach to additionally score all pairwise terms, and that it can be interpreted within the context of Markov random fields. We formulate our underlying combinatorial optimization problem as an integer linear program, and demonstrate that it can be solved quickly in practice. Finally, we perform extensive evaluation of domain context methods and demonstrate that incorporating context increases the number of domain predictions by ∼15%, with our approach dPUC2 (Domain Prediction Using Context) outperforming all competing approaches. AVAILABILITY AND IMPLEMENTATION: dPUC2 is available at http://github.com/alexviiia/dpuc2. CONTACT: mona@cs.princeton.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods , Models, Molecular , Protein Domains , Sequence Analysis, Protein/methods , Software , Algorithms , Humans , Models, Statistical
18.
PLoS Genet ; 11(3): e1005011, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25748510

ABSTRACT

Differences in transcriptional regulatory networks underlie much of the phenotypic variation observed across organisms. Changes to cis-regulatory elements are widely believed to be the predominant means by which regulatory networks evolve, yet examples of regulatory network divergence due to transcription factor (TF) variation have also been observed. To systematically ascertain the extent to which TFs contribute to regulatory divergence, we analyzed the evolution of the largest class of metazoan TFs, Cys2-His2 zinc finger (C2H2-ZF) TFs, across 12 Drosophila species spanning ~45 million years of evolution. Remarkably, we uncovered that a significant fraction of all C2H2-ZF 1-to-1 orthologs in flies exhibit variations that can affect their DNA-binding specificities. In addition to loss and recruitment of C2H2-ZF domains, we found diverging DNA-contacting residues in ~44% of domains shared between D. melanogaster and the other fly species. These diverging DNA-contacting residues, found in ~70% of the D. melanogaster C2H2-ZF genes in our analysis and corresponding to ~26% of all annotated D. melanogaster TFs, show evidence of functional constraint: they tend to be conserved across phylogenetic clades and evolve slower than other diverging residues. These same variations were rarely found as polymorphisms within a population of D. melanogaster flies, indicating their rapid fixation. The predicted specificities of these dynamic domains gradually change across phylogenetic distances, suggesting stepwise evolutionary trajectories for TF divergence. Further, whereas proteins with conserved C2H2-ZF domains are enriched in developmental functions, those with varying domains exhibit no functional enrichments. Our work suggests that a subset of highly dynamic and largely unstudied TFs are a likely source of regulatory variation in Drosophila and other metazoans.


Subjects
Evolution, Molecular , Gene Regulatory Networks/genetics , Transcription Factors/genetics , Zinc Fingers/genetics , Animals , DNA-Binding Proteins/genetics , Drosophila/genetics , Phylogeny , Regulatory Sequences, Nucleic Acid/genetics , Species Specificity
19.
Nucleic Acids Res ; 43(3): 1965-84, 2015 Feb 18.
Article in English | MEDLINE | ID: mdl-25593323

ABSTRACT

Cys2His2 zinc fingers (C2H2-ZFs) comprise the largest class of metazoan DNA-binding domains. Despite this domain's well-defined DNA-recognition interface, and its successful use in the design of chimeric proteins capable of targeting genomic regions of interest, much remains unknown about its DNA-binding landscape. To help bridge this gap in fundamental knowledge and to provide a resource for design-oriented applications, we screened large synthetic protein libraries to select binding C2H2-ZF domains for each possible three base pair target. The resulting data consist of >160 000 unique domain-DNA interactions and comprise the most comprehensive investigation of C2H2-ZF DNA-binding interactions to date. An integrated analysis of these independent screens yielded DNA-binding profiles for tens of thousands of domains and led to the successful design and prediction of C2H2-ZF DNA-binding specificities. Computational analyses uncovered important aspects of C2H2-ZF domain-DNA interactions, including the roles of within-finger context and domain position on base recognition. We observed the existence of numerous distinct binding strategies for each possible three base pair target and an apparent balance between affinity and specificity of binding. In sum, our comprehensive data help elucidate the complex binding landscape of C2H2-ZF domains and provide a foundation for efforts to determine, predict and engineer their DNA-binding specificities.


Subjects
Cysteine/chemistry , DNA/metabolism , Histidine/chemistry , Zinc Fingers , Binding Sites , DNA/chemistry , Data Collection
20.
PLoS Comput Biol ; 11(10): e1004467, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26436655

ABSTRACT

Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms (H. sapiens, D. melanogaster, and S. cerevisiae) and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality.


Subjects
Chromosome Mapping/methods , Data Mining/methods , Genome/genetics , Multigene Family/genetics , Natural Language Processing , Proteome/genetics , Animals , Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Genome-Wide Association Study , Humans , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/genetics , Species Specificity