ABSTRACT
A wealth of high-throughput biological data, of which omics constitute a significant fraction, has been made publicly available in repositories over the past decades. These data come in various formats and cover a range of species and research areas providing insights into the complexities of biological systems; the public repositories hosting these data serve as multifaceted resources. The potentially greater value of these data lies in their secondary utilization as the deployment of data science and artificial intelligence in biology advances. Here, we critically evaluate challenges in secondary data use, focusing on omics data of human embryonic kidney cell lines available in public repositories. The emerging issues are obstacles faced by secondary data users across diverse domains as they concern platforms and repositories, which accept deposition of data irrespective of their species type. The evolving landscape of data-driven research in biology prompts re-evaluation of open access data curation and submission procedures to ensure that these challenges do not impede novel research opportunities through data exploitation. This paper aims to draw attention to widespread issues with data reporting and encourages data owners to meticulously curate submissions to maximize not only their immediate research impact but also the long-term legacy of datasets.
ABSTRACT
We present the Codon Statistics Database, an online database that contains codon usage statistics for all the species with reference or representative genomes in RefSeq (over 15,000). The user can search for any species and access two sets of tables. One set lists, for each codon, the frequency, the Relative Synonymous Codon Usage, and whether the codon is preferred. Another set of tables lists, for each gene, its GC content, Effective Number of Codons, Codon Adaptation Index, and frequency of optimal codons. Equivalent tables can be accessed for (1) all nuclear genes, (2) nuclear genes encoding ribosomal proteins, (3) mitochondrial genes, and (4) chloroplast genes (if available in the relevant assembly). The user can also search for any taxonomic group (e.g., "primates") and obtain a table comparing all the species in the group. The database is free to access without registration at http://codonstatsdb.unr.edu.
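The Relative Synonymous Codon Usage (RSCU) reported in the per-codon tables can be illustrated with a short sketch. It uses the textbook definition (observed codon count divided by the count expected if all synonymous codons for that amino acid were used equally) and the standard genetic code. The function name `rscu` and the input handling are illustrative assumptions; this is not the database's own implementation, which the abstract does not describe.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Return {codon: RSCU} for one in-frame coding sequence (hypothetical helper).

    RSCU = observed count / (total count for the amino acid / number of
    synonymous codons). Stop codons are ignored, as are amino acids that
    do not occur in the sequence.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group sense codons into synonymous families, one per amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    result = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(family)
        for c in family:
            result[c] = counts[c] / expected
    return result
```

For the toy sequence `GCTGCTGCC` (three alanine codons), `rscu` gives GCT a value of 8/3 and GCC a value of 4/3, while the unused alanine codons GCA and GCG get 0. Values above 1 indicate codons used more often than expected under equal usage.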
Subjects
Codon Usage, Magnoliopsida, Animals, Base Composition, Codon/genetics, Chloroplast Genes
ABSTRACT
Selinexor, a covalent XPO1 inhibitor, is approved in the USA in combination with dexamethasone for penta-refractory multiple myeloma. Additional XPO1 covalent inhibitors are currently in clinical trials for multiple diseases including hematologic malignancies, solid tumor malignancies, glioblastoma multiforme (GBM), and amyotrophic lateral sclerosis (ALS). It is important to measure the target engagement and selectivity of covalent inhibitors to understand the degree of engagement needed for efficacy, while avoiding both mechanism-based and off-target toxicity. Herein, we report clickable probes based on the XPO1 inhibitors selinexor and eltanexor for the labeling of XPO1 in live cells to assess target engagement and selectivity. We used mass spectrometry-based chemoproteomic workflows to profile the proteome-wide selectivity of selinexor and eltanexor and show that they are highly selective for XPO1. Thermal profiling analysis of selinexor further offers an orthogonal approach to measure XPO1 engagement in live cells. We believe these probes and assays will serve as useful tools to further interrogate the biology of XPO1 and its inhibition in cellular and in vivo systems.
Subjects
Amyotrophic Lateral Sclerosis/drug therapy, Antineoplastic Agents/pharmacology, Glioblastoma/drug therapy, Hydrazines/pharmacology, Karyopherins/antagonists & inhibitors, Cytoplasmic and Nuclear Receptors/antagonists & inhibitors, Triazoles/pharmacology, Amyotrophic Lateral Sclerosis/metabolism, Amyotrophic Lateral Sclerosis/pathology, Antineoplastic Agents/chemistry, Tumor Cell Line, Glioblastoma/metabolism, Glioblastoma/pathology, Humans, Hydrazines/chemistry, Karyopherins/metabolism, Cytoplasmic and Nuclear Receptors/metabolism, Triazoles/chemistry, Exportin 1 Protein
ABSTRACT
The different proteins of any proteome evolve at enormously different rates. One of the primary factors influencing rates of protein evolution is expression level, with highly expressed proteins tending to evolve at slow rates. This phenomenon, known as the expression level-evolutionary rate (E-R) anticorrelation, has been attributed to the abundance-dependent deleterious effects of misfolding or misinteraction. We have recently shown that secreted proteins either lack an E-R anticorrelation or exhibit a significantly reduced E-R anticorrelation. This effect may be due to the strict quality control to which secreted proteins are subject in the endoplasmic reticulum (which is expected to reduce the rate of misfolding and its deleterious effects) or to their extracellular location (expected to reduce the rate of misinteraction and its deleterious effects). Among secreted proteins, N-glycosylated ones are under particularly strong quality control. Here, we investigate how N-linked glycosylation affects the E-R anticorrelation. Strikingly, we observe a positive E-R correlation among N-glycosylated proteins. That is, N-glycoproteins that are highly expressed evolve at faster rates than lowly expressed N-glycoproteins, in contrast to what is observed among intracellular proteins.
Subjects
Molecular Evolution, Gene Expression, Glycoproteins/genetics
ABSTRACT
The different proteins of any proteome evolve at enormously different rates. What factors contribute to this variability, and to what extent, is still a largely open question. We hypothesized that disulfide bonds, by increasing protein stability, should make proteins' structures relatively independent of their amino acid sequences, thus acting as buffers of deleterious mutations and enabling accelerated sequence evolution. In agreement with this hypothesis, we observed that membrane proteins with disulfide bonds evolved 88% faster than those without disulfide bonds, and that extracellular proteins with disulfide bonds evolved 49% faster than those without disulfide bonds. In addition, genes encoding proteins with disulfide bonds exhibit an increased likelihood of showing signatures of positive selection. Multivariate analyses indicate that the trend is independent of a number of potentially confounding factors. The effect, however, is not observed among the longest proteins, which can become stabilized by mechanisms other than disulfide bonds.
Subjects
Disulfides/metabolism, Proteins/chemistry, Amino Acid Sequence, Protein Databases, Disulfides/chemistry, Molecular Evolution, Protein Stability, Proteins/metabolism, Proteome
ABSTRACT
The rates of evolution of the proteins of any organism vary across orders of magnitude. A primary factor influencing rates of protein evolution is expression. A strong negative correlation between expression levels and evolutionary rates (the so-called E-R anticorrelation) has been observed in virtually all studied organisms. This effect is currently attributed to the abundance-dependent fitness costs of misfolding and unspecific protein-protein interactions, among other factors. Secreted proteins are folded in the endoplasmic reticulum, a compartment where chaperones, folding catalysts, and stringent quality control mechanisms promote their correct folding and may reduce the fitness costs of misfolding. In addition, confinement of secreted proteins to the extracellular space may reduce misinteractions and their deleterious effects. We hypothesize that each of these factors (the secretory pathway quality control and extracellular location) may reduce the strength of the E-R anticorrelation. Indeed, here we show that among human proteins that are secreted to the extracellular space, rates of evolution do not correlate with protein abundances. This trend is robust to controlling for several potentially confounding factors and is also observed when analyzing protein abundance data for 6 human tissues. In addition, analysis of mRNA abundance data for 32 human tissues shows that the E-R correlation is always less negative, and sometimes nonsignificant, in secreted proteins. Similar observations were made in Caenorhabditis elegans and in Escherichia coli, and to a lesser extent in Drosophila melanogaster, Saccharomyces cerevisiae and Arabidopsis thaliana. Our observations contribute to understanding the causes of the E-R anticorrelation.
Subjects
Molecular Evolution, Genetic Models, Proteins/genetics, Secretory Pathway/genetics, Animals, Biological Evolution, Caenorhabditis elegans/genetics, Protein Databases, Drosophila melanogaster/genetics, Gene Expression Regulation, Humans, Protein Folding, Messenger RNA/genetics, Messenger RNA/metabolism, Saccharomyces cerevisiae/genetics, Genetic Selection
ABSTRACT
Comparison of the proteins of thermophilic, mesophilic, and psychrophilic prokaryotes has revealed several features characteristic of proteins adapted to high temperatures, which increase their thermostability. These characteristics include a profusion of disulfide bonds, salt bridges, hydrogen bonds, and hydrophobic interactions, and a depletion in intrinsically disordered regions. It is unclear, however, whether such differences can also be observed in eukaryotic proteins or when comparing proteins that are adapted to temperatures that are more subtly different. When an organism is exposed to high temperatures, a subset of its proteins is overexpressed (heat-induced proteins), whereas others are either repressed (heat-repressed proteins) or remain unaffected. Here, we determine the expression levels of all genes in the eukaryotic model system Arabidopsis thaliana at 22 and 37 °C, and compare both the amino acid compositions and levels of intrinsic disorder of heat-induced and heat-repressed proteins. We show that, compared to heat-repressed proteins, heat-induced proteins are enriched in electrostatically charged amino acids and depleted in polar amino acids, mirroring thermophile proteins. However, in contrast with thermophile proteins, heat-induced proteins are enriched in intrinsically disordered regions, and depleted in hydrophobic amino acids. Our results indicate that temperature adaptation at the level of amino acid composition and intrinsic disorder can be observed not only in proteins of thermophilic organisms, but also in eukaryotic heat-induced proteins; the underlying adaptation pathways, however, are similar but not the same.
Subjects
Amino Acids/chemistry, Arabidopsis Proteins/chemistry, Arabidopsis Proteins/metabolism, Arabidopsis/metabolism, Physiological Adaptation/physiology, Hot Temperature, Static Electricity, Temperature
ABSTRACT
Unlike the pandemic form of HIV-1 (group M), group O viruses are endemic in west central Africa, especially in Cameroon. However, little is known about group O's genetic evolution, and why this highly divergent lineage has not become pandemic. Using a unique and large set of group O sequences from samples collected from 1987 to 2012, we find that this lineage has evolved in successive slow and fast phases of diversification, with a most recent common ancestor estimated to have existed around 1930 (1914-1944). The most rapid periods of diversification occurred in the 1950s and in the 1980s, and could be linked to favourable epidemiological contexts in Cameroon. Group O genetic diversity reflects this two-phase evolution, with two distinct populations potentially having different viral properties. The currently predominant viral population emerged in the 1980s, from an ancient population which had first developed in the 1950s, and is characterized by higher growth and evolutionary rates, and the natural presence of the Y181C resistance mutation, thought to confer a phenotypic advantage. Our findings show that although this evolutionary pattern is specific to HIV-1 group O, it paralleled the early spread of HIV-1 group M in the Democratic Republic of Congo. Both viral lineages are likely to have benefited from similar epidemiological contexts. The relative role of virological and social factors in the distinct epidemic histories of HIV-1 group O and M needs to be reassessed.
Subjects
Molecular Evolution, HIV Infections/epidemiology, HIV Infections/genetics, HIV-1/genetics, Cameroon/epidemiology, Humans, Phylogeny, Polymerase Chain Reaction
ABSTRACT
Entry inhibitors represent a potent class of antiretroviral drugs that target a host cell protein, CCR5, an HIV-1 entry coreceptor, rather than a viral protein. Lack of sensitivity can occur due to preexisting virus that uses the CXCR4 coreceptor, while true resistance occurs through viral adaptation to use a drug-bound CCR5 coreceptor. To understand this R5 resistance pathway, we analyzed >500 envelope protein sequences and phenotypes from viruses of 20 patients from the clinical trials MOTIVATE 1 and 2, in which treatment-experienced patients received maraviroc plus optimized background therapy. The resistant viral population was phylogenetically distinct and associated with a genetic bottleneck in each patient, consistent with de novo emergence of resistance. Recombination analysis showed that the C2-V3-C3 region tends to genotypically correspond to the recombinant's phenotype, indicating its primary importance in conferring resistance. Between patients, there was a notable lack of commonality in the specific sites conferring resistance, confirming the unusual nature of R5-tropic resistance. We used coevolutionary and positive-selection analyses to characterize the genotypic determinants of resistance and found that (i) there are complicated covariation networks, indicating frequent coevolutionary/compensatory changes in the context of protein structure; (ii) covarying sites under positive selection are enriched in resistant viruses; (iii) CD4 binding sites form part of a unique covariation network independent of the V3 loop; and (iv) the covariation network formed between the V3 loop and other regions of gp120 and gp41 intersects sites involved in glycosylation and protein secretion. These results demonstrate that while envelope sequence mutations are the key to conferring maraviroc resistance, the specific changes involved are context dependent and thus inherently unpredictable.
IMPORTANCE: The entry inhibitor drug maraviroc makes the cell coreceptor CCR5 unavailable for use by HIV-1 and is now used in combination antiretroviral therapy. Treatment failure with drug-resistant virus is particularly interesting because it tends to be rare, with lack of sensitivity usually associated with the presence of CXCR4-using virus (CXCR4 is the main alternative coreceptor HIV-1 uses, in addition to CD4). We analyzed envelope sequences from HIV-1, obtained from 20 patients who enrolled in maraviroc clinical trials and experienced treatment failure, without detection of CXCR4-using virus. Evolutionary analysis was employed to identify molecular changes that confer maraviroc resistance. We found that in these individuals, resistant viruses form a distinct population that evolved once and was successful as a result of drug pressure. Further evolutionary analysis placed the complex network of interdependent mutational changes into functional groups that help explain the impediments to the emergence of maraviroc-associated R5 drug resistance.
Subjects
CCR5 Receptor Antagonists/therapeutic use, Cyclohexanes/therapeutic use, Viral Drug Resistance/genetics, HIV Fusion Inhibitors/therapeutic use, HIV Infections/drug therapy, HIV-1/drug effects, CCR5 Receptors/metabolism, Triazoles/therapeutic use, Amino Acid Sequence, Base Sequence, Clinical Trials as Topic, Glycosylation, HIV Envelope Protein gp120/genetics, HIV Envelope Protein gp41/genetics, HIV Infections/virology, HIV-1/metabolism, Humans, Maraviroc, Molecular Sequence Data, Tertiary Protein Structure, CXCR4 Receptors/metabolism, Sequence Alignment, RNA Sequence Analysis, Signal Transduction/genetics, Treatment Failure, Virus Internalization/drug effects, Virus Replication/genetics
ABSTRACT
African green monkeys (AGMs) are naturally infected with simian immunodeficiency virus (SIV) at high prevalence levels and do not progress to AIDS. Sexual transmission is the main transmission route in AGMs, while mother-to-infant transmission (MTIT) is negligible. We investigated SIV transmission in wild AGMs to assess whether or not high SIV prevalence is due to differences in mucosal permissivity to SIV (i.e., whether the genetic bottleneck of viral transmission reported in humans and macaques is also observed in AGMs in the wild). We tested 121 sabaeus AGMs (Chlorocebus sabaeus) from the Gambia and found that 53 were SIV infected (44%). By combining serology and viral load quantitation, we identified 4 acutely infected AGMs, in which we assessed the diversity of the quasispecies by single-genome amplification (SGA) and documented that a single virus variant established the infections. We thus show that natural SIV transmission in the wild is associated with a genetic bottleneck similar to that described for mucosal human immunodeficiency virus (HIV) transmission in humans. Flow cytometry assessment of the immune cell populations did not identify major differences between infected and uninfected AGMs. The expression of the SIV coreceptor CCR5 on CD4+ T cells dramatically increased in adults, being higher in infected than in uninfected infant and juvenile AGMs. Thus, the limited SIV MTIT in natural hosts appears to be due to low target cell availability in newborns and infants, which supports HIV MTIT prevention strategies aimed at limiting the target cells at mucosal sites. Combined, (i) the extremely high prevalence in sexually active AGMs, (ii) the very efficient SIV transmission in the wild, and (iii) the existence of a fraction of multiparous females that remain uninfected in spite of massive exposure to SIV identify wild AGMs as an acceptable model of exposed, uninfected individuals.
IMPORTANCE: We report an extensive analysis of the natural history of SIVagm infection in its sabaeus monkey host, the African green monkey species endemic to West Africa. Virtually no study has investigated the natural history of SIV infection in the wild. The novelty of our approach is that we report for the first time that SIV infection has no discernible impact on the major immune cell populations in natural hosts, thus confirming the nonpathogenic nature of SIV infection in the wild. We also focused on the correlates of SIV transmission, and we report, also for the first time, that SIV transmission in the wild is characterized by a major genetic bottleneck, similar to that described for HIV-1 transmission in humans. Finally, we report here that the restriction of target cell availability is a major correlate of the lack of SIV transmission to the offspring in natural hosts of SIVs.
Subjects
Lentivirus Infections/veterinary, Monkey Diseases/transmission, Monkey Diseases/virology, Simian Immunodeficiency Virus/isolation & purification, Animals, Chlorocebus aethiops, Cluster Analysis, Female, Flow Cytometry, Gambia, Genotype, Lentivirus Infections/immunology, Lentivirus Infections/transmission, Lentivirus Infections/virology, Lymphocyte Subsets/immunology, Male, Molecular Sequence Data, Phylogeny, DNA Sequence Analysis, Simian Immunodeficiency Virus/classification, Simian Immunodeficiency Virus/genetics
ABSTRACT
Pathogenesis studies of SIV infection have not been performed to date in wild monkeys due to difficulty in collecting and storing samples on site and the lack of analytical reagents covering the extensive SIV diversity. We performed a large-scale study of molecular epidemiology and natural history of SIVagm infection in 225 free-ranging AGMs from multiple locations in South Africa. SIV prevalence (established by sequencing pol, env, and gag) varied dramatically between infant/juvenile (7%) and adult animals (68%) (p<0.0001), and between adult females (78%) and males (57%). Phylogenetic analyses revealed an extensive genetic diversity, including frequent recombination events. Some AGMs harbored epidemiologically linked viruses. Viruses infecting AGMs in the Free State, which are separated from those on the coastal side by the Drakensberg Mountains, formed a separate cluster in the phylogenetic trees; this observation supports a long-standing presence of SIV in AGMs, at least from the time of their speciation to their Plio-Pleistocene migration. Specific primers/probes were synthesized based on the pol sequence data and viral loads (VLs) were quantified. VLs were 10^4-10^6 RNA copies/ml, in the range of those observed in experimentally infected monkeys, validating the experimental approaches in natural hosts. VLs were significantly higher (10^7-10^8 RNA copies/ml) in 10 AGMs diagnosed as acutely infected based on SIV seronegativity (Fiebig II), which suggests a very active transmission of SIVagm in the wild. Neither cytokine levels (as biomarkers of immune activation) nor sCD14 levels (a biomarker of microbial translocation) were different between SIV-infected and SIV-uninfected monkeys.
This complex algorithm combining sequencing and phylogeny, VL quantification, serology, and testing of surrogate markers of microbial translocation and immune activation permits a systematic investigation of the epidemiology, viral diversity and natural history of SIV infection in wild African natural hosts.
Subjects
Chlorocebus aethiops, Molecular Evolution, Genetic Variation, Simian Acquired Immunodeficiency Syndrome/epidemiology, Simian Immunodeficiency Virus/isolation & purification, Animals, Base Sequence, Female, Host-Pathogen Interactions, Male, Molecular Sequence Data, Mutation Rate, Genetic Recombination, Nucleic Acid Repetitive Sequences, Simian Acquired Immunodeficiency Syndrome/virology, Simian Immunodeficiency Virus/genetics, South Africa/epidemiology
ABSTRACT
With 29 individual antiretroviral drugs available from six classes that are approved for the treatment of HIV-1 infection, a combination of different phenotypic and genotypic tests is currently needed to monitor HIV-infected individuals. In this study, we developed a novel HIV-1 genotypic assay based on deep sequencing (DeepGen HIV) to simultaneously assess HIV-1 susceptibilities to all drugs targeting the three viral enzymes and to predict HIV-1 coreceptor tropism. Patient-derived gag-p2/NCp7/p1/p6/pol-PR/RT/IN- and env-C2V3 PCR products were sequenced using the Ion Torrent Personal Genome Machine. Reads spanning the 3' end of the Gag, protease (PR), reverse transcriptase (RT), integrase (IN), and V3 regions were extracted, truncated, translated, and assembled for genotype and HIV-1 coreceptor tropism determination. DeepGen HIV consistently detected both minority drug-resistant viruses and non-R5 HIV-1 variants from clinical specimens with viral loads of ≥1,000 copies/ml and from B and non-B subtypes. Additional mutations associated with resistance to PR, RT, and IN inhibitors, previously undetected by standard (Sanger) population sequencing, were reliably identified at frequencies as low as 1%. DeepGen HIV results correlated with phenotypic (original Trofile, 92%; enhanced-sensitivity Trofile assay [ESTA], 80%; TROCAI, 81%; and VeriTrop, 80%) and genotypic (population sequencing/Geno2Pheno with a 10% false-positive rate [FPR], 84%) HIV-1 tropism test results. DeepGen HIV (83%) and Trofile (85%) showed similar concordances with the clinical response following an 8-day course of maraviroc monotherapy (MCT). In summary, this novel all-inclusive HIV-1 genotypic and coreceptor tropism assay, based on deep sequencing of the PR, RT, IN, and V3 regions, permits simultaneous multiplex detection of low-level drug-resistant and/or non-R5 viruses in up to 96 clinical samples. 
This comprehensive test, the first of its class, will be instrumental in the development of new antiretroviral drugs and, more importantly, will aid in the treatment and management of HIV-infected individuals.
Subjects
HIV-1/enzymology, Integrases/metabolism, Anti-HIV Agents/pharmacology, Genotype, HIV-1/drug effects, HIV-1/genetics, High-Throughput Nucleotide Sequencing, Humans, Integrases/genetics, Polymerase Chain Reaction, RNA-Directed DNA Polymerase/genetics, RNA-Directed DNA Polymerase/metabolism, HIV Receptors/chemistry, HIV Receptors/metabolism, Reverse Transcriptase Polymerase Chain Reaction
ABSTRACT
As the dimensionality, throughput and complexity of cytometry data increases, so does the demand for user-friendly, interactive analysis tools that leverage high-performance machine learning frameworks. Here we introduce FlowAtlas: an interactive web application that enables dimensionality reduction of cytometry data without down-sampling and that is compatible with datasets stained with non-identical panels. FlowAtlas bridges the user-friendly environment of FlowJo and computational tools in Julia developed by the scientific machine learning community, eliminating the need for coding and bioinformatics expertise. New population discovery and detection of rare populations in FlowAtlas is intuitive and rapid. We demonstrate the capabilities of FlowAtlas using a human multi-tissue, multi-donor immune cell dataset, highlighting key immunological findings. FlowAtlas is available at https://github.com/gszep/FlowAtlas.jl.git.
Subjects
Computational Biology, Flow Cytometry, Immunophenotyping, Software, Humans, Immunophenotyping/methods, Flow Cytometry/methods, Computational Biology/methods, Machine Learning
ABSTRACT
Pathogens differ in their host specificities, with some species infecting a single host (specialist pathogens) and others having a wide host range (generalists). The molecular determinants of a pathogen's host range remain poorly understood. Secreted proteins of generalist pathogens are expected to have a broader range of intermolecular interactions (i.e., higher promiscuity) compared with their specialist counterparts. We hypothesize that this increased promiscuity of generalist secretomes may be based on an elevated content of primitive amino acids and intrinsically disordered regions, as these features are known to increase protein flexibility and interactivity. Here, we measure the proportion of primitive amino acids and percentage of intrinsically disordered residues in secreted, membrane, and cytoplasmic proteins from pathogens with different host specificity. Supporting our prediction, there is a significant general enrichment for primitive amino acids and intrinsically disordered regions in proteins from generalists compared to specialists, particularly among secreted proteins in prokaryotes. Our findings support our hypothesis that secreted proteins' amino acid composition and disordered content influence the pathogens' host range.
ABSTRACT
DNA methylation is mediated by a conserved family of DNA methyltransferases (Dnmts). The human genome encodes three active Dnmts (Dnmt1, Dnmt3a and Dnmt3b), the tRNA methyltransferase Dnmt2, and the regulatory protein Dnmt3L. Despite their high degree of conservation among different species, genes encoding Dnmts have been duplicated and/or lost in multiple lineages throughout evolution, indicating that the DNA methylation machinery has some potential to undergo evolutionary change. However, little is known about the extent to which this machinery, or the methylome, varies among vertebrates. Here, we study the molecular evolution of Dnmt1, the enzyme responsible for maintenance of DNA methylation patterns after replication, in 79 vertebrate species. Our analyses show that all studied species exhibit a single copy of the DNMT1 gene, with the exception of tilapia and marsupials (tammar wallaby, koala, Tasmanian devil and opossum), each of which displays two apparently functional DNMT1 copies. Our phylogenetic analyses indicate that DNMT1 duplicated before the radiation of major marsupial groups (i.e., at least ~75 million years ago), thus giving rise to two DNMT1 copies in marsupials (copy 1 and copy 2). In the opossum lineage, copy 2 was lost, and copy 1 recently duplicated again, generating three DNMT1 copies: two putatively functional genes (copy 1a and 1b) and one pseudogene (copy 1ψ). Both marsupial copies (DNMT1 copies 1 and 2) are under purifying selection, and copy 2 exhibits elevated rates of evolution and signatures of positive selection, suggesting a scenario of neofunctionalization. This gene duplication might have resulted in modifications in marsupial methylomes and their dynamics.
Subjects
DNA (Cytosine-5-)-Methyltransferase 1/genetics, Molecular Evolution, Vertebrates/genetics, Animals, DNA (Cytosine-5-)-Methyltransferase 1/chemistry, DNA (Cytosine-5-)-Methyltransferase 1/metabolism, DNA Methylation, Gene Duplication, Humans, Marsupials/genetics, Opossums/genetics, Phylogeny, Protein Domains/genetics, Genetic Selection
ABSTRACT
The enzyme CMP-N-acetylneuraminic acid hydroxylase (CMAH) is responsible for the synthesis of N-glycolylneuraminic acid (Neu5Gc), a sialic acid present on the cell surface proteins of most deuterostomes. The CMAH gene is thought to be present in most deuterostomes, but it has been inactivated in a number of lineages, including humans. The inability of humans to synthesize Neu5Gc has had several evolutionary and biomedical implications. Remarkably, Neu5Gc is a xenoantigen for humans, and consumption of Neu5Gc-containing foods, such as red meats, may promote inflammation, arthritis, and cancer. Likewise, xenotransplantation of organs producing Neu5Gc can result in inflammation and organ rejection. Therefore, knowing what animal species contain a functional CMAH gene, and are thus capable of endogenous Neu5Gc synthesis, has potentially far-reaching implications. In addition to humans, other lineages are known, or suspected, to have lost CMAH; however, to date reports of absent and pseudogenic CMAH genes are restricted to a handful of species. Here, we analyze all available genomic data for nondeuterostomes, and 322 deuterostome genomes, to ascertain the phylogenetic distribution of CMAH. Among nondeuterostomes, we found CMAH homologs in two green algae and a few prokaryotes. Within deuterostomes, putatively functional CMAH homologs are present in 184 of the studied genomes, and a total of 31 independent gene losses/pseudogenization events were inferred. Our work produces a list of animals inferred to be free from endogenous Neu5Gc based on the absence of CMAH homologs and are thus potential candidates for human consumption, xenotransplantation research, and model organisms for investigation of human diseases.
Subjects
Mixed Function Oxygenases/genetics, Neuraminic Acids/metabolism, Phylogeny, Animals, Biosynthetic Pathways, Humans, Mixed Function Oxygenases/metabolism, Molecular Sequence Annotation, Pseudogenes
ABSTRACT
OBJECTIVES: HIV-1 group P (HIV-1/P) is the most recently discovered HIV-1 group and, to date, comprises only two strains. To obtain new insight into this divergent group, we screened for new infections by developing specific tools, and analysed phenotypic and genotypic properties of the prototypic strain RBF168. In addition, the follow-up of the only infected patient monitored so far has improved knowledge of the natural history of this infection and its therapeutic management. DESIGN/METHODS: We developed an HIV-1/P specific seromolecular strategy and screened over 29,498 specimens. Infectivity and evolution of the gag-30 position, considered a marker of adaptation to humans, were explored by successive passages of the RBF168 strain onto human peripheral blood mononuclear cells. Natural history and immunovirological responses to combined antiretroviral therapy (cART) were analysed based on the evolution of CD4+ cell counts and plasma viral load. RESULTS: No new infection was detected. Infectivity of RBF168 was lower than that of the other main HIV groups, and the conserved methionine found at the gag-30 position revealed a lack of adaptation to humans. The follow-up of the patient during the 5-year ART-free period showed a relative stability of CD4+ cell count, with a mean of 326 cells/µl. Initiation of cART led to rapid RNA undetectability with a significant increase of CD4+ cells, reaching 687 cells/µl after 8 years. CONCLUSION: Our results showed that HIV-1/P strains remain extremely rare and could be less adapted and pathogenic than other HIV strains. These data lead to the hypothesis that HIV-1/P infection could evolve towards, or even already corresponds to, a dead-end infection.
Subjects
Genotype, HIV Infections/virology, HIV-1/classification, HIV-1/genetics, Biological Adaptation, Blood/virology, CD4 Lymphocyte Count, Cultured Cells, Follow-Up Studies, Genotyping Techniques, HIV Infections/drug therapy, HIV Infections/pathology, HIV-1/isolation & purification, HIV-1/pathogenicity, Humans, Mononuclear Leukocytes/virology, Missense Mutation, Prospective Studies, Serotyping, Viral Load, Virulence, Virus Cultivation, Human Immunodeficiency Virus gag Gene Products/genetics
ABSTRACT
The proteins of any organism evolve at disparate rates. A long list of factors affecting rates of protein evolution has been identified. However, the relative importance of each factor in determining rates of protein evolution remains unresolved. The prevailing view is that evolutionary rates are dominantly determined by gene expression, and that other factors such as network centrality have only a marginal effect, if any. However, this view is largely based on analyses in yeasts, and accurately measuring the importance of the determinants of rates of protein evolution is complicated by the fact that the different factors are often correlated with each other, and by the relatively poor quality of available functional genomics data sets. Here, we use correlation, partial correlation and principal component regression analyses to measure the contributions of several factors to the variability of the rates of evolution of human proteins. For this purpose, we analyzed the entire human protein-protein interaction data set and the human signal transduction network, a network data set of exceptionally high quality, obtained by manual curation, which is expected to be virtually free from false positives. In contrast with the prevailing view, we observe that network centrality (measured as the number of physical and nonphysical interactions, betweenness, and closeness) has a considerable impact on rates of protein evolution. Surprisingly, the impact of centrality on rates of protein evolution seems to be comparable, or even superior according to some analyses, to that of gene expression. Our observations seem to be independent of potentially confounding factors and of the limitations (biases and errors) of interactomic data sets.
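As an illustration of why partial correlation matters when candidate determinants are correlated with each other, the following minimal sketch computes a first-order partial correlation from pairwise Pearson coefficients. It is not the authors' pipeline: the variable names (`centrality`, `rate`, `expression`) and the toy numbers are assumptions made for this example, and the values were constructed so that the raw association disappears once the third variable is controlled for.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(x, y, z):
    """Correlation between x and y after controlling for z (first order)."""
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical toy values, not real measurements: both variables are
# built as "expression plus independent noise".
expression = [1, 1, -1, -1]
centrality = [2, 0, 0, -2]
rate = [2, 0, -2, 0]

r_raw = pearson(centrality, rate)
r_partial = partial_correlation(centrality, rate, expression)
print(f"raw r = {r_raw:.3f}, partial r = {r_partial:.3f}")
```

In this constructed case the raw correlation is clearly positive while the partial correlation is approximately zero, which is the kind of confounding that the analyses described in the abstract are designed to separate out.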
Subjects
Protein Interaction Maps, Proteins/metabolism, Animals, Caenorhabditis elegans/genetics, Caenorhabditis elegans/metabolism, Drosophila melanogaster/genetics, Drosophila melanogaster/metabolism, Molecular Evolution, Humans, Protein Binding, Proteins/genetics, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism
ABSTRACT
Whereas the rate of gene duplication is relatively high, only certain duplications survive the filter of natural selection and can contribute to genome evolution. However, the reasons why certain genes can be retained after duplication whereas others cannot remain largely unknown. Many proteins contain intrinsically disordered regions (IDRs), whose structures fluctuate between alternative conformational states. Due to their high flexibility, IDRs often enable protein-protein interactions and are the target of post-translational modifications. Intrinsically disordered proteins (IDPs) have characteristics that might either stimulate or hamper the retention of their encoding genes after duplication. On the one hand, IDRs may enable functional diversification, thus promoting duplicate retention. On the other hand, increased IDP availability is expected to result in deleterious unspecific interactions. Here, we interrogate the proteomes of human, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Arabidopsis thaliana and Escherichia coli, in order to ascertain the impact of protein intrinsic disorder on gene duplicability. We show that, in general, proteins encoded by duplicated genes tend to be less disordered than those encoded by singletons. The only exception is proteins encoded by ohnologs, which tend to be more disordered than those encoded by singletons or genes resulting from small-scale duplications. Our results indicate that duplication of genes encoding IDPs outside the context of whole-genome duplication (WGD) is often deleterious, but that IDRs facilitate retention of duplicates in the context of WGD. We discuss the potential evolutionary implications of our results.
Subjects
Eukaryota/genetics, Molecular Evolution, Duplicate Genes, Genome, Protein Folding, Animals, Escherichia coli/genetics, Escherichia coli/metabolism, Eukaryota/metabolism, Humans, Ploidies, Post-Translational Protein Processing, Proteomics
ABSTRACT
Human immunodeficiency virus type 1 (HIV-1) envelope gp120 is partly an intrinsically disordered (unstructured/disordered) protein as it contains regions that do not fold into well-defined protein structures. These disordered regions play important roles in HIV's life cycle, particularly, V3 loop-dependent cell entry, which determines how the virus uses two coreceptors on immune cells, the chemokine receptors CCR5 (R5), CXCR4 (X4) or both (R5X4 virus). Most infecting HIV-1 variants utilise CCR5, while a switch to CXCR4-use occurs in the majority of infections. Why does this 'rewiring' event occur in HIV-1 infected patients? As changes in the charge of the V3 loop are associated with this receptor switch and it has been suggested that charged residues promote structure disorder, we hypothesise that the intrinsic disorder of the V3 loop is permissive to sequence variation thus contributing to the switch in cell tropism. To test this we use three independent data sets of gp120 to analyse V3 loop disorder. We find that the V3 loop of X4 virus has significantly higher intrinsic disorder tendency than R5 and R5X4 virus, while R5X4 virus has the lowest. These results indicate that structural disorder plays an important role in HIV-1 cell tropism and CXCR4 binding. We discuss the potential evolutionary mechanisms leading to the fixation of disorder promoting mutations and the adaptive potential of protein structural disorder in viral host adaptation.
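The abstract links the coreceptor switch to changes in the charge of the V3 loop. As a rough illustration of what such a charge comparison involves, the sketch below counts basic and acidic residues in a peptide sequence. It is a simplification under stated assumptions, not the method used in the study: histidine and terminal groups are ignored, and the two sequences are made-up placeholders rather than real V3 loops.

```python
# Residues treated as charged at roughly neutral pH (assumption:
# histidine and the peptide termini are ignored for simplicity).
POSITIVE = set("KR")
NEGATIVE = set("DE")


def net_charge(sequence):
    """Approximate net charge: count of K/R minus count of D/E."""
    sequence = sequence.upper()
    positive = sum(residue in POSITIVE for residue in sequence)
    negative = sum(residue in NEGATIVE for residue in sequence)
    return positive - negative


# Hypothetical placeholder peptides, not real V3 loop sequences.
example_a = "CTRKGDE"
example_b = "CTRKRGK"

charge_a = net_charge(example_a)
charge_b = net_charge(example_b)
print(charge_a, charge_b)
```

A comparison of this kind only captures charge; the intrinsic disorder tendency analysed in the abstract would need a dedicated disorder predictor, which is not sketched here.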