ABSTRACT
Hepatocellular carcinoma (HCC) is the third leading cause of cancer-associated deaths worldwide. Treatment with immune checkpoint antibodies has shown promise in advanced HCC, but the response rate is only 15-20%. We discovered a potential target for the treatment of HCC, the cholecystokinin-B receptor (CCK-BR). This receptor is overexpressed in murine and human HCC but not in normal liver tissue. Mice bearing syngeneic RIL-175 HCC tumors were treated with phosphate-buffered saline (PBS; control), proglumide (a CCK-receptor antagonist), an antibody to programmed cell death protein 1 (PD-1Ab), or the combination of proglumide and the PD-1Ab. In vitro, RNA was extracted from untreated or proglumide-treated murine Dt81Hepa1-6 HCC cells and analyzed for expression of fibrosis-associated genes. RNA was also extracted from untreated or proglumide-treated human HepG2 HCC cells and subjected to RNA sequencing. Results showed that proglumide decreased fibrosis in the tumor microenvironment and increased the number of intratumoral CD8+ T cells in RIL-175 tumors. When proglumide was given in combination with the PD-1Ab, there was a further significant increase in intratumoral CD8+ T cells, improved survival, and alterations in genes regulating tumoral fibrosis and epithelial-to-mesenchymal transition. RNAseq results from human HepG2 HCC cells treated with proglumide showed significant changes in differentially expressed genes involved in tumorigenesis, fibrosis, and the tumor microenvironment. Use of a CCK receptor antagonist may improve the efficacy of immune checkpoint antibodies and survival in patients with advanced HCC.
Subjects
Hepatocellular Carcinoma, Immune Checkpoint Inhibitors, Liver Neoplasms, Proglumide, Cholecystokinin Receptors, Animals, Mice, Hepatocellular Carcinoma/immunology, Hepatocellular Carcinoma/metabolism, Cholecystokinin, Fibrosis, Liver Neoplasms/immunology, Liver Neoplasms/metabolism, Proglumide/pharmacology, Cholecystokinin Receptors/antagonists & inhibitors, Immune Checkpoint Inhibitors/immunology
ABSTRACT
Interest in the mechanisms of DNA repair pathways, including the base excision repair (BER) pathway specifically, has heightened since these pathways have been shown to modulate important aspects of human disease. Modulation of the expression or activity of a particular BER enzyme, N-methylpurine DNA glycosylase (MPG), has been demonstrated to play a role in carcinogenesis and resistance to chemotherapy as well as in neurodegenerative diseases, which has intensified the focus on studying MPG-related mechanisms of repair. A specific small-molecule inhibitor of MPG activity would be a valuable biochemical tool for understanding these repair mechanisms. By screening several small-molecule chemical libraries, we identified a natural polyphenolic compound, morin hydrate, which specifically inhibits MPG activity (IC50 = 2.6 µM). Detailed mechanistic analysis showed that morin hydrate inhibited substrate DNA binding by MPG and, consequently, the enzymatic activity of MPG. Computational docking studies with an X-ray-derived MPG structure, as well as comparison studies with other structurally related flavonoids, offer a rationale for the observed inhibitory activity of morin hydrate. The results of this study suggest that morin hydrate could be an effective tool for studying MPG function, and that morin hydrate and its derivatives could be used in future studies focused on the role of MPG in human disease.
Subjects
DNA Glycosylases/antagonists & inhibitors, Enzyme Inhibitors/chemistry, Enzyme Inhibitors/pharmacology, Flavonoids/pharmacology, Tumor Cell Line, DNA Repair, Preclinical Drug Evaluation, Flavonoids/chemistry, Humans, Molecular Models, Structure-Activity Relationship
ABSTRACT
DnaA oligomerizes when bound to origins of chromosomal replication. Structural analysis of a truncated form of DnaA from Aquifex aeolicus has provided insight into crucial conformational differences within the AAA+ domain that are specific to the ATP- versus ADP-bound form of DnaA. In this study, molecular docking of ATP and ADP onto Escherichia coli DnaA, modeled on the crystal structure of Aquifex aeolicus DnaA, reveals changes in the orientation of amino acid residues within or near the nucleotide-binding pocket. Upon limited proteolysis with trypsin or chymotrypsin, ADP-DnaA, but not ATP-DnaA, generated relatively stable proteolytic fragments of various sizes. The examined sites of limited protease susceptibility that differ between ATP-DnaA and ADP-DnaA largely reside in the amino-terminal half of DnaA. The concentration of adenine nucleotide needed to induce conformational changes, as detected by these protease susceptibilities of DnaA, coincides with the conversion of an inactive bacterial origin recognition complex (bORC) to a replication-efficient pre-replication complex (pre-RC) at the E. coli chromosomal origin of replication (oriC).
Subjects
Bacterial Proteins/chemistry, Bacterial Chromosomes, DNA-Binding Proteins/chemistry, Escherichia coli/enzymology, Nucleotides/chemistry, Origin Recognition Complex, Protein Conformation, Replication Origin, Adenosine Diphosphate/chemistry, Adenosine Diphosphate/metabolism, Adenosine Triphosphate/chemistry, Adenosine Triphosphate/metabolism, Bacterial Proteins/metabolism, Binding Sites, Carbohydrate Conformation, DNA Replication, DNA-Binding Proteins/metabolism, Escherichia coli/genetics, Molecular Models, Nucleoproteins/metabolism, Nucleotides/metabolism, Origin Recognition Complex/metabolism, Protein Binding, Protein Interaction Domains and Motifs, Proteolysis
ABSTRACT
Gestational diabetes mellitus (GDM) is a common metabolic disorder affecting approximately 16.5% of pregnancies worldwide and causing significant health concerns. GDM is a serious pregnancy complication caused by chronic insulin resistance in the mother and has been associated with the development of neurodevelopmental disorders in offspring. Emerging data support the notion that GDM affects both the maternal and fetal microbiome, altering the composition and function of the gut microbiota, resulting in dysbiosis. The observed dysregulation of microbial presence in GDM pregnancies has been connected to fetal neurodevelopmental problems. Several reviews have focused on the intricate development of maternal dysbiosis affecting the fetal microbiome. Omics data have been instrumental in deciphering the underlying relationship among GDM, gut dysbiosis, and fetal neurodevelopment, paving the way for precision medicine. Microbiome-associated omics analyses help elucidate how dysbiosis contributes to metabolic disturbances and inflammation, linking microbial changes to adverse pregnancy outcomes such as those seen in GDM. Integrating omics data across these different layers (genomics, transcriptomics, proteomics, metabolomics, and microbiomics) offers a comprehensive view of the molecular landscape underlying GDM. This review outlines the affected pathways and proposes future developments and possible personalized therapeutic interventions by integrating omics data on the maternal microbiome, genetics, lifestyle factors, and other relevant biomarkers aimed at identifying women at high risk of developing GDM. For example, machine learning tools have emerged with powerful capabilities to extract meaningful insights from large datasets.
ABSTRACT
BACKGROUND: The post-genomic era poses several challenges. The biggest is the identification of biochemical function for protein sequences and structures resulting from genomic initiatives. Most sequences lack a characterized function and are annotated as hypothetical or uncharacterized. While homology-based methods are useful, and work well for sequences with sequence identities above 50%, they fail for sequences in the twilight zone (<30%) of sequence identity. For cases where sequence methods fail, structural approaches are often used, based on the premise that structure preserves function for longer evolutionary time-frames than sequence alone. It is now clear that no single method can be used successfully for functional inference. Given the growing need for functional assignments, we describe here a systematic new approach, designated ligand-centric, which is primarily based on analysis of ligand-bound/unbound structures in the PDB. Results of applying our approach to S-adenosyl-L-methionine (SAM) binding proteins are presented. RESULTS: Our analysis included 1,224 structures that belong to 172 unique families of the Protein Information Resource Superfamily system. Our ligand-centric approach was divided into four levels: residue, protein/domain, ligand, and family levels. The residue level included the identification of conserved binding site residues based on structure-guided sequence alignments of representative members of a family, and the identification of conserved structural motifs. The protein/domain level included structural classification of proteins, Pfam domains, domain architectures, and protein topologies. The ligand level included ligand conformations, ribose sugar puckering, and the identification of conserved ligand-atom interactions. The family level included phylogenetic analysis. CONCLUSION: We found that SAM bound to a total of 18 different fold types (I-XVIII). 
We identified 4 new fold types and 11 additional topological arrangements of strands within the well-studied Rossmann fold Methyltransferases (MTases). This extends the existing structural classification of SAM binding proteins. A striking correlation between fold type and the conformation of the bound SAM (classified as types) was found across the 18 fold types. Several site-specific rules were created for the assignment of functional residues to families and proteins that do not have a bound SAM or a solved structure.
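The sequence-identity thresholds mentioned in the background (above 50% versus the twilight zone below 30%) can be illustrated with a minimal Python sketch. The function names, the zone labels, and the choice to ignore gapped columns are assumptions made for illustration; they are not taken from the study.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the ungapped columns of a pairwise alignment.

    Assumption: both strings come from the same alignment (equal length)
    and '-' marks a gap. Columns with a gap in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def identity_zone(pid: float) -> str:
    """Hypothetical labels for the identity ranges named in the abstract."""
    if pid > 50.0:
        return "above 50%"
    if pid < 30.0:
        return "twilight zone"
    return "intermediate"


# Toy alignment: five ungapped columns, four of them identical.
pid = percent_identity("ACDE-G", "ACDFHG")
print(pid, identity_zone(pid))
```

Pairs that land in the twilight zone are the cases where, according to the abstract, structure-based or ligand-centric evidence becomes more informative than sequence alone.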
Subjects
Ligands, Proteins/metabolism, S-Adenosylmethionine/metabolism, Amino Acid Motifs, Binding Sites, Protein Databases, Methyltransferases/chemistry, Methyltransferases/metabolism, Protein Binding, Protein Folding, Protein Tertiary Structure, Proteins/chemistry, S-Adenosylmethionine/chemistry, Temperature
ABSTRACT
Non-structural protein 1 (Nsp1) is a virulence factor found in all betacoronaviruses (β-CoVs). Recent studies have shown that Nsp1 of SARS-CoV-2 interacts with the nuclear export receptor complex, which includes nuclear RNA export factor 1 (NXF1) and nuclear transport factor 2-like export factor 1 (NXT1). The NXF1-NXT1 complex plays a crucial role in the transport of host messenger RNA (mRNA). Nsp1 interferes with the proper binding of NXF1 to mRNA export adaptors and its docking to the nuclear pore complex. We propose that drugs targeting the binding surface between Nsp1 and NXF1-NXT1 may be a useful strategy to restore host antiviral gene expression. Exploring this strategy is the main goal of this paper. Crystal structures of Nsp1 and the heterodimer of NXF1-NXT1 have been determined. We modeled the docking of Nsp1 to the NXF1-NXT1 complex and identified candidate drugs for repurposing that may interfere with this binding. To our knowledge, this is the first attempt at drug repurposing for this complex. We used structural analysis to screen 1993 FDA-approved drugs for docking to the NXF1-NXT1 complex. The top hit was ganirelix, with a docking score of -14.49. Ganirelix competitively antagonizes the gonadotropin-releasing hormone receptor (GNRHR) on pituitary gonadotrophs and induces rapid, reversible suppression of gonadotropin secretion. The conformations of Nsp1 and GNRHR make it unlikely that they interact with each other. Additional drug leads were inferred from the structural analysis of this complex and are discussed in the paper. These drugs offer several options for therapeutically blocking Nsp1 binding to NXF1-NXT1, which may normalize nuclear export in COVID-19.
ABSTRACT
The HEK-293 cell line was created in 1977 by transformation of primary human embryonic kidney cells with sheared adenovirus type 5 DNA. A previous study determined that HEK-293 cells have neuronal markers rather than kidney markers. In this study, we tested whether Zika virus (ZIKV), a neurotropic virus, can infect and replicate in HEK-293 cells. We show that HEK-293 cells infected with ZIKV support viral replication, as demonstrated by indirect immunofluorescence assay (IFA) and quantitative reverse transcriptase-PCR (qRT-PCR). We performed RNA-seq analysis on ZIKV-infected and uninfected control HEK-293 cells and found 659 genes that are differentially transcribed in ZIKV-infected cells compared to uninfected cells. The results show that the top 10 differentially transcribed and upregulated genes are involved in antiviral and inflammatory responses. Seven upregulated genes, IFNL1, DDX58, CXCL10, ISG15, KCNJ15, IFIH1, and IFIT2, were validated by qRT-PCR. Altogether, our findings show that ZIKV infection alters host gene expression, affecting antiviral and inflammatory responses.
Subjects
Gene Expression Regulation, Inflammation/virology, Zika Virus Infection/metabolism, Zika Virus Infection/virology, Zika Virus/metabolism, Apoptosis Regulatory Proteins/metabolism, Chemokine CXCL10/metabolism, Cytokines/metabolism, DEAD Box Protein 58/metabolism, Indirect Fluorescent Antibody Technique/methods, HEK293 Cells, Host Microbial Interactions, Humans, Interferon-Induced Helicase IFIH1/metabolism, Interferons/metabolism, Interleukins/metabolism, Inwardly Rectifying Potassium Channels/metabolism, RNA-Binding Proteins/metabolism, RNA-Seq, Immunologic Receptors/metabolism, Ubiquitins/metabolism, Zika Virus/immunology, Zika Virus Infection/immunology
ABSTRACT
Introduction: Network and systems medicine has rapidly evolved over the past decade, thanks to computational and integrative tools, which stem in part from systems biology. However, major challenges and hurdles remain regarding validation and translation into clinical application and decision making for precision medicine. Methods: In this context, the Collaboration on Science and Technology Action on Open Multiscale Systems Medicine (OpenMultiMed) reviewed the available advanced technologies for multidimensional data generation and integration in an open-science approach, as well as key clinical applications of network and systems medicine and the main issues and opportunities for the future. Results: The development of multi-omic approaches as well as new digital tools provides a unique opportunity to explore complex biological systems and networks at different scales. Moreover, the application of findable, accessible, interoperable, and reusable (FAIR) principles and the adoption of standards increase data availability and sharing for multiscale integration and interpretation. These innovations have led to the first clinical applications of network and systems medicine, particularly in the field of personalized therapy and drug dosing. Broadening the application of network and systems medicine now requires increasing the engagement of patients and health care providers, as well as educating new generations of medical doctors and biomedical researchers to shift from the current organ- and symptom-based medical concepts toward network- and systems-based ones for more precise diagnoses, interventions, and ideally prevention. Conclusion: In this dynamic setting, the health care system will also have to evolve, if not be revolutionized, in terms of organization and management.
ABSTRACT
The First International Conference in Systems and Network Medicine gathered together 200 global thought leaders, scientists, clinicians, academicians, industry and government experts, medical and graduate students, postdoctoral scholars and policymakers. Held at Georgetown University Conference Center in Washington D.C. on September 11-13, 2019, the event featured a day of pre-conference lectures and hands-on bioinformatic computational workshops followed by two days of deep and diverse scientific talks, panel discussions with eminent thought leaders, and scientific poster presentations. Topics ranged from: Systems and Network Medicine in Clinical Practice; the role of -omics technologies in Health Care; the role of Education and Ethics in Clinical Practice, Systems Thinking, and Rare Diseases; and the role of Artificial Intelligence in Medicine. The conference served as a unique nexus for interdisciplinary discovery and dialogue and fostered formation of new insights and possibilities for health care systems advances.
ABSTRACT
The intestinal epithelial cell-surface molecule CD98 is a type II membrane glycoprotein. Molecular orientation studies have demonstrated that the C-terminal tail of human CD98 (hCD98), which contains a PDZ-binding domain, is extracellular. In intestinal epithelial cells, CD98 is covalently linked to an amino-acid transporter with which it forms a heterodimer. This heterodimer associates with beta(1)-integrin and intercellular adhesion molecule 1 (ICAM-1) to form a macromolecular complex in the basolateral membranes of polarized intestinal epithelial cells. This review focuses on the multifunctional roles of CD98, including involvement in extracellular signaling, adhesion/polarity, and amino-acid transporter expression in intestinal epithelia. A role for CD98 in intestinal inflammation, such as inflammatory bowel disease (IBD), is also proposed.
Subjects
Fusion Regulatory Protein-1/chemistry, Fusion Regulatory Protein-1/metabolism, Intestinal Mucosa/metabolism, Amino Acid Transport Systems/metabolism, Animals, Cell Adhesion, Cell Polarity, Humans, Intestinal Mucosa/pathology, Protein Quaternary Structure
ABSTRACT
We analyzed the envelope proteins in pathogenic flaviviruses to determine whether there are sequence signatures associated with the tendency of viruses to produce hemorrhagic disease (H-viruses) or encephalitis (E-viruses). We found that, at the position corresponding to the glycosylated Asn-67 in dengue virus, asparagine (Asn) occurs in all seven viral species that cause hemorrhagic disease in humans. Furthermore, Asn was extremely rare at position 67 in six flaviviruses that cause encephalitis, being replaced by Asp in four of them. Of the 3,246 sequences from H- and E-viruses, we found that 2,916 sequences (90%) contained Asn at position 67 for H-viruses or Asp at position 67 for E-viruses. The change from Asn-67, which is prevalent in H-viruses, to Asp-67, which is common in E-viruses, contributes to a stronger electrostatically negative surface in the E-viruses as compared to the H-viruses. These findings should help predict the disease potential of emerging and re-emerging flaviviruses and clarify the relationship between protein structure and disease outcome.
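The position-67 tally described above can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline: the function name, the toy sequences, and the assumption that all sequences are already aligned to a common numbering are hypothetical. The final line only rechecks the reported 2,916 of 3,246 figure.

```python
def signature_fraction(h_seqs, e_seqs, position, h_residue="N", e_residue="D"):
    """Fraction of sequences carrying the expected residue at a 1-based
    alignment position: h_residue for H-viruses, e_residue for E-viruses.

    Assumption: every sequence is aligned so that `position` refers to the
    same column (position 67 in dengue numbering in the abstract).
    """
    h_hits = sum(1 for s in h_seqs if s[position - 1] == h_residue)
    e_hits = sum(1 for s in e_seqs if s[position - 1] == e_residue)
    total = len(h_seqs) + len(e_seqs)
    return (h_hits + e_hits) / total if total else 0.0


# Toy three-residue "alignments"; the column of interest here is position 3.
toy_h = ["AAN", "AAN"]   # hypothetical H-virus sequences
toy_e = ["AAD", "AAS"]   # hypothetical E-virus sequences
print(signature_fraction(toy_h, toy_e, position=3))

# Consistency check on the reported counts (2,916 of 3,246 sequences).
reported_percent = 100 * 2916 / 3246
print(round(reported_percent))
```

The reported counts correspond to roughly 89.8%, which rounds to the 90% quoted in the abstract.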
Subjects
Flavivirus/genetics, Flavivirus/pathogenicity, Viral Envelope Proteins/genetics, Virulence Factors/genetics, Amino Acid Sequence, Asparagine/genetics, Aspartic Acid/genetics, Viral Encephalitis/virology, Hemorrhage/virology, Humans, Molecular Models, Molecular Sequence Data, Protein Tertiary Structure, Amino Acid Sequence Homology
ABSTRACT
BACKGROUND: To date, no claim regarding finding a consensus sequon for O-glycosylation has been made. Thus, predicting the likelihood of O-glycosylation with sequence and structural information using classical regression analysis is quite difficult. In particular, if a binary response is used to distinguish between O-glycosylated and non-O-glycosylated sequences, an appropriate set of non-O-glycosylatable sequences is hard to find. RESULTS: Three sets of sequences from similar post-translational modifications (PTMs) of proteins occurring at, or very near, the S/T-site are analyzed: N-glycosylation, O-mucin type (O-GalNAc) glycosylation, and phosphorylation. Results found include: 1) The consensus composite sequon for O-glycosylation is ~(W-S/T-W), where "~" denotes the "not" operator. 2) The consensus sequon for phosphorylation is ~(W-S/T/Y/H-W), although W-S/T/Y/H-W is not an absolute inhibitor of phosphorylation. 3) For linear probability model (LPM) estimation, N-glycosylated sequences are good approximations to non-O-glycosylatable sequences, although N - ~P - S/T is not an absolute inhibitor of O-glycosylation. 4) The selective positioning of an amino acid along the sequence differentiates the PTMs of proteins. 5) Some N-glycosylated sequences are also phosphorylated at the S/T-site in the N - ~P - S/T sequon. 6) ASA values for N-glycosylated sequences are stochastically larger than those for O-GlcNAc glycosylated sequences. 7) Structural attributes (beta turn II, II´, helix, beta bridges, beta hairpin, and the phi angle) are significant LPM predictors of O-GlcNAc glycosylation. The LPM with sequence and structural data as explanatory variables yields a Kolmogorov-Smirnov (KS) statistic of 99%. 8) With only sequence data, the KS statistic erodes to 80%, and 21% of out-of-sample O-GlcNAc glycosylated sequences are mispredicted as not being glycosylated. The 95% confidence interval around this misprediction rate is 16% to 26%.
CONCLUSIONS: The data indicate the existence of a consensus sequon for O-glycosylation and underscore the relevance of structural information for predicting the likelihood of O-glycosylation.
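The composite sequon ~(W-S/T-W) can be read as a simple exclusion rule, sketched below in Python. The sketch rests on one interpretive assumption that the abstract does not spell out: that the two W positions are the residues immediately before and after the S/T site. The function name is hypothetical, and the rule only flags candidate sites. It does not reproduce the linear probability model.

```python
def candidate_o_glyc_sites(seq: str) -> list:
    """Return 0-based indices of S/T residues that satisfy ~(W-S/T-W).

    Assumption: the sequon refers to the immediately flanking residues,
    so a site is excluded only when BOTH neighbours are tryptophan (W).
    """
    sites = []
    for i, residue in enumerate(seq):
        if residue not in "ST":
            continue
        left = seq[i - 1] if i > 0 else ""
        right = seq[i + 1] if i + 1 < len(seq) else ""
        if left == "W" and right == "W":
            continue  # matches W-S/T-W, so the "not" rule excludes it
        sites.append(i)
    return sites


# Toy peptide: the S at index 2 is flanked by W on both sides and is
# excluded; the T at index 5 is kept as a candidate.
print(candidate_o_glyc_sites("AWSWATG"))
```

Because the abstract describes the analogous phosphorylation rule as not being an absolute inhibitor, a filter like this is probably best treated as a prior on candidate sites rather than a hard classifier.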
Subjects
Consensus Sequence, Molecular Models, Post-Translational Protein Processing, Proteins/metabolism, Amino Acid Sequence, Amino Acids/chemistry, Analysis of Variance, Protein Databases, Glycosylation, Humans, Linear Models, Logistic Models, Phosphorylation, Probability, Nonparametric Statistics
ABSTRACT
Genome sequencing projects have resulted in a rapid accumulation of predicted protein sequences. With experimentally verified information on protein function lagging far behind, computational methods are used for functional annotation of proteins. Here we describe a number of protocols for protein sequence and structure analysis that can be used to infer the function of uncharacterized proteins. These protocols rely on publicly available computational resources and tools and can be used by anyone with Internet access.
Subjects
Amino Acid Sequence, Proteins, Amino Acid Sequence Homology, Animals, Protein Databases, Humans, Molecular Models, Molecular Sequence Data, Protein Conformation, Proteins/chemistry, Proteins/genetics, Proteins/metabolism, Sequence Alignment
ABSTRACT
Complementary developments in comparative genomics, protein structure determination and in-depth comparison of protein sequences and structures have provided a better understanding of the prevailing trends in the emergence and diversification of protein domains. The investigation of deep relationships among different classes of proteins involved in key cellular functions, such as nucleic acid polymerases and other nucleotide-dependent enzymes, indicates that a substantial set of diverse protein domains evolved within the primordial, ribozyme-dominated RNA world.
Subjects
Molecular Evolution, Proteins/chemistry, Proteins/genetics, Protein Sequence Analysis, Phylogeny, Protein Folding, Protein Tertiary Structure, RNA
ABSTRACT
2',3' Cyclic nucleotide phosphodiesterases are enzymes that catalyze at least two distinct steps in the splicing of tRNA introns in eukaryotes. Recently, the biochemistry and structure of these enzymes, from yeast and the plant Arabidopsis thaliana, have been extensively studied. They were found to share a common active site, characterized by two conserved histidines, with the bacterial tRNA-ligating enzyme LigT and the vertebrate myelin-associated 2',3' phosphodiesterases. Using sensitive sequence profile analysis methods, we show that these enzymes define a large superfamily of predicted phosphoesterases with two conserved histidines (hence 2H phosphoesterase superfamily). We identify several new families of 2H phosphoesterases and present a complete evolutionary classification of this superfamily. We also carry out a structure-function analysis of these proteins and present evidence that different families within this superfamily interact in diverse ways with RNA substrates and protein partners. In particular, we show that eukaryotes contain two ancient families of these proteins that might be involved in RNA processing, transcriptional co-activation and post-transcriptional gene silencing. Another eukaryotic family, restricted to vertebrates and insects, is combined with UBA and SH3 domains, suggesting a role in signal transduction. We detect these phosphoesterase modules in polyproteins of certain retroviruses, rotaviruses and coronaviruses, where they could function in capping and processing of viral RNAs. Furthermore, we present evidence for multiple families of 2H phosphoesterases in bacteria, which might be involved in the processing of small molecules with 2',3' cyclic phosphoester linkages. The evolutionary analysis suggests that the 2H domain emerged through a duplication of a simple structural unit containing a single catalytic histidine prior to the last common ancestor of all life forms.
Initially, this domain appears to have been involved in RNA processing and it appears to have been recruited to perform various other functions in later stages of evolution.
Subjects
2',3'-Cyclic-Nucleotide Phosphodiesterases, Molecular Evolution, Histidine/chemistry, 2',3'-Cyclic-Nucleotide Phosphodiesterases/chemistry, 2',3'-Cyclic-Nucleotide Phosphodiesterases/classification, 2',3'-Cyclic-Nucleotide Phosphodiesterases/genetics, 2',3'-Cyclic-Nucleotide Phosphodiesterases/physiology, Amino Acid Sequence, Archaea/enzymology, Bacteria/enzymology, Binding Sites, Catalytic Domain, Conserved Sequence, Eukaryotic Cells/enzymology, Molecular Models, Molecular Sequence Data, Phylogeny, Protein Tertiary Structure, Sequence Alignment, Structure-Activity Relationship, Viruses/enzymology
ABSTRACT
The Conserved Domain Database (CDD) is now indexed as a separate database within the Entrez system and linked to other Entrez databases such as MEDLINE(R). This allows users to search for domain types by name, for example, or to view the domain architecture of any protein in Entrez's sequence database. CDD can be accessed on the World Wide Web at http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cdd. Users may also employ the CD-Search service to identify conserved domains in new sequences, at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi. CD-Search results, and pre-computed links from Entrez's protein database, are calculated using the RPS-BLAST algorithm and Position Specific Score Matrices (PSSMs) derived from CDD alignments. CD-Searches are also run by default for protein-protein queries submitted to BLAST(R) at http://www.ncbi.nlm.nih.gov/BLAST. CDD mirrors the publicly available domain alignment collections SMART and PFAM, and now also contains alignment models curated at NCBI. Structure information is used to identify the core substructure likely to be present in all family members, and to produce sequence alignments consistent with structure conservation. This alignment model allows NCBI curators to annotate 'columns' corresponding to functional sites conserved among family members.
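To illustrate what scoring a sequence against a Position Specific Score Matrix involves, here is a minimal Python sketch of ungapped window scoring. It is not RPS-BLAST and does not reproduce CD-Search. The three-column matrix, the default penalty, and the function name are toy assumptions chosen only to show the idea.

```python
def best_pssm_hit(seq, pssm, default=-4):
    """Slide an ungapped PSSM along `seq` and return (start, score) of the
    best-scoring window, or None if the sequence is shorter than the PSSM.

    `pssm` is a list of dicts, one per column, mapping residue -> score.
    Residues missing from a column receive `default` (an assumed penalty).
    """
    width = len(pssm)
    best = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(col.get(res, default) for col, res in zip(pssm, window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Toy 3-column matrix favouring G, then K or R, then S or T (hypothetical).
toy_pssm = [
    {"G": 5},
    {"K": 4, "R": 3},
    {"S": 4, "T": 4},
]
print(best_pssm_hit("AAGKTAA", toy_pssm))
```

Real profile searches add gapped alignment, background-corrected log-odds scores and E-value statistics, so this sketch should be read only as the scoring core.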
Subjects
Protein Databases, Protein Tertiary Structure, Amino Acid Sequence, Animals, Conserved Sequence, Information Storage and Retrieval, Molecular Models, Sequence Alignment
ABSTRACT
Three-dimensional structures are now known within most protein families and it is likely, when searching a sequence database, that one will identify a homolog of known structure. The goal of Entrez's 3D-structure database is to make structure information and the functional annotation it can provide easily accessible to molecular biologists. To this end, Entrez's search engine provides several powerful features: (i) links between databases, for example between a protein's sequence and structure; (ii) pre-computed sequence and structure neighbors; and (iii) structure and sequence/structure alignment visualization. Here, we focus on a new feature of Entrez's Molecular Modeling Database (MMDB): Graphical summaries of the biological annotation available for each 3D structure, based on the results of automated comparative analysis. MMDB is available at: http://www.ncbi.nlm.nih.gov/Entrez/structure.html.
Subjects
Protein Databases, Molecular Models, Protein Structural Homology, Animals, Computer Graphics, Three-Dimensional Imaging, Protein Tertiary Structure, Proteins/chemistry
ABSTRACT
BACKGROUND: The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies. RESULTS: We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or approximately 54% of the analyzed eukaryotic 110,655 gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included into the KOGs; addition of new eukaryotic genomes is expected to result in substantial increase in the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of approximately 20% of the KOG set. 
This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (approximately 1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes. CONCLUSION: The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies.
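The coverage figures and the notion of a conserved core defined by phyletic patterns can be checked with a short Python sketch. The counts are taken from the abstract. The cluster identifiers, the species codes, and the representation of a phyletic pattern as a set of species are assumptions made for illustration.

```python
def coverage_percent(n_in_clusters: int, n_total: int) -> float:
    """Percentage of proteins that fall into orthologous clusters."""
    return 100.0 * n_in_clusters / n_total


def conserved_core_fraction(phyletic_patterns: dict, all_species) -> float:
    """Fraction of clusters represented in every analyzed species.

    `phyletic_patterns` maps a cluster id to the set of species in which
    the cluster has at least one member (a simplified phyletic pattern).
    """
    if not phyletic_patterns:
        return 0.0
    required = set(all_species)
    core = [cid for cid, species in phyletic_patterns.items()
            if required <= set(species)]
    return len(core) / len(phyletic_patterns)


# Counts reported in the abstract.
cog_cov = coverage_percent(138_458, 185_505)
kog_cov = coverage_percent(59_838, 110_655)
print(f"COG coverage: {cog_cov:.1f}%  KOG coverage: {kog_cov:.1f}%")

# Hypothetical toy patterns over three species codes.
toy_patterns = {
    "KOG_a": {"Hs", "Dm", "Sc"},
    "KOG_b": {"Hs", "Dm"},
    "KOG_c": {"Sc", "Hs", "Dm"},
    "KOG_d": {"Sc"},
}
print(conserved_core_fraction(toy_patterns, ["Hs", "Dm", "Sc"]))
```

The two coverage values round to the 75% and 54% quoted in the abstract. The toy core fraction only illustrates how a figure such as the approximately 20% conserved KOG core would be computed from real phyletic patterns.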