ABSTRACT
MOTIVATION: Tools for pairwise alignments between 3D structures of proteins are of fundamental importance for structural biology and bioinformatics, enabling visual exploration of evolutionary and functional relationships. However, the absence of a user-friendly, browser-based tool for creating alignments and visualizing them at both 1D sequence and 3D structural levels makes this process unnecessarily cumbersome. RESULTS: We introduce a novel pairwise structure alignment tool (rcsb.org/alignment) that seamlessly integrates into the RCSB Protein Data Bank (RCSB PDB) research-focused RCSB.org web portal. Our tool and its underlying application programming interface (alignment.rcsb.org) empower users to align several protein chains with a reference structure by providing access to established alignment algorithms (FATCAT, CE, TM-align, or Smith-Waterman 3D). The user-friendly interface simplifies parameter setup and input selection. Within seconds, our tool enables visualization of results in both sequence (1D) and structural (3D) perspectives through the RCSB PDB RCSB.org Sequence Annotations viewer and Mol* 3D viewer, respectively. Users can effortlessly compare structures deposited in the PDB archive alongside more than a million incorporated Computed Structure Models coming from the ModelArchive and AlphaFold DB. Moreover, this tool can be used to align custom structure data by providing a link/URL or uploading atomic coordinate files directly. Importantly, alignment results can be bookmarked and shared with collaborators. By bridging the gap between 1D sequence and 3D structures of proteins, our tool facilitates deeper understanding of complex evolutionary relationships among proteins through comprehensive sequence and structural analyses. AVAILABILITY AND IMPLEMENTATION: The alignment tool is part of the RCSB PDB research-focused RCSB.org web portal and available at rcsb.org/alignment. Programmatic access is available via alignment.rcsb.org.
Frontend code has been published at github.com/rcsb/rcsb-pecos-app. Visualization is powered by the open-source Mol* viewer (github.com/molstar/molstar and github.com/molstar/rcsb-molstar) plus the Sequence Annotations in 3D Viewer (github.com/rcsb/rcsb-saguaro-3d).
Subjects
Algorithms , Protein Databases , Proteins , Sequence Alignment , Software , Proteins/chemistry , Sequence Alignment/methods , Protein Conformation , User-Computer Interface , Computational Biology/methods
ABSTRACT
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), founding member of the Worldwide Protein Data Bank (wwPDB), is the US data center for the open-access PDB archive. As wwPDB-designated Archive Keeper, RCSB PDB is also responsible for PDB data security. Annually, RCSB PDB serves >10 000 depositors of three-dimensional (3D) biostructures working on all permanently inhabited continents. RCSB PDB delivers data from its research-focused RCSB.org web portal to many millions of PDB data consumers based in virtually every United Nations-recognized country, territory, etc. This Database Issue contribution describes upgrades to the research-focused RCSB.org web portal that created a one-stop-shop for open access to ~200 000 experimentally determined PDB structures of biological macromolecules alongside >1 000 000 incorporated Computed Structure Models (CSMs) predicted using artificial intelligence/machine learning methods. RCSB.org is a 'living data resource.' Every PDB structure and CSM is integrated weekly with related functional annotations from external biodata resources, providing up-to-date information for the entire corpus of 3D biostructure data freely available from RCSB.org with no usage limitations. Within RCSB.org, PDB structures and the CSMs are clearly identified as to their provenance and reliability. Both are fully searchable, and can be analyzed and visualized using the full complement of RCSB.org web portal capabilities.
Subjects
Artificial Intelligence , Protein Databases , Proteins , Machine Learning , Protein Conformation , Proteins/chemistry , Reproducibility of Results
ABSTRACT
MOTIVATION: Mapping positional features from one-dimensional (1D) sequences onto three-dimensional (3D) structures of biological macromolecules is a powerful tool to show geometric patterns of biochemical annotations and provide a better understanding of the mechanisms underpinning protein and nucleic acid function at the atomic level. RESULTS: We present a new library designed to display fully customizable interactive views between 1D positional features of protein and/or nucleic acid sequences and their 3D structures as isolated chains or components of macromolecular assemblies. AVAILABILITY AND IMPLEMENTATION: https://github.com/rcsb/rcsb-saguaro-3d. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Nucleic Acids , Software , Protein Databases , Macromolecular Substances/chemistry , Proteins/chemistry
ABSTRACT
MOTIVATION: Membrane proteins are encoded by approximately one fifth of human genes but account for more than half of all US FDA approved drug targets. Thanks to new technological advances, the number of membrane proteins archived in the PDB is growing rapidly. However, automatic identification of membrane proteins or inference of membrane location is not a trivial task. RESULTS: We present recent improvements to the RCSB Protein Data Bank web portal (RCSB PDB, rcsb.org) that provide a wealth of new membrane protein annotations integrated from four external resources: OPM, PDBTM, MemProtMD and mpstruc. We have substantially enhanced the presentation of data on membrane proteins. The number of membrane proteins with annotations available on rcsb.org was increased by ~80%. Users can search for these annotations, explore corresponding tree hierarchies, display membrane segments at the 1D amino acid sequence level, and visualize the predicted location of the membrane layer in 3D. AVAILABILITY AND IMPLEMENTATION: Annotations, search, tree data and visualization are available at our rcsb.org web portal. Membrane visualization is supported by the open-source Mol* viewer (molstar.org and github.com/molstar/molstar). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Membrane Proteins , Software , Humans , Protein Conformation , Protein Databases , Amino Acid Sequence
ABSTRACT
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB), the US data center for the global PDB archive and a founding member of the Worldwide Protein Data Bank partnership, serves tens of thousands of data depositors in the Americas and Oceania and makes 3D macromolecular structure data available at no charge and without restrictions to millions of RCSB.org users around the world, including >660 000 educators, students and members of the curious public using PDB101.RCSB.org. PDB data depositors include structural biologists using macromolecular crystallography, nuclear magnetic resonance spectroscopy, 3D electron microscopy and micro-electron diffraction. PDB data consumers accessing our web portals include researchers, educators and students studying fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences. During the past 2 years, the research-focused RCSB PDB web portal (RCSB.org) has undergone a complete redesign, enabling improved searching with full Boolean operator logic and more facile access to PDB data integrated with >40 external biodata resources. New features and resources are described in detail using examples that showcase recently released structures of SARS-CoV-2 proteins and host cell proteins relevant to understanding and addressing the COVID-19 global pandemic.
Subjects
Computational Biology/methods , Protein Databases , Macromolecular Substances/chemistry , Protein Conformation , Proteins/chemistry , Bioengineering/methods , Biomedical Research/methods , Biotechnology/methods , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19/virology , Humans , Macromolecular Substances/metabolism , Pandemics , Proteins/genetics , Proteins/metabolism , SARS-CoV-2/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Software , Viral Proteins/chemistry , Viral Proteins/genetics , Viral Proteins/metabolism
ABSTRACT
MOTIVATION: Interoperability between polymer sequences and structural data is essential for providing a complete picture of protein and gene features and helping to understand biomolecular function. RESULTS: Herein, we present two resources designed to improve interoperability between the RCSB Protein Data Bank, the NCBI and the UniProtKB data resources and visualize integrated data therefrom. The underlying tools provide a flexible means of mapping between the different coordinate spaces, and an interactive tool allows convenient visualization of the 1-dimensional data over the web. AVAILABILITY AND IMPLEMENTATION: https://1d-coordinates.rcsb.org and https://rcsb.github.io/rcsb-saguaro. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
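As a rough illustration of what mapping between coordinate spaces can involve, the sketch below translates a residue position from one numbering system to another through gapless aligned blocks. It is a toy under stated assumptions, not the 1d-coordinates.rcsb.org API: the `AlignedBlock` type, the `map_position` function, and the example block boundaries are all hypothetical.

```python
from typing import List, NamedTuple, Optional


class AlignedBlock(NamedTuple):
    """A gapless aligned segment (hypothetical type; 1-based, inclusive)."""
    query_begin: int   # first position in the source numbering
    query_end: int     # last position in the source numbering
    target_begin: int  # position in the target numbering matching query_begin


def map_position(blocks: List[AlignedBlock], pos: int) -> Optional[int]:
    """Map a source position to the target numbering, or None if unaligned."""
    for block in blocks:
        if block.query_begin <= pos <= block.query_end:
            # Inside a gapless block the offset is preserved.
            return block.target_begin + (pos - block.query_begin)
    return None  # position falls in a gap or outside the alignment


# Made-up example: source residues 10-20 align to target 1-11,
# and source residues 31-40 align to target 12-21.
blocks = [AlignedBlock(10, 20, 1), AlignedBlock(31, 40, 12)]

print(map_position(blocks, 35))  # 16
print(map_position(blocks, 25))  # None (unaligned region)
```

Representing the alignment as a short list of blocks rather than a per-residue table keeps the mapping compact; a linear scan is enough when there are only a few blocks per sequence pair.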
ABSTRACT
Adsorption stands out among standard water treatment techniques because of its remarkable simplicity, easy operation, and high removal capability. Expanded graphite has been selected as a promising agent for oil spill adsorption, but its production generates corrosive remnants and massive amounts of contaminated washing water. Although the advantageous use of the H2O2-H2SO4 mixture was described in 1978, reported works using this method are scarce. This work addresses the urgent need for alternative chemical routes with lower environmental impact (based on green chemistry concepts), presenting a process for expanded graphite production that uses only two intercalation chemicals, reduces the consumption of sulfuric acid to only 10%, and avoids the use of strong oxidant salts (both environmentally detrimental). Three process parameters were evaluated: milling effect, peroxide concentration, and microwave expansion. Some remarkable results were obtained following this route: high specific volumes and an elevated oil adsorption rate, with high oil-water selectivity and rapid adsorption. Furthermore, the recycling capability was checked using up to six adsorption cycles. Results showed that milling time reduces the specimen's expansion rate and oil adsorption capacity due to poor intercalant insertion and generation of small particle sizes.
Subjects
Graphite , Petroleum Pollution , Chemical Water Pollutants , Hydrogen Peroxide , Chemical Water Pollutants/analysis , Adsorption
ABSTRACT
Detection of protein structure similarity is a central challenge in structural bioinformatics. Comparisons are usually performed at the polypeptide chain level; however, the functional form of a protein within the cell is often an oligomer. This fact, together with recent growth of oligomeric structures in the Protein Data Bank (PDB), demands more efficient approaches to oligomeric assembly alignment/retrieval. Traditional methods use atom-level information, which can be complicated by the presence of topological permutations within a polypeptide chain and/or subunit rearrangements. These challenges can be overcome by comparing electron density volumes directly. However, brute-force alignment of 3D data is a compute-intensive search problem. We developed a 3D Zernike moment normalization procedure to orient electron density volumes and assess similarity with unprecedented speed. Similarity searching with this approach enables real-time retrieval of proteins/protein assemblies resembling a target, from PDB or user input, together with resulting alignments (http://shape.rcsb.org).
Subjects
Computational Biology/methods , Protein Databases , Proteins/chemistry , Algorithms , Internet , Molecular Models , Statistical Models , Normal Distribution , Peptides/chemistry , Protein Conformation , Software
ABSTRACT
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB, rcsb.org), the US data center for the global PDB archive, serves thousands of Data Depositors in the Americas and Oceania and makes 3D macromolecular structure data available at no charge and without usage restrictions to more than 1 million rcsb.org Users worldwide and 600 000 pdb101.rcsb.org education-focused Users around the globe. PDB Data Depositors include structural biologists using macromolecular crystallography, nuclear magnetic resonance spectroscopy and 3D electron microscopy. PDB Data Consumers include researchers, educators and students studying Fundamental Biology, Biomedicine, Biotechnology and Energy. Recent reorganization of RCSB PDB activities into four integrated, interdependent services is described in detail, together with tools and resources added over the past 2 years to RCSB PDB web portals in support of a 'Structural View of Biology.'
Subjects
Protein Databases , Protein Conformation , Biomedical Research/education , Biotechnology/education , Data Curation , Software
ABSTRACT
BioJava is an open-source project that provides a Java library for processing biological data. The project aims to simplify bioinformatic analyses by implementing parsers, data structures, and algorithms for common tasks in genomics, structural biology, ontologies, phylogenetics, and more. Since 2012, we have released two major versions of the library (4 and 5) that include many new features to tackle challenges with increasingly complex macromolecular structure data. BioJava requires Java 8 or higher and is freely available under the LGPL 2.1 license. The project is hosted on GitHub at https://github.com/biojava/biojava. More information and documentation can be found online on the BioJava website (http://www.biojava.org) and tutorial (https://github.com/biojava/biojava-tutorial). All inquiries should be directed to the GitHub page or the BioJava mailing list (http://lists.open-bio.org/mailman/listinfo/biojava-l).
Subjects
Computational Biology/methods , Access to Information , Algorithms , Gene Library , Genome/genetics , Genomics , Information Storage and Retrieval , Internet , Software
ABSTRACT
The aim of this study was to characterize the protein profile of ovarian follicular fluid (FF) of brown brocket deer (Mazama gouazoubira). Five adult females received an ovarian stimulation treatment and the FF was collected by laparoscopy from small/medium (≤3.5 mm) and large (>3.5 mm) follicles. Concentrations of soluble proteins in FF samples were measured and proteins were analyzed by 1-D SDS-PAGE followed by tryptic digestion and tandem mass spectrometry. The protein list obtained from a Mascot database search was analyzed using the STRAP software tool. For the protein concentration, no significant difference (P > 0.05) was observed between small/medium and large follicles: 49.2 ± 22.8 and 56.7 ± 27.4 µg/µl, respectively. Mass spectrometry analysis identified 13 major proteins, but with no significant difference (P > 0.05) between follicle size classes. This study provides insight into folliculogenesis in brown brocket deer.
Subjects
Deer , Animals , Female , Follicular Fluid , Ovarian Follicle , Ovulation Induction
ABSTRACT
We present the assembly category assessment in the 13th edition of the CASP community-wide experiment. For the second time, protein assemblies constitute an independent assessment category. Compared to the last edition, we see a clear increase in participation, more oligomeric targets released, and consistent, albeit modest, improvement in prediction quality. Looking at the tertiary structure predictions, we observe that ignoring the oligomeric state of the targets hinders modeling success. We also note that some contact prediction groups successfully predicted homomeric interfacial contacts, though it appears that these predictions were not used for assembly modeling. Homology modeling with sizeable human intervention appears to form the basis of the assembly prediction techniques in this round of CASP. Future developments should see more integrated approaches where subunits are modeled in the context of the assemblies they form.
Subjects
Computational Biology , Protein Conformation , Proteins/ultrastructure , Software , Algorithms , Humans , Molecular Dynamics Simulation , Proteins/chemistry , Proteins/genetics , Protein Sequence Analysis
ABSTRACT
Small angle X-ray scattering (SAXS) measures comprehensive distance information on a protein's structure, which can constrain and guide computational structure prediction algorithms. Here, we evaluate structure predictions of 11 monomeric and oligomeric proteins for which SAXS data were collected and provided to predictors in the 13th round of the Critical Assessment of protein Structure Prediction (CASP13). The category for SAXS-assisted predictions made gains in certain areas for CASP13 compared to CASP12. Improvements included higher quality data with size exclusion chromatography-SAXS (SEC-SAXS) and better selection of targets and communication of results by CASP organizers. In several cases, we can track improvements in model accuracy with use of SAXS data. For hard multimeric targets where regular folding algorithms were unsuccessful, SAXS data helped predictors to build models better resembling the global shape of the target. For most models, however, no significant improvement in model accuracy at the domain level was registered from use of SAXS data, when rigorously comparing SAXS-assisted models to the best regular server predictions. To promote future progress in this category, we identify successes, challenges, and opportunities for improved strategies in prediction, assessment, and communication of SAXS data to predictors. An important observation is that, for many targets, SAXS data were inconsistent with crystal structures, suggesting that these proteins adopt different conformation(s) in solution. This CASP13 result, if representative of PDB structures and future CASP targets, may have substantive implications for the structure training databases used for machine learning, CASP, and use of prediction models for biology.
Subjects
Computational Biology , Protein Conformation , Proteins/ultrastructure , Algorithms , Molecular Models , Protein Folding , Proteins/chemistry , Proteins/genetics , Small-Angle Scattering , Solutions/chemistry , X-Ray Diffraction
ABSTRACT
Motivation: The interactive visualization of very large macromolecular complexes on the web is becoming a challenging problem as experimental techniques advance at an unprecedented rate and deliver structures of increasing size. Results: We have tackled this problem by developing highly memory-efficient and scalable extensions for the NGL WebGL-based molecular viewer and by using MMTF, a binary and compressed Macromolecular Transmission Format. These enable NGL to download and render molecular complexes with millions of atoms interactively on desktop computers and smartphones alike, making it a tool of choice for web-based molecular visualization in research and education. Availability and implementation: The source code is freely available under the MIT license at github.com/arose/ngl and distributed on NPM (npmjs.com/package/ngl). MMTF-JavaScript encoders and decoders are available at github.com/rcsb/mmtf-javascript.
Subjects
Computer Graphics , Internet , Macromolecular Substances , Software
ABSTRACT
Phenotypic variation is the raw material of adaptive Darwinian evolution. The phenotypic variation found in organismal development is biased towards certain phenotypes, but the molecular mechanisms behind such biases are still poorly understood. Gene regulatory networks have been proposed as one cause of constrained phenotypic variation. However, most pertinent evidence is theoretical rather than experimental. Here, we study evolutionary biases in two synthetic gene regulatory circuits expressed in Escherichia coli that produce a gene expression stripe, a pivotal pattern in embryonic development. The two parental circuits produce the same phenotype, but create it through different regulatory mechanisms. We show that mutations cause distinct novel phenotypes in the two networks and use a combination of experimental measurements, mathematical modelling and DNA sequencing to understand why mutations bring forth only some but not other novel gene expression phenotypes. Our results reveal that the regulatory mechanisms of networks restrict the possible phenotypic variation upon mutation. Consequently, seemingly equivalent networks can indeed be distinct in how they constrain the outcome of further evolution.
Subjects
Biological Evolution , Escherichia coli/genetics , Gene Regulatory Networks , Genetic Models , Phenotype , Synthetic Biology/methods , Arabinose/metabolism , Arabinose/pharmacology , Molecular Cloning , Culture Media/chemistry , Culture Media/pharmacology , Escherichia coli/drug effects , Escherichia coli/metabolism , Gene Expression Regulation , Genetic Variation , Genotype , Mutation , Genetic Selection
ABSTRACT
A correct assessment of the quaternary structure of proteins is a fundamental prerequisite to understanding their function, physico-chemical properties and mode of interaction with other proteins. Currently about 90% of structures in the Protein Data Bank are crystal structures, in which the correct quaternary structure is embedded in the crystal lattice among a number of crystal contacts. Computational methods are required to 1) classify all protein-protein contacts in crystal lattices as biologically relevant or crystal contacts and 2) provide an assessment of how the biologically relevant interfaces combine into a biological assembly. In our previous work we addressed the first problem with our EPPIC (Evolutionary Protein Protein Interface Classifier) method. Here, we present our solution to the second problem with a new method that combines the interface classification results with symmetry and topology considerations. The new algorithm enumerates all possible valid assemblies within the crystal using a graph representation of the lattice and predicts the most probable biological unit based on the pairwise interface scoring. Our method achieves 85% precision (ranging from 76% to 90% for different oligomeric types) on a new dataset of 1,481 biological assemblies with consensus of PDB annotations. Although almost the same precision is achieved by PISA, currently the most popular quaternary structure assignment method, we show that, due to the fundamentally different approach to the problem, the two methods are complementary and could be combined to improve biological assembly assignments. The software for the automatic assessment of protein assemblies (EPPIC version 3) has been made available through a web server at http://www.eppic-web.org.
Subjects
Protein Quaternary Structure , Proteins/chemistry , Algorithms , Computational Biology , X-Ray Crystallography/statistics & numerical data , Protein Databases/statistics & numerical data , Molecular Models , Protein Interaction Domains and Motifs , Software
ABSTRACT
The Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB, http://rcsb.org), the US data center for the global PDB archive, makes PDB data freely available to all users, from structural biologists to computational biologists and beyond. New tools and resources have been added to the RCSB PDB web portal in support of a 'Structural View of Biology.' Recent developments have improved the User experience, including the high-speed NGL Viewer that provides 3D molecular visualization in any web browser, improved support for data file download and enhanced organization of website pages for query, reporting and individual structure exploration. Structure validation information is now visible for all archival entries. PDB data have been integrated with external biological resources, including chromosomal position within the human genome; protein modifications; and metabolic pathways. PDB-101 educational materials have been reorganized into a searchable website and expanded to include new features such as the Geis Digital Archive.
Subjects
Computational Biology/methods , Genetic Databases , Proteins/chemistry , Proteins/genetics , Datasets as Topic , Metabolic Networks and Pathways , Molecular Models , Protein Conformation , Proteins/metabolism , Software , Structure-Activity Relationship , User-Computer Interface , Web Browser
ABSTRACT
We present the results of the first independent assessment of protein assemblies in CASP. A total of 1624 oligomeric models were submitted by 108 predictor groups for the 30 oligomeric targets in the CASP12 edition. We evaluated the accuracy of oligomeric predictions by comparison to their reference structures at the interface patch and residue contact levels. We find that interface patches are more reliably predicted than the specific residue contacts. Whereas none of the 15 hard oligomeric targets have successful predictions for the residue contacts at the interface, six have models with resemblance in the interface patch. Successful predictions of interface patch and contacts exist for all targets suitable for homology modeling, with at least one group improving over the best available template for each target. However, the participation in protein assembly prediction is low and uneven. Three human groups are closely ranked at the top by overall performance, but a server outperforms all other predictors for targets suitable for homology modeling. The state of the art of protein assembly prediction methods is in development and has apparent room for improvement, especially for assemblies without templates.
Subjects
Computational Biology/methods , Protein Databases , Molecular Models , Molecular Dynamics Simulation , Protein Conformation , Proteins/chemistry , Algorithms , Humans , Protein Folding , Protein Sequence Analysis
ABSTRACT
Mazama gouazoubira is a small deer species widely distributed in South America. Previous studies have shown that this species presents intraspecific chromosomal polymorphisms, which could affect fertility due to the effects of chromosomal rearrangements on gamete formation. Important aspects regarding the karyotype evolution of this species and the genus remain undefined due to the lack of information concerning the causes of this chromosomal variation. Nineteen individuals belonging to the Mazama gouazoubira population located in the Pantanal were cytogenetically evaluated. Among the individuals analyzed, 9 had B chromosomes and 5 carried a heterozygous centric fusion (2n = 69 and FN = 70). In 3 individuals, the fusion occurred between chromosomes X and 16, in 1 individual between chromosomes 7 and 21, and in another individual between chromosomes 4 and 16. These striking polymorphisms could be explained by several hypotheses. One is that the chromosome rearrangements in this species are recent and not fixed in the population yet, and another hypothesis is that they represent a balanced polymorphism and that heterozygotes have an adaptive advantage. On the other hand, these polymorphisms may negatively influence fertility and raise questions about sustainability or reproductive isolation of the population.
Subjects
Deer/genetics , Genetic Polymorphism , Animals , Mammalian Chromosomes , Female , Karyotype , Male
ABSTRACT
Measures of traits are the basis of functional biological diversity. Numerous works consider mean species-level measures of traits while ignoring individual variance within species. However, there is a large amount of variation within species and it is increasingly apparent that it is important to consider trait variation not only between species, but also within species. Mammals are an interesting group for investigating trait-based approaches because they play diverse and important ecological functions (e.g., pollination, seed dispersal, predation, grazing) that are correlated with functional traits. Here we compile a data set comprising morphological and life history information of 279 mammal species from 39,850 individuals of 388 populations ranging from -5.83 to -29.75 decimal degrees of latitude and -34.82 to -56.73 decimal degrees of longitude in the Atlantic forest of South America. We present trait information from 16,840 individuals of 181 species of non-volant mammals (Rodentia, Didelphimorphia, Carnivora, Primates, Cingulata, Artiodactyla, Pilosa, Lagomorpha, Perissodactyla) and from 23,010 individuals of 98 species of volant mammals (Chiroptera). The traits reported include body mass, age, sex, reproductive stage, as well as the geographic coordinates of sampling for all taxa. Moreover, we gathered information on forearm length for bats and body length and tail length for rodents and marsupials. No copyright restrictions are associated with the use of this data set. Please cite this data paper when the data are used in publications. We also request that researchers and teachers inform us of how they are using the data.