Results 1 - 5 of 5
1.
Nucleic Acids Res ; 50(W1): W108-W114, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35524558

ABSTRACT

Computational models have great potential to accelerate bioscience, bioengineering, and medicine. However, it remains challenging to reproduce and reuse simulations, in part, because the numerous formats and methods for simulating various subsystems and scales remain siloed by different software tools. For example, each tool must be executed through a distinct interface. To help investigators find and use simulation tools, we developed BioSimulators (https://biosimulators.org), a central registry of the capabilities of simulation tools and consistent Python, command-line and containerized interfaces to each version of each tool. The foundation of BioSimulators is standards, such as CellML, SBML, SED-ML and the COMBINE archive format, and validation tools for simulation projects and simulation tools that ensure these standards are used consistently. To help modelers find tools for particular projects, we have also used the registry to develop recommendation services. We anticipate that BioSimulators will help modelers exchange, reproduce, and combine simulations.


Subjects
Computer Simulation , Software , Humans , Bioengineering , Models, Biological , Registries , Research Personnel
2.
PLoS Biol ; 15(6): e2001414, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28662064

ABSTRACT

In many disciplines, data are highly decentralized across thousands of online databases (repositories, registries, and knowledgebases). Wringing value from such databases depends on the discipline of data science and on the humble bricks and mortar that make integration possible; identifiers are a core component of this integration infrastructure. Drawing on our experience and on work by other groups, we outline 10 lessons we have learned about the identifier qualities and best practices that facilitate large-scale data integration. Specifically, we propose actions that identifier practitioners (database providers) should take in the design, provision and reuse of identifiers. We also outline the important considerations for those referencing identifiers in various circumstances, including by authors and data generators. While the importance and relevance of each lesson will vary by context, there is a need for increased awareness about how to avoid and manage common identifier problems, especially those related to persistence and web-accessibility/resolvability. We focus strongly on web-based identifiers in the life sciences; however, the principles are broadly relevant to other disciplines.


Subjects
Biological Science Disciplines/methods , Computational Biology/methods , Data Mining/methods , Software Design , Software , Biological Science Disciplines/statistics & numerical data , Biological Science Disciplines/trends , Computational Biology/trends , Data Mining/statistics & numerical data , Data Mining/trends , Databases, Factual/statistics & numerical data , Databases, Factual/trends , Forecasting , Humans , Internet
3.
Protein Sci ; 14(1): 13-23, 2005 Jan.
Article in English | MEDLINE | ID: mdl-15608116

ABSTRACT

We identified key residues from the structural alignment of families of protein domains from SCOP which we represented in the form of sparse protein signatures. A signature-generating algorithm (SigGen) was developed and used to automatically identify key residues based on several structural and sequence-based criteria. The capacity of the signatures to detect related sequences from the SWISSPROT database was assessed by receiver operator characteristic (ROC) analysis and jack-knife testing. Test signatures for families from each of the main SCOP classes are described in relation to the quality of the structural alignments, the SigGen parameters used, and their diagnostic performance. We show that automatically generated signatures are potently diagnostic for their family (ROC50 scores typically >0.8), consistently outperform random signatures, and can identify sequence relationships in the "twilight zone" of protein sequence similarity (<40%). Signatures based on 15%-30% of alignment positions occurred most frequently among the best-performing signatures. When alignment quality is poor, sparser signatures perform better, whereas signatures generated from higher-quality alignments of fewer structures require more positions to be diagnostic. Our validation of signatures from the Globin family shows that when sequences from the structural alignment are removed and new signatures generated, the omitted sequences are still detected. The positions highlighted by the signature often correspond (alignment specificity >0.7) to the key positions in the original (non-jack-knifed) alignment. We discuss potential applications of sparse signatures in sequence annotation and homology modeling.
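The ROC50 score mentioned in this abstract can be illustrated with a short sketch. It assumes the commonly used ROC_n definition (the area under the ROC curve up to the first n false positives, normalized to the range 0 to 1); the abstract does not state the exact formula used, so the function name `roc_n`, its parameters, and the handling of short hit lists are assumptions made for illustration, not the authors' implementation.

```python
def roc_n(ranked_labels, n=50):
    """Return the ROC_n score of a ranked hit list.

    ranked_labels: booleans ordered best-scoring hit first;
                   True marks a true family member, False a false positive.
    n:             number of false positives at which the curve is truncated.
    """
    labels = list(ranked_labels)
    total_true = sum(labels)
    if total_true == 0:
        raise ValueError("the hit list contains no true positives")

    true_seen = 0   # true positives ranked above the current position
    false_seen = 0  # false positives encountered so far
    area = 0        # sum of true_seen at each false positive

    for is_true in labels:
        if is_true:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen
            if false_seen == n:
                break

    # Assumption: if the list holds fewer than n false positives, the
    # missing ones are treated as ranking below every hit in the list.
    area += (n - false_seen) * true_seen

    return area / (n * total_true)


if __name__ == "__main__":
    # Hypothetical ranked hits: T T F T F, truncated at 2 false positives.
    print(roc_n([True, True, False, True, False], n=2))
```

A score of 1.0 means every true member outranks the first n false positives, and 0.0 means none does. Under this definition, the ">0.8" reported above would correspond to most family members being found before the 50th false hit.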


Subjects
Protein Structure, Tertiary , Proteins/chemistry , Proteins/classification , Algorithms , Amino Acid Sequence , Databases, Protein , Evaluation Studies as Topic , Globins/chemistry , Globins/classification , Molecular Sequence Data , Sequence Alignment/methods
4.
Proteins ; 61(4): 1075-88, 2005 Dec 01.
Article in English | MEDLINE | ID: mdl-16247798

ABSTRACT

Considering the limited success of the most sophisticated docking methods available and the amount of computation required for systematic docking, cataloging all the known interfaces may be an alternative basis for the prediction of protein tertiary and quaternary structures. We classify domain interfaces according to the geometry of domain-domain association. By applying a simple and efficient method called "interface tag clustering," more than 4,000 distinct types of domain interfaces are collected from Protein Quaternary Structure Server and Protein Data Bank. Given a pair of interacting domains, we define "face" as the set of interacting residues in each single domain and the pair of interacting faces as an "interface." We investigate how the geometry of interfaces relates to a network of interacting protein families, such as how many different binding orientations are possible between two families or whether a family uses distinct surfaces or the same surface when the family has diverse interaction partners from various families. We show there are, on average, 1.2-1.9 different types of interfaces between interacting domains and a significant number of family pairs associate in multiple orientations. In general, a family tends to use distinct faces for each partner when the family has diverse interaction partners. Each face is highly specific to its interaction partner and the binding orientation. The relative positions of interface residues are generally well conserved within the same type of interface even between remote homologs. The classification result is available at http://www.biotec.tu-dresden.de/~wkim/supplement.


Subjects
Proteins/chemistry , Amino Acid Sequence , Binding Sites , Crystallography, X-Ray , Markov Chains , Models, Theoretical , Protein Conformation , Sequence Alignment
5.
Appl Bioinformatics ; 4(2): 131-5, 2005.
Article in English | MEDLINE | ID: mdl-16128614

ABSTRACT

Receiver operating characteristic (ROC) analysis is a powerful and widely used technique for assessing predictive methods, yet there are no generic, open-source software tools for this that are freely available. Our ROCPLOT program performs ROC analysis on one or more files of search results (hits) and generates the following: (i) ROC values, giving a convenient numerical measure of method sensitivity and specificity; (ii) ROC plots graphically displaying sensitivity and specificity; (iii) classification plots to aid interpretation of the ROC plots and values; and (iv) a bar chart of the distribution of ROC values. ROCPLOT is generic and flexible: data in multiple hits files can be processed in series or parallel, allowing the results of multiple predictions to be viewed side-by-side or combined. AVAILABILITY: ROCPLOT is freely available for download as part of the European Molecular Biology Open Software Suite, EMBOSS (http://emboss.sourceforge.net/apps/rocplot.html).
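As a rough illustration of the kind of analysis described above, the following sketch derives ROC curve points and the area under the curve from a list of scored hits. It is not ROCPLOT's implementation and does not use its file formats or options; the function names `roc_points` and `auc` and the example data are hypothetical.

```python
def roc_points(scored_hits):
    """Return ROC curve points as (false positive rate, true positive rate).

    scored_hits: list of (score, is_true) pairs, where a higher score means
                 a more confident prediction and is_true marks a correct hit.
    """
    hits = sorted(scored_hits, key=lambda hit: hit[0], reverse=True)
    positives = sum(1 for _, is_true in hits if is_true)
    negatives = len(hits) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one true and one false hit")

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(hits):
        score = hits[i][0]
        # Hits with identical scores share a threshold, so count them together.
        while i < len(hits) and hits[i][0] == score:
            if hits[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        # Sensitivity is tp / positives; specificity is 1 - fp / negatives.
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Hypothetical hits: (score, is_true)
    example = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.4, False)]
    curve = roc_points(example)
    print(curve)
    print(auc(curve))
```

Each point corresponds to one score threshold, so plotting the returned list gives a sensitivity versus (1 - specificity) curve, and the area summarizes it as a single number.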


Subjects
Computer Graphics , Data Interpretation, Statistical , Models, Biological , ROC Curve , Software , User-Computer Interface , Models, Statistical