Pesquisa | Portal Regional da BVS

1.

Unified and pluralistic ideals for data sharing and reuse in biodiversity.

Sterner, Beckett; Elliott, Steve; Gilbert, Edward E; Franz, Nico M.

Database (Oxford) ; 20232023 07 18.

Artigo em Inglês | MEDLINE | ID: mdl-37465916

RESUMO

How should billions of species observations worldwide be shared and made reusable? Many biodiversity scientists assume the ideal solution is to standardize all datasets according to a single, universal classification and aggregate them into a centralized, global repository. This ideal has known practical and theoretical limitations, however, which justifies investigating alternatives. To support better community deliberation and normative evaluation, we develop a novel conceptual framework showing how different organizational models, regulative ideals and heuristic strategies are combined to form shared infrastructures supporting data reuse. The framework is anchored in a general definition of data pooling as an activity of making a taxonomically standardized body of information available for community reuse via digital infrastructure. We describe and illustrate unified and pluralistic ideals for biodiversity data pooling and show how communities may advance toward these ideals using different heuristic strategies. We present evidence for the strengths and limitations of the unification and pluralistic ideals based on systemic relationships of power, responsibility and benefit they establish among stakeholders, and we conclude the pluralistic ideal is better suited for biodiversity data.

Assuntos

Biodiversidade , Disseminação de Informação

2.

Liberating host-virus knowledge from biological dark data.

Upham, Nathan S; Poelen, Jorrit H; Paul, Deborah; Groom, Quentin J; Simmons, Nancy B; Vanhove, Maarten P M; Bertolino, Sandro; Reeder, DeeAnn M; Bastos-Silveira, Cristiane; Sen, Atriya; Sterner, Beckett; Franz, Nico M; Guidoti, Marcus; Penev, Lyubomir; Agosti, Donat.

Lancet Planet Health ; 5(10): e746-e750, 2021 10.

Artigo em Inglês | MEDLINE | ID: mdl-34562356

RESUMO

Connecting basic data about bats and other potential hosts of SARS-CoV-2 with their ecological context is crucial to the understanding of the emergence and spread of the virus. However, when lockdowns in many countries started in March, 2020, the world's bat experts were locked out of their research laboratories, which in turn impeded access to large volumes of offline ecological and taxonomic data. Pandemic lockdowns have brought to attention the long-standing problem of so-called biological dark data: data that are published, but disconnected from digital knowledge resources and thus unavailable for high-throughput analysis. Knowledge of host-to-virus ecological interactions will be biased until this challenge is addressed. In this Viewpoint, we outline two viable solutions: first, in the short term, to interconnect published data about host organisms, viruses, and other pathogens; and second, to shift the publishing framework beyond unstructured text (the so-called PDF prison) to labelled networks of digital knowledge. As the indexing system for biodiversity data, biological taxonomy is foundational to both solutions. Building digitally connected knowledge graphs of host-pathogen interactions will establish the agility needed to quickly identify reservoir hosts of novel zoonoses, allow for more robust predictions of emergence, and thereby strengthen human and planetary health systems.

Assuntos

COVID-19 , Interações entre Hospedeiro e Microrganismos , Armazenamento e Recuperação da Informação , Animais , COVID-19/epidemiologia , COVID-19/virologia , Humanos , SARS-CoV-2 , Zoonoses

3.

Bats, objectivity, and viral spillover risk.

Sterner, Beckett; Elliott, Steve; Upham, Nate; Franz, Nico.

Hist Philos Life Sci ; 43(1): 7, 2021 Jan 13.

Artigo em Inglês | MEDLINE | ID: mdl-33439354

RESUMO

What should the best practices be for modeling zoonotic disease risks, e.g. to anticipate the next pandemic, when background assumptions are unsettled or evolving rapidly? This challenge runs deeper than one might expect, all the way into how we model the robustness of contemporary phylogenetic inference and taxonomic classifications. Different and legitimate taxonomic assumptions can destabilize the putative objectivity of zoonotic risk assessments, thus potentially supporting inconsistent and overconfident policy decisions.

Assuntos

Quirópteros , Pandemias , Medição de Risco/métodos , Zoonoses , Animais , Quirópteros/virologia , Humanos , Modelos Teóricos , Pandemias/classificação , Filogenia , Zoonoses/epidemiologia , Zoonoses/transmissão , Zoonoses/virologia

4.

Wanted: Standards for FAIR taxonomic concept representations and relationships.

Sterner, Beckett; Upham, Nathan; Gupta, Prashant; Powell, Caleb; Franz, Nico M.

Biodivers Inf Sci Stand ; 52021.

Artigo em Inglês | MEDLINE | ID: mdl-35462676

RESUMO

Making the most of biodiversity data requires linking observations of biological species from multiple sources both efficiently and accurately (Bisby 2000, Franz et al. 2016). Aggregating occurrence records using taxonomic names and synonyms is computationally efficient but known to experience significant limitations on accuracy when the assumption of one-to-one relationships between names and biological entities breaks down (Remsen 2016, Franz and Sterner 2018). Taxonomic treatments and checklists provide authoritative information about the correct usage of names for species, including operational representations of the meanings of those names in the form of range maps, reference genetic sequences, or diagnostic traits. They increasingly provide taxonomic intelligence in the form of precise description of the semantic relationships between different published names in the literature. Making this authoritative information Findable, Accessible, Interoperable, and Reusable (FAIR; Wilkinson et al. 2016) would be a transformative advance for biodiversity data sharing and help drive adoption and novel extensions of existing standards such as the Taxonomic Concept Schema and the OpenBiodiv Ontology (Kennedy et al. 2006, Senderov et al. 2018). We call for the greater, global Biodiversity Information Standards (TDWG) and taxonomy community to commit to extending and expanding on how FAIR applies to biodiversity data and include practical targets and criteria for the publication and digitization of taxonomic concept representations and alignments in taxonomic treatments, checklists, and backbones.

5.

Coordinating dissent as an alternative to consensus classification: insights from systematics for bio-ontologies.

Sterner, Beckett; Witteveen, Joeri; Franz, Nico.

Hist Philos Life Sci ; 42(1): 8, 2020 Feb 06.

Artigo em Inglês | MEDLINE | ID: mdl-32030540

RESUMO

The collection and classification of data into meaningful categories is a key step in the process of knowledge making. In the life sciences, the design of data discovery and integration tools has relied on the premise that a formal classificatory system for expressing a body of data should be grounded in consensus definitions for classifications. On this approach, exemplified by the realist program of the Open Biomedical Ontologies Foundry, progress is maximized by grounding the representation and aggregation of data on settled knowledge. We argue that historical practices in systematic biology provide an important and overlooked alternative approach to classifying and disseminating data, based on a principle of coordinative rather than definitional consensus. Systematists have developed a robust system for referring to taxonomic entities that can deliver high quality data discovery and integration without invoking consensus about reality or "settled" science.

Assuntos

Consenso , Dissidências e Disputas , Ontologias Biológicas

6.

Decentralized but Globally Coordinated Biodiversity Data.

Sterner, Beckett W; Gilbert, Edward E; Franz, Nico M.

Front Big Data ; 3: 519133, 2020.

Artigo em Inglês | MEDLINE | ID: mdl-33693407

RESUMO

Centralized biodiversity data aggregation is too often failing societal needs due to pervasive and systemic data quality deficiencies. We argue for a novel approach that embodies the spirit of the Web ("small pieces loosely joined") through the decentralized coordination of data across scientific languages and communities. The upfront cost of decentralization can be offset by the long-term benefit of achieving sustained expert engagement, higher-quality data products, and ultimately more societal impact for biodiversity data. Our decentralized approach encourages the emergence and evolution of multiple self-identifying communities of practice that are regionally, taxonomically, or institutionally localized. Each community is empowered to control the social and informational design and versioning of their local data infrastructures and signals. With no single aggregator to exert centralized control over biodiversity data, decentralization generates loosely connected networks of mid-level aggregators. Global coordination is nevertheless feasible through automatable data sharing agreements that enable efficient propagation and translation of biodiversity data across communities. The decentralized model also poses novel integration challenges, among which the explicit and continuous articulation of conflicting systematic classifications and phylogenies remain the most challenging. We discuss the development of available solutions, challenges, and outline next steps: the global effort of coordination should focus on developing shared languages for data signal translation, as opposed to homogenizing the data signal itself.

7.

Controlling the error probabilities of model selection information criteria using bootstrapping.

Cullan, Michael; Lidgard, Scott; Sterner, Beckett.

J Appl Stat ; 47(13-15): 2565-2581, 2020.

Artigo em Inglês | MEDLINE | ID: mdl-35707440

RESUMO

The Akaike Information Criterion (AIC) and related information criteria are powerful and increasingly popular tools for comparing multiple, non-nested models without the specification of a null model. However, existing procedures for information-theoretic model selection do not provide explicit and uniform control over error rates for the choice between models, a key feature of classical hypothesis testing. We show how to extend notions of Type-I and Type-II error to more than two models without requiring a null. We then present the Error Control for Information Criteria (ECIC) method, a bootstrap approach to controlling Type-I error using Difference of Goodness of Fit (DGOF) distributions. We apply ECIC to empirical and simulated data in time series and regression contexts to illustrate its value for parametric Neyman-Pearson classification. An R package implementing the bootstrap method is publicly available.

8.

The Objectivity of Organizational Functions.

Cusimano, Samuel; Sterner, Beckett.

Acta Biotheor ; 68(2): 253-269, 2020 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-31520330

RESUMO

We critique the organizational account of biological functions by showing how its basis in the closure of constraints fails to be objective. While the account treats constraints as objective features of physical systems, the number and relationship of potential constraints are subject to potentially arbitrary redescription by investigators. For example, we show that self-maintaining systems such as candle flames can realize closure on a more thorough analysis of the case, contradicting the claim that these "simple" systems lack functional organization. This also raises problems for Moreno and Mossio's associated theory of biological autonomy, which asserts that living beings are distinguished by their possession of a closed system of constraints that channel and regulate their metabolic processes.

Assuntos

Fenômenos Fisiológicos Celulares , Simulação por Computador , Modelos Biológicos , Biologia de Sistemas , Animais , Humanos , Termodinâmica

9.

To increase trust, change the social design behind aggregated biodiversity data.

Franz, Nico M; Sterner, Beckett W.

Database (Oxford) ; 20182018 01 01.

Artigo em Inglês | MEDLINE | ID: mdl-29315357

RESUMO

Growing concerns about the quality of aggregated biodiversity data are lowering trust in large-scale data networks. Aggregators frequently respond to quality concerns by recommending that biologists work with original data providers to correct errors 'at the source.' We show that this strategy falls systematically short of a full diagnosis of the underlying causes of distrust. In particular, trust in an aggregator is not just a feature of the data signal quality provided by the sources to the aggregator, but also a consequence of the social design of the aggregation process and the resulting power balance between individual data contributors and aggregators. The latter have created an accountability gap by downplaying the authorship and significance of the taxonomic hierarchies-frequently called 'backbones'-they generate, and which are in effect novel classification theories that operate at the core of data-structuring process. The Darwin Core standard for sharing occurrence records plays an under-appreciated role in maintaining the accountability gap, because this standard lacks the syntactic structure needed to preserve the taxonomic coherence of data packages submitted for aggregation, potentially leading to inferences that no individual source would support. Since high-quality data packages can mirror competing and conflicting classifications, i.e. unsettled systematic research, this plurality must be accommodated in the design of biodiversity data integration. Looking forward, a key directive is to develop new technical pathways and social incentives for experts to contribute directly to the validation of taxonomically coherent data packages as part of a greater, trustworthy aggregation process.

Assuntos

Biodiversidade , Confiabilidade dos Dados , Bases de Dados Factuais , Disseminação de Informação

10.

Moving Past the Systematics Wars.

Sterner, Beckett; Lidgard, Scott.

J Hist Biol ; 51(1): 31-67, 2018 03.

Artigo em Inglês | MEDLINE | ID: mdl-28255641

RESUMO

It is time to escape the constraints of the Systematics Wars narrative and pursue new questions that are better positioned to establish the relevance of the field in this time period to broader issues in the history of biology and history of science. To date, the underlying assumptions of the Systematics Wars narrative have led historians to prioritize theory over practice and the conflicts of a few leading theorists over the less-polarized interactions of systematists at large. We show how shifting to a practice-oriented view of methodology, centered on the trajectory of mathematization in systematics, demonstrates problems with the common view that one camp (cladistics) straightforwardly "won" over the other (phenetics). In particular, we critique David Hull's historical account in Science as a Process by demonstrating exactly the sort of intermediate level of positive sharing between phenetic and cladistic theories that undermines their mutually exclusive individuality as conceptual systems over time. It is misleading, or at least inadequate, to treat them simply as holistically opposed theories that can only interact by competition to the death. Looking to the future, we suggest that the concept of workflow provides an important new perspective on the history of mathematization and computerization in biology after World War II.

Assuntos

Biologia/história , Classificação/métodos , Biologia/métodos , História do Século XX

11.

The normative structure of mathematization in systematic biology.

Sterner, Beckett; Lidgard, Scott.

Stud Hist Philos Biol Biomed Sci ; 46: 44-54, 2014 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-24717645

RESUMO

We argue that the mathematization of science should be understood as a normative activity of advocating for a particular methodology with its own criteria for evaluating good research. As a case study, we examine the mathematization of taxonomic classification in systematic biology. We show how mathematization is a normative activity by contrasting its distinctive features in numerical taxonomy in the 1960s with an earlier reform advocated by Ernst Mayr starting in the 1940s. Both Mayr and the numerical taxonomists sought to formalize the work of classification, but Mayr introduced a qualitative formalism based on human judgment for determining the taxonomic rank of populations, while the numerical taxonomists introduced a quantitative formalism based on automated procedures for computing classifications. The key contrast between Mayr and the numerical taxonomists is how they conceptualized the temporal structure of the workflow of classification, specifically where they allowed meta-level discourse about difficulties in producing the classification.

Assuntos

Biologia/história , Classificação/métodos , Matemática/história , História do Século XX , História do Século XXI

12.

Discriminative learning for protein conformation sampling.

Zhao, Feng; Li, Shuaicheng; Sterner, Beckett W; Xu, Jinbo.

Proteins ; 73(1): 228-40, 2008 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-18412258

RESUMO

Protein structure prediction without using templates (i.e., ab initio folding) is one of the most challenging problems in structural biology. In particular, conformation sampling poses as a major bottleneck of ab initio folding. This article presents CRFSampler, an extensible protein conformation sampler, built on a probabilistic graphical model Conditional Random Fields (CRFs). Using a discriminative learning method, CRFSampler can automatically learn more than ten thousand parameters quantifying the relationship among primary sequence, secondary structure, and (pseudo) backbone angles. Using only compactness and self-avoiding constraints, CRFSampler can efficiently generate protein-like conformations from primary sequence and predicted secondary structure. CRFSampler is also very flexible in that a variety of model topologies and feature sets can be defined to model the sequence-structure relationship without worrying about parameter estimation. Our experimental results demonstrate that using a simple set of features, CRFSampler can generate decoys with much higher quality than the most recent HMM model.

Assuntos

Simulação por Computador , Modelos Moleculares , Conformação Proteica , Análise de Sequência de Proteína/métodos , Software , Algoritmos , Modelos Estatísticos , Dobramento de Proteína , Estrutura Secundária de Proteína

13.

Predicting and annotating catalytic residues: an information theoretic approach.

Sterner, Beckett; Singh, Rohit; Berger, Bonnie.

J Comput Biol ; 14(8): 1058-73, 2007 Oct.

Artigo em Inglês | MEDLINE | ID: mdl-17887954

RESUMO

We introduce a computational method to predict and annotate the catalytic residues of a protein using only its sequence information, so that we describe both the residues' sequence locations (prediction) and their specific biochemical roles in the catalyzed reaction (annotation). While knowing the chemistry of an enzyme's catalytic residues is essential to understanding its function, the challenges of prediction and annotation have remained difficult, especially when only the enzyme's sequence and no homologous structures are available. Our sequence-based approach follows the guiding principle that catalytic residues performing the same biochemical function should have similar chemical environments; it detects specific conservation patterns near in sequence to known catalytic residues and accordingly constrains what combination of amino acids can be present near a predicted catalytic residue. We associate with each catalytic residue a short sequence profile and define a Kullback-Leibler (KL) distance measure between these profiles, which, as we show, effectively captures even subtle biochemical variations. We apply the method to the class of glycohydrolase enzymes. This class includes proteins from 96 families with very different sequences and folds, many of which perform important functions. In a cross-validation test, our approach correctly predicts the location of the enzymes' catalytic residues with a sensitivity of 80% at a specificity of 99.4%, and in a separate cross-validation we also correctly annotate the biochemical role of 80% of the catalytic residues. Our results compare favorably to existing methods. Moreover, our method is more broadly applicable because it relies on sequence and not structure information; it may, furthermore, be used in conjunction with structure-based methods.

Assuntos

Domínio Catalítico/genética , Enzimas/química , Enzimas/genética , Algoritmos , Biologia Computacional , Bases de Dados de Proteínas , Glicosídeo Hidrolases/química , Glicosídeo Hidrolases/genética , Humanos , Teoria da Informação , Modelos Moleculares , Conformação Proteica , Proteômica/estatística & dados numéricos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA