Results 1 - 20 of 233
1.
Sci Data ; 11(1): 358, 2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38594314

ABSTRACT

This paper presents a standardised dataset versioning framework for improved reusability, recognition and data version tracking, facilitating comparisons and informed decision-making for data usability and workflow integration. The framework adopts a software-engineering-like data versioning nomenclature ("major.minor.patch") and incorporates data schema principles to promote reproducibility and collaboration. To quantify changes in statistical properties over time, the concept of data drift metrics (d) is introduced. Three metrics (d_P, d_E,PCA, and d_E,AE) based on unsupervised Machine Learning techniques (Principal Component Analysis and Autoencoders) are evaluated for dataset creation, update, and deletion. The optimal choice is the d_E,PCA metric, combining PCA models with splines. It exhibits efficient computational time, with values below 50 for new dataset batches and values consistent with seasonal or trend variations. Major updates (i.e., values of 100) occur when scaling transformations are applied to over 30% of variables, while information loss is handled efficiently, yielding values close to 0. This metric achieved a favourable trade-off between interpretability, robustness against information loss, and computation time.
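
The d_E,PCA idea — score a new batch by how poorly a PCA model fitted on the reference dataset version reconstructs it — can be sketched in a few lines. The snippet below is an illustrative reconstruction only: the function name, the baseline-relative scaling, and the 0-100 capping are assumptions, and the paper's actual metric additionally combines the PCA models with splines.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def pca_drift(reference: np.ndarray, batch: np.ndarray, n_components: int = 5) -> float:
    """Score drift of `batch` against `reference` via PCA reconstruction error.

    Returns a value on a 0-100 scale: near 0 when the batch matches the
    reference distribution, larger as reconstruction error grows.
    (Illustrative scaling; not the paper's d_E,PCA implementation.)
    """
    scaler = StandardScaler().fit(reference)
    pca = PCA(n_components=n_components).fit(scaler.transform(reference))

    def recon_error(x):
        z = pca.transform(scaler.transform(x))
        x_hat = pca.inverse_transform(z)
        return np.mean((scaler.transform(x) - x_hat) ** 2)

    baseline = recon_error(reference)
    drift = recon_error(batch)
    # Relative increase in error, capped to a 0-100 "major update" scale.
    return float(min(100.0, 100.0 * max(0.0, drift - baseline) / (baseline + 1e-9)))

rng = np.random.default_rng(0)
ref = rng.normal(size=(1000, 10))
new_batch = rng.normal(loc=0.1, size=(200, 10))   # mild seasonal-style shift -> small value
scaled_batch = new_batch * 3.0                    # scaling transform -> major update
print(pca_drift(ref, new_batch), pca_drift(ref, scaled_batch))
```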


Subject(s)
Datasets as Topic, Software, Principal Component Analysis, Reproducibility of Results, Workflow, Datasets as Topic/standards, Machine Learning
2.
Sci Data ; 10(1): 99, 2023 02 23.
Article in English | MEDLINE | ID: mdl-36823157

ABSTRACT

Biomedical datasets are increasing in size, are stored in many repositories, and face challenges in FAIRness (findability, accessibility, interoperability, reusability). As a Consortium of infectious disease researchers from 15 Centers, we aim to adopt open science practices to promote transparency, encourage reproducibility, and accelerate research advances through data reuse. To improve the FAIRness of our datasets and computational tools, we evaluated metadata standards across established biomedical data repositories. The vast majority do not adhere to a single standard, such as Schema.org, which is widely adopted by generalist repositories. Consequently, datasets in these repositories are not findable in aggregation projects such as Google Dataset Search. We addressed this gap by creating a reusable metadata schema based on Schema.org and cataloguing nearly 400 datasets and computational tools we collected. The approach is easily reusable to create schemas that are interoperable with community standards but customized to a particular context. Our approach enabled data discovery, increased the reusability of datasets from a large research consortium, and accelerated research. Lastly, we discuss ongoing challenges with FAIRness beyond discoverability.
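
For readers unfamiliar with the format, a minimal Schema.org `Dataset` record can be emitted as JSON-LD, the vocabulary that generalist aggregators such as Google Dataset Search crawl. All field values below are placeholders; the consortium's actual schema extends Schema.org with additional, domain-specific properties.

```python
import json

# Minimal Schema.org Dataset record (JSON-LD). Emitting this alongside a
# dataset landing page is what makes it discoverable by aggregators.
record = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Example infectious disease surveillance dataset",    # placeholder
    "description": "Weekly case counts collected by the consortium.",
    "identifier": "https://doi.org/10.5555/example",              # placeholder DOI
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": ["infectious disease", "surveillance"],
    "creator": {"@type": "Organization", "name": "Example Consortium"},
    "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "text/csv",
        "contentUrl": "https://example.org/data.csv",             # placeholder URL
    },
}

print(json.dumps(record, indent=2))
```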


Subject(s)
Communicable Diseases, Datasets as Topic, Metadata, Reproducibility of Results, Datasets as Topic/standards, Humans
3.
IEEE Trans Pattern Anal Mach Intell ; 45(3): 2782-2800, 2023 03.
Article in English | MEDLINE | ID: mdl-35560102

ABSTRACT

Micro-expression (ME) is a significant non-verbal communication cue that reveals a person's genuine emotional state. Micro-expression analysis (MEA) has gained attention only in the last decade. However, the small-sample-size problem constrains the use of deep learning for MEA. Moreover, ME samples are distributed across six different databases, leading to database bias, and developing ME databases is complicated. In this article, we introduce a large-scale spontaneous ME database: CAS(ME)³. The contributions of this article are summarized as follows: (1) CAS(ME)³ offers around 80 hours of video with over 8,000,000 frames, including 1,109 manually labeled MEs and 3,490 macro-expressions. Such a large sample size allows effective validation of MEA methods while avoiding database bias. (2) Inspired by psychological experiments, CAS(ME)³ is the first to provide depth information as an additional modality, contributing to multi-modal MEA. (3) For the first time, CAS(ME)³ elicits MEs with high ecological validity using the mock crime paradigm, along with physiological and voice signals, contributing to practical MEA. (4) In addition, CAS(ME)³ provides 1,508 unlabeled videos with more than 4,000,000 frames, i.e., a data platform for unsupervised MEA methods. (5) Finally, we demonstrate the effectiveness of depth information through the proposed depth flow algorithm and RGB-D information.


Subject(s)
Databases, Factual, Emotions, Facial Expression, Female, Humans, Male, Young Adult, Algorithms, Bias, Databases, Factual/standards, Datasets as Topic/standards, Photic Stimulation, Reproducibility of Results, Sample Size, Supervised Machine Learning/standards, Video Recording, Visual Perception
4.
Sci Rep ; 12(1): 14626, 2022 08 26.
Article in English | MEDLINE | ID: mdl-36028547

ABSTRACT

Polyp segmentation has achieved great success over the years in the supervised learning setting. However, obtaining large labeled datasets is commonly challenging in the medical domain. To address this problem, we employ semi-supervised methods that take advantage of unlabeled data to improve the performance of polyp image segmentation. First, we propose an encoder-decoder-based method well suited to polyps of varying shape, size, and scale. Second, we adopt a teacher-student training scheme, where the teacher model is an exponential moving average of the student model. Third, to leverage the unlabeled dataset, we enforce a consistency constraint, requiring the teacher and student models to produce similar outputs for different perturbed versions of a given input. Finally, we propose a method that upgrades the traditional pseudo-labeling approach by continuously updating the pseudo-labels during training. We show the efficacy of the proposed method on different polyp datasets, attaining better results in semi-supervised settings. Extensive experiments demonstrate that our method can propagate the unlabeled dataset's essential information to improve performance.
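
The teacher-student scheme described here is the standard mean-teacher recipe: the teacher is an exponential moving average (EMA) of the student and supervises it through a consistency loss on perturbed inputs. A minimal PyTorch sketch follows; the tiny network, decay rate, and Gaussian-noise perturbation are illustrative choices, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Conv2d(3, 1, kernel_size=3, padding=1)   # stand-in for the segmentation net
teacher = nn.Conv2d(3, 1, kernel_size=3, padding=1)
teacher.load_state_dict(student.state_dict())
for p in teacher.parameters():                         # teacher is never trained directly
    p.requires_grad_(False)

@torch.no_grad()
def ema_update(teacher, student, decay=0.99):
    """Teacher weights become an exponential moving average of student weights."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_(s, alpha=1 - decay)

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
unlabeled = torch.randn(4, 3, 64, 64)                  # dummy unlabeled batch

for step in range(10):
    noisy_a = unlabeled + 0.1 * torch.randn_like(unlabeled)   # two perturbed views
    noisy_b = unlabeled + 0.1 * torch.randn_like(unlabeled)
    with torch.no_grad():
        target = torch.sigmoid(teacher(noisy_a))
    pred = torch.sigmoid(student(noisy_b))
    loss = F.mse_loss(pred, target)                    # consistency loss on unlabeled data
    opt.zero_grad()
    loss.backward()
    opt.step()
    ema_update(teacher, student)
```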


Subject(s)
Polyps/pathology, Supervised Machine Learning, Datasets as Topic/standards, Datasets as Topic/trends, Humans, Image Processing, Computer-Assisted, Polyps/diagnostic imaging
5.
Int J Neural Syst ; 32(9): 2250043, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35912583

ABSTRACT

A practical problem in supervised deep learning for medical image segmentation is the lack of labeled data, which is expensive and time-consuming to acquire. In contrast, there is a considerable amount of unlabeled data available in the clinic. To make better use of the unlabeled data and improve generalization on limited labeled data, in this paper, a novel semi-supervised segmentation method via multi-task curriculum learning is presented. Here, curriculum learning means that when training the network, simpler knowledge is preferentially learned to assist the learning of more difficult knowledge. Concretely, our framework consists of a main segmentation task and two auxiliary tasks, i.e., a feature regression task and a target detection task. The two auxiliary tasks predict some relatively simpler image-level attributes and bounding boxes as the pseudo labels for the main segmentation task, enforcing the pixel-level segmentation result to match the distribution of these pseudo labels. In addition, to solve the problem of class imbalance in the images, a bounding-box-based attention (BBA) module is embedded, enabling the segmentation network to attend more to the target region than to the background. Furthermore, to alleviate the adverse effects caused by the possible deviation of pseudo labels, error tolerance mechanisms are also adopted in the auxiliary tasks, including an inequality constraint and bounding-box amplification. Our method is validated on the ACDC2017 and PROMISE12 datasets. Experimental results demonstrate that compared with the fully supervised method and state-of-the-art semi-supervised methods, our method yields much better segmentation performance on a small labeled dataset. Code is available at https://github.com/DeepMedLab/MTCL.
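
Of the components listed, the bounding-box-based attention is the simplest to illustrate: feature activations outside a box are down-weighted so the network attends to the target region. The sketch below shows that generic masking idea under assumed shapes; it is not the authors' BBA module, whose real implementation is in the linked repository.

```python
import torch

def box_attention(features: torch.Tensor, box, background_weight: float = 0.1):
    """Down-weight feature activations outside a bounding box.

    features: (B, C, H, W); box: (x1, y1, x2, y2) in pixel coordinates.
    Illustrative soft attention only, not the paper's BBA module.
    """
    _, _, h, w = features.shape
    x1, y1, x2, y2 = box
    mask = torch.full((h, w), background_weight)
    mask[y1:y2, x1:x2] = 1.0                # full weight inside the box
    return features * mask                   # broadcasts over batch and channels

feats = torch.randn(2, 16, 32, 32)
attended = box_attention(feats, box=(8, 8, 24, 24))
print(attended.shape)  # torch.Size([2, 16, 32, 32])
```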


Subject(s)
Curriculum, Supervised Machine Learning, Data Curation/methods, Data Curation/standards, Datasets as Topic/standards, Datasets as Topic/supply & distribution, Image Processing, Computer-Assisted/methods, Supervised Machine Learning/classification, Supervised Machine Learning/statistics & numerical data, Supervised Machine Learning/trends
6.
Ann Surg ; 275(3): e549-e561, 2022 03 01.
Article in English | MEDLINE | ID: mdl-34238814

ABSTRACT

OBJECTIVE: The aim of this study is to describe a new international dataset for pathology reporting of colorectal cancer surgical specimens, produced under the auspices of the International Collaboration on Cancer Reporting (ICCR). BACKGROUND: Quality of pathology reporting and mutual understanding between the colorectal surgeon, pathologist, and oncologist are vital to patient management. Some pathology parameters are prone to variable interpretation, resulting in differing positions adopted by existing national datasets. METHODS: The ICCR, a global alliance of major pathology institutions with links to international cancer organizations, has developed and ratified a rigorous and efficient process for the development of evidence-based, structured datasets for pathology reporting of common cancers. Here we describe the production of a dataset for colorectal cancer resection specimens by a multidisciplinary panel of internationally recognized experts. RESULTS: The agreed dataset comprises eighteen core (essential) and seven non-core (recommended) elements identified from a review of current evidence. Areas of contention are addressed, some highly relevant to surgical practice, with the aim of standardizing multidisciplinary discussion. The summation of all core elements is considered to be the minimum reporting standard for individual cases. Commentary is provided, explaining each element's clinical relevance, definitions to be applied where appropriate for the agreed list of value options, and the rationale for considering the element as core or non-core. CONCLUSIONS: This first internationally agreed dataset for colorectal cancer pathology reporting promotes standardization of pathology reporting and enhanced clinicopathological communication. Widespread adoption will facilitate international comparisons and multinational clinical trials and help to improve the management of colorectal cancer globally.


Subject(s)
Colorectal Neoplasms/pathology, Datasets as Topic/standards, Research Design, Humans
7.
Nat Biotechnol ; 40(1): 121-130, 2022 01.
Article in English | MEDLINE | ID: mdl-34462589

ABSTRACT

Large single-cell atlases are now routinely generated to serve as references for analysis of smaller-scale studies. Yet learning from reference data is complicated by batch effects between datasets, limited availability of computational resources and sharing restrictions on raw data. Here we introduce a deep learning strategy for mapping query datasets on top of a reference, called single-cell architectural surgery (scArches). scArches uses transfer learning and parameter optimization to enable efficient, decentralized, iterative reference building and contextualization of new datasets with existing references without sharing raw data. Using examples from mouse brain, pancreas, immune and whole-organism atlases, we show that scArches preserves biological state information while removing batch effects, despite using four orders of magnitude fewer parameters than de novo integration. scArches generalizes to multimodal reference mapping, allowing imputation of missing modalities. Finally, scArches retains coronavirus disease 2019 (COVID-19) disease variation when mapping to a healthy reference, enabling the discovery of disease-specific cell states. scArches will facilitate collaborative projects by enabling iterative construction, updating, sharing and efficient use of reference atlases.
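
The "architectural surgery" pattern — freeze the trained reference model and optimize only a small set of query-specific parameters — can be illustrated generically. The sketch below is a caricature of that transfer-learning pattern, not the scArches API: the adapter design, dimensions, and classification objective are all invented for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Frozen "reference" model (stand-in for a trained atlas model).
reference = nn.Sequential(nn.Linear(2000, 128), nn.ReLU(), nn.Linear(128, 10))
for p in reference.parameters():
    p.requires_grad_(False)

class Adapter(nn.Module):
    """Tiny per-feature affine transform trained only on the query data."""
    def __init__(self, dim):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(dim))
        self.shift = nn.Parameter(torch.zeros(dim))
    def forward(self, x):
        return x * self.scale + self.shift

adapter = Adapter(2000)
model = nn.Sequential(adapter, reference)
opt = torch.optim.Adam(adapter.parameters(), lr=1e-3)

x = torch.randn(32, 2000)          # dummy query expression matrix (cells x genes)
y = torch.randint(0, 10, (32,))    # dummy cell-type targets
for step in range(5):
    loss = F.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()

trainable = sum(p.numel() for p in adapter.parameters())
total = sum(p.numel() for p in model.parameters())
print(f"training {trainable} of {total} parameters")   # only the adapter is updated
```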


Subject(s)
Datasets as Topic/standards, Deep Learning, Organ Specificity, Single-Cell Analysis/standards, Animals, COVID-19/pathology, Humans, Mice, Reference Standards, SARS-CoV-2/pathogenicity
8.
Alzheimers Dement ; 18(1): 29-42, 2022 01.
Article in English | MEDLINE | ID: mdl-33984176

ABSTRACT

INTRODUCTION: Harmonized neuropsychological assessment for neurocognitive disorders, an international priority for valid and reliable diagnostic procedures, has been achieved only in specific countries or research contexts. METHODS: To harmonize the assessment of mild cognitive impairment in Europe, a workshop (Geneva, May 2018) convened stakeholders, methodologists, academic and non-academic clinicians, and experts from European, US, and Australian harmonization initiatives. RESULTS: Through formal presentations and thematic working groups, we defined a standard battery consistent with the U.S. Uniform Data Set, version 3, and a homogeneous methodology to obtain consistent normative data across tests and languages. Adaptations consist of including two tests specific to typical Alzheimer's disease and behavioral variant frontotemporal dementia. The methodology for harmonized normative data includes a consensus definition of cognitively normal controls, classification of confounding factors (age, sex, and education), and calculation of minimum sample sizes. DISCUSSION: This expert consensus allows harmonizing the diagnosis of neurocognitive disorders across European countries and possibly beyond.


Subject(s)
Cognitive Dysfunction, Consensus Development Conferences as Topic, Datasets as Topic/standards, Neuropsychological Tests/standards, Age Factors, Cognition, Cognitive Dysfunction/classification, Cognitive Dysfunction/diagnosis, Educational Status, Europe, Expert Testimony, Humans, Language, Sex Factors
9.
AMIA Annu Symp Proc ; 2022: 662-671, 2022.
Article in English | MEDLINE | ID: mdl-37128396

ABSTRACT

Previous work on clinical relation extraction from free-text sentences leveraged information about semantic types from clinical knowledge bases as part of entity representations. In this paper, we exploit additional evidence by also making use of domain-specific semantic type dependencies. We encode the relation between a span of tokens matching a Unified Medical Language System (UMLS) concept and other tokens in the sentence. We implement our method and compare it against different named entity recognition (NER) architectures (i.e., BiLSTM-CRF and BiLSTM-GCN-CRF) using different pre-trained clinical embeddings (i.e., BERT, BioBERT, UMLSBert). Our experimental results on clinical datasets show that in some cases NER effectiveness can be significantly improved by making use of domain-specific semantic type dependencies. Our work is also the first study to generate a matrix encoding that makes use of more than three dependencies in one pass for the NER task.


Subject(s)
Natural Language Processing, Semantics, Unified Medical Language System, Humans, Knowledge Bases, Datasets as Topic/standards, Sample Size, Reproducibility of Results
10.
Annu Int Conf IEEE Eng Med Biol Soc ; 2021: 2413-2418, 2021 11.
Article in English | MEDLINE | ID: mdl-34891768

ABSTRACT

As neuroimaging datasets continue to grow in size, the complexity of data analyses can require a detailed understanding and implementation of systems computer science for storage, access, processing, and sharing. Currently, several general data standards (e.g., Zarr, HDF5, precomputed) and purpose-built ecosystems (e.g., BossDB, CloudVolume, DVID, and Knossos) exist. Each of these systems has advantages and limitations and is most appropriate for different use cases. Using datasets that do not fit into RAM in this heterogeneous environment is challenging, and significant barriers exist to leveraging underlying research investments. In this manuscript, we outline our perspective on how to approach this challenge through the use of community-provided, standardized interfaces that unify various computational backends and abstract computer science challenges away from the scientist. We introduce desirable design patterns and share our reference implementation, called intern.
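
As background, chunked array standards such as Zarr are what make larger-than-RAM access practical: a slice touches only the chunks it overlaps. A small sketch with the zarr-python package follows; the array shape and chunking are chosen arbitrarily for illustration.

```python
import numpy as np
import zarr

# Create a chunked on-disk volume; only written/read chunks ever touch RAM.
z = zarr.open("volume.zarr", mode="w", shape=(2048, 2048, 2048),
              chunks=(128, 128, 128), dtype="uint8")
z[0:128, 0:128, 0:128] = np.random.randint(0, 255, (128, 128, 128), dtype="uint8")

# Reading a sub-volume loads just the overlapping chunks from disk.
block = z[0:64, 0:64, 0:64]
print(block.shape, block.mean())
```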


Subject(s)
Datasets as Topic/standards, Neurosciences
11.
Nature ; 600(7890): 695-700, 2021 12.
Article in English | MEDLINE | ID: mdl-34880504

ABSTRACT

Surveys are a crucial tool for understanding public opinion and behaviour, and their accuracy depends on maintaining statistical representativeness of their target populations by minimizing biases from all sources. Increasing data size shrinks confidence intervals but magnifies the effect of survey bias: an instance of the Big Data Paradox [1]. Here we demonstrate this paradox in estimates of first-dose COVID-19 vaccine uptake in US adults from 9 January to 19 May 2021 from two large surveys: Delphi-Facebook [2,3] (about 250,000 responses per week) and Census Household Pulse [4] (about 75,000 every two weeks). In May 2021, Delphi-Facebook overestimated uptake by 17 percentage points (14-20 percentage points with 5% benchmark imprecision) and Census Household Pulse by 14 percentage points (11-17 percentage points with 5% benchmark imprecision), compared to a retroactively updated benchmark the Centers for Disease Control and Prevention published on 26 May 2021. Moreover, their large sample sizes led to minuscule margins of error on the incorrect estimates. By contrast, an Axios-Ipsos online panel [5] with about 1,000 responses per week, following survey research best practices [6], provided reliable estimates and uncertainty quantification. We decompose observed error using a recent analytic framework [1] to explain the inaccuracy in the three surveys. We then analyse the implications for vaccine hesitancy and willingness. We show how a survey of 250,000 respondents can produce an estimate of the population mean that is no more accurate than an estimate from a simple random sample of size 10. Our central message is that data quality matters more than data quantity, and that compensating for the former with the latter is a mathematically provable losing proposition.
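
The headline claim — hundreds of thousands of biased responses can be no more accurate than a simple random sample of 10 — is easy to reproduce in a toy simulation. The population size, uptake rate, and response-bias mechanism below are invented for illustration and are not the paper's data.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 1_000_000
vaccinated = rng.random(N) < 0.60            # true uptake: 60%

# Biased "big" survey: vaccinated people respond more often.
response_prob = np.where(vaccinated, 0.35, 0.20)
respondents = rng.random(N) < response_prob
big_est = vaccinated[respondents].mean()      # ~290k responses, biased upward

# Tiny (near-)simple random sample: unbiased, just noisy.
# (Sampling with replacement; indistinguishable from SRS at this scale.)
def srs_abs_error(n, reps=1000):
    errs = [abs(vaccinated[rng.integers(0, N, n)].mean() - 0.60) for _ in range(reps)]
    return float(np.mean(errs))

print(f"big survey: n={respondents.sum()}, abs error={abs(big_est - 0.60):.3f}")
print(f"SRS n=10:   mean abs error={srs_abs_error(10):.3f}")
```

With these invented bias parameters, both errors land near 12 percentage points despite the five-orders-of-magnitude difference in sample size, which is the paradox in miniature.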


Subject(s)
COVID-19 Vaccines/administration & dosage, Health Care Surveys, Vaccination/statistics & numerical data, Benchmarking, Bias, Big Data, COVID-19/epidemiology, COVID-19/prevention & control, Centers for Disease Control and Prevention, U.S., Datasets as Topic/standards, Female, Health Care Surveys/standards, Humans, Male, Research Design, Sample Size, Social Media, United States/epidemiology, Vaccination Hesitancy/statistics & numerical data
12.
Genes (Basel) ; 12(10)2021 09 28.
Article in English | MEDLINE | ID: mdl-34680918

ABSTRACT

Gene set analysis has been widely used to gain insight from high-throughput expression studies. Although various tools and methods have been developed for gene set analysis, there is no consensus among researchers regarding best practice(s). Most often, evaluation studies have reported contradictory recommendations as to which methods are superior. An unbiased quantitative framework for evaluating gene set analysis methods would therefore be valuable. Such a framework requires gene expression datasets where the enrichment status of gene sets is known a priori. In the absence of such gold-standard datasets, artificial datasets are commonly used to evaluate gene set analysis methods; however, they often rely on oversimplifying assumptions that make them biased in favor of or against a given method. In this paper, we propose a quantitative framework for evaluating gene set analysis methods by synthesizing expression datasets from real data, without relying on oversimplifying or unrealistic assumptions, while preserving complex gene-gene correlations and retaining the distribution of expression values. The utility of the quantitative approach is shown by evaluating ten widely used gene set analysis methods. An implementation of the proposed method, called Silver, is publicly available. We suggest using Silver to evaluate existing and new gene set analysis methods. Evaluation using Silver provides a better understanding of current methods and can aid in the development of gene set analysis methods that achieve higher specificity without sacrificing sensitivity.


Subject(s)
Databases, Genetic/standards, Genomics/methods, Software, Datasets as Topic/standards
14.
PLoS One ; 16(8): e0255754, 2021.
Article in English | MEDLINE | ID: mdl-34352030

ABSTRACT

Given multiple source datasets with labels, how can we train a target model with no labeled data? Multi-source domain adaptation (MSDA) aims to train a model using multiple source datasets that differ from a target dataset, in the absence of target data labels. MSDA is a crucial problem applicable to many practical cases where labels for the target data are unavailable due to privacy issues. Existing MSDA frameworks are limited because they align data without considering the class labels of the features of each domain, do not fully utilize the unlabeled target data, and rely on limited feature extraction with a single extractor. In this paper, we propose Multi-EPL, a novel method for MSDA. Multi-EPL exploits label-wise moment matching to align the conditional distributions of the features for each label, uses pseudolabels for the unavailable target labels, and introduces an ensemble of multiple feature extractors for accurate domain adaptation. Extensive experiments show that Multi-EPL provides state-of-the-art performance for MSDA tasks in both image and text domains, improving accuracy by up to 13.20%.
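
Label-wise moment matching, the core of Multi-EPL, aligns per-class feature statistics between source and target domains. The PyTorch sketch below matches first and second moments per class under assumed shapes; in the full method the target class assignments come from pseudolabels and multiple feature extractors are ensembled.

```python
import torch

def label_wise_moment_loss(src_feat, src_y, tgt_feat, tgt_y, num_classes):
    """Match per-class feature mean and variance across domains.

    Illustrative first/second-moment matching only; Multi-EPL also
    ensembles multiple feature extractors and refines pseudolabels.
    """
    loss = src_feat.new_zeros(())
    for c in range(num_classes):
        s, t = src_feat[src_y == c], tgt_feat[tgt_y == c]
        if len(s) < 2 or len(t) < 2:       # class (nearly) missing from a batch
            continue
        loss = loss + (s.mean(0) - t.mean(0)).pow(2).sum()   # 1st moment
        loss = loss + (s.var(0) - t.var(0)).pow(2).sum()     # 2nd moment
    return loss

src_feat = torch.randn(64, 128)                 # source-domain features
tgt_feat = torch.randn(64, 128) + 0.5           # shifted target-domain features
src_y = torch.randint(0, 4, (64,))              # true source labels
tgt_y = torch.randint(0, 4, (64,))              # pseudolabels in practice
print(label_wise_moment_loss(src_feat, src_y, tgt_feat, tgt_y, num_classes=4))
```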


Subject(s)
Database Management Systems/standards, Deep Learning, Datasets as Topic/standards
16.
Nature ; 596(7873): 590-596, 2021 08.
Article in English | MEDLINE | ID: mdl-34293799

ABSTRACT

Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure [1]. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold2, at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.


Subject(s)
Computational Biology/standards, Deep Learning/standards, Models, Molecular, Protein Conformation, Proteome/chemistry, Datasets as Topic/standards, Diacylglycerol O-Acyltransferase/chemistry, Glucose-6-Phosphatase/chemistry, Humans, Membrane Proteins/chemistry, Protein Folding, Reproducibility of Results
17.
J Clin Epidemiol ; 136: 136-145, 2021 08.
Article in English | MEDLINE | ID: mdl-33932483

ABSTRACT

BACKGROUND: Probabilistic linkage can link patients from different clinical databases without the need for personal information. If accurate linkage can be achieved, it would accelerate the use of linked datasets to address important clinical and public health questions. OBJECTIVE: We developed a step-by-step process for probabilistic linkage of national clinical and administrative datasets without personal information, and validated it against deterministic linkage using patient identifiers. STUDY DESIGN AND SETTING: We used electronic health records from the National Bowel Cancer Audit and Hospital Episode Statistics databases for 10,566 bowel cancer patients undergoing emergency surgery in the English National Health Service. RESULTS: Probabilistic linkage linked 81.4% of National Bowel Cancer Audit records to Hospital Episode Statistics, vs. 82.8% using deterministic linkage. No systematic differences were seen between patients that were and were not linked, and regression models for mortality and length of hospital stay according to patient and tumour characteristics were not sensitive to the linkage approach. CONCLUSION: Probabilistic linkage was successful in linking national clinical and administrative datasets for patients undergoing a major surgical procedure. It allows analysts outside highly secure data environments to undertake linkage while minimizing costs and delays, protecting data security, and maintaining linkage quality.
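
Probabilistic linkage of this kind typically scores candidate record pairs with Fellegi-Sunter match weights: log-likelihood ratios of field agreement under the match versus non-match hypotheses, summed over fields and compared with a threshold. The sketch below illustrates that scoring with invented m- and u-probabilities and field names, not the study's actual linkage variables.

```python
import math

# m = P(field agrees | records truly match); u = P(field agrees | non-match).
# All values here are invented for illustration.
FIELDS = {
    "date_of_birth": {"m": 0.95, "u": 0.01},
    "sex":           {"m": 0.98, "u": 0.50},
    "postcode":      {"m": 0.90, "u": 0.001},
    "hospital_code": {"m": 0.92, "u": 0.05},
}

def match_weight(agreements: dict) -> float:
    """Sum of Fellegi-Sunter log2 likelihood-ratio weights over fields."""
    w = 0.0
    for field, agrees in agreements.items():
        m, u = FIELDS[field]["m"], FIELDS[field]["u"]
        w += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return w

# A pair agreeing on everything except sex still scores strongly positive.
pair = {"date_of_birth": True, "sex": False, "postcode": True, "hospital_code": True}
print(f"weight = {match_weight(pair):.2f}")   # accept if above a chosen threshold
```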


Subject(s)
Data Management/methods, Data Management/statistics & numerical data, Datasets as Topic/standards, Electronic Health Records/statistics & numerical data, Electronic Health Records/standards, Intestinal Neoplasms/epidemiology, Medical Record Linkage/methods, Datasets as Topic/statistics & numerical data, Humans, Intestinal Neoplasms/mortality, Intestinal Neoplasms/surgery, Models, Statistical, Reproducibility of Results, State Medicine, United Kingdom
18.
Hum Pathol ; 114: 54-65, 2021 08.
Article in English | MEDLINE | ID: mdl-33992659

ABSTRACT

BACKGROUND AND OBJECTIVES: A standardized data set for esophageal carcinoma pathology reporting was developed based on the approach of the International Collaboration on Cancer Reporting (ICCR), with the aim of improving cancer patient outcomes and international benchmarking in cancer management. MATERIALS AND METHODS: The ICCR convened a multidisciplinary international expert panel to identify the best evidence-based clinical and pathological parameters for inclusion in the data set for esophageal carcinoma. The data set incorporated the current editions of the World Health Organization Classification of Tumours of the Digestive System and the Tumour-Node-Metastasis staging system. RESULTS: The scope of the data set encompassed resection specimens of the esophagus and esophagogastric junction with a tumor epicenter ≤20 mm into the proximal stomach. Core reporting elements included information on neoadjuvant therapy, operative procedure used, tumor focality, tumor site, tumor dimensions, distance of tumor to resection margins, histological tumor type, presence and type of dysplasia, tumor grade, extent of invasion in the esophagus, lymphovascular invasion, response to neoadjuvant therapy, status of resection margins, ancillary studies, lymph node status, distant metastases, and pathological staging. Additional non-core elements considered useful to report included clinical information, specimen dimensions, macroscopic appearance of the tumor, and coexistent pathology. CONCLUSIONS: This is the first international peer-reviewed structured reporting data set for surgically resected specimens of the esophagus. The ICCR carcinoma of the esophagus data set is recommended for routine use globally and is a valuable tool to support standardized reporting, benefiting patient care by providing diagnostic and prognostic best-practice parameters.


Subject(s)
Carcinoma/surgery, Datasets as Topic/standards, Esophageal Neoplasms/surgery, Esophagectomy, Esophagogastric Junction/surgery, Research Design/standards, Stomach Neoplasms/surgery, Benchmarking/standards, Carcinoma/secondary, Chemoradiotherapy, Adjuvant, Cooperative Behavior, Data Accuracy, Esophageal Neoplasms/pathology, Esophagogastric Junction/pathology, Evidence-Based Medicine/standards, Humans, International Cooperation, Neoadjuvant Therapy, Neoplasm Grading, Neoplasm Staging, Stomach Neoplasms/pathology, Treatment Outcome
19.
Medicine (Baltimore) ; 100(17): e25363, 2021 Apr 30.
Article in English | MEDLINE | ID: mdl-33907093

ABSTRACT

ABSTRACT: Visual analogue scales are widely used to measure subjective responses. Norris' 16 visual analogue scales (N_VAS) measure subjective feelings of alertness and mood. To date, different researchers have clustered the items of N_VAS in different ways, and Bond and Lader's clustering has been the most frequently used in clinical research. However, there are concerns about the stability of this clustering across subject samples and drug classes. The aim of this study was to test whether Bond and Lader's clustering is stable with respect to subject samples and drug effects; alternative clusterings of N_VAS were also tested. Data from studies with three types of drugs, a cannabinoid receptor agonist (delta-9-tetrahydrocannabinol [THC]), a muscarinic antagonist (scopolamine), and benzodiazepines (midazolam and lorazepam), collected between 2005 and 2012, were used for this analysis. Exploratory factor analysis (EFA) was used to test the clustering algorithm of Bond and Lader. Consensus clustering was performed to test the stability of clustering results over samples and over different drug types. Stability analysis was performed first under a three-cluster assumption and then under alternative assumptions. Heat maps of the consensus matrix (CM) and density plots showed instability of the three-cluster hypothesis and suggested instability over the three drug classes. Two- and four-cluster hypotheses were also tested. Heat maps of the CM and density plots suggested that the two-cluster assumption was superior. In summary, the two-cluster assumption leads to a provably stable outcome over samples and the three drug types based on the data used.
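
Consensus clustering, as used in this study, builds a matrix recording how often each pair of items is co-assigned across resampled clustering runs; entries concentrated near 0 or 1 indicate a stable cluster number. The sketch below assumes k-means as the base clusterer and synthetic data, purely to show the mechanics, not the study's factor-analysis-based pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans

def consensus_matrix(X, k, runs=50, frac=0.8, seed=0):
    """Fraction of subsampled runs in which each pair shares a cluster."""
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for r in range(runs):
        idx = rng.choice(n, int(frac * n), replace=False)
        labels = KMeans(n_clusters=k, n_init=10, random_state=r).fit_predict(X[idx])
        same = labels[:, None] == labels[None, :]
        together[np.ix_(idx, idx)] += same
        sampled[np.ix_(idx, idx)] += 1
    return together / np.maximum(sampled, 1)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, size=(30, 5)) for c in (0, 4)])  # two clear blobs
cm = consensus_matrix(X, k=2)
# A stable solution yields entries near 0 or 1; middling values signal instability.
print(cm.mean(), ((cm < 0.1) | (cm > 0.9)).mean())
```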


Subject(s)
Cluster Analysis, Data Interpretation, Statistical, Datasets as Topic/standards, Pain Measurement/methods, Visual Analog Scale, Adult, Algorithms, Benzodiazepines/therapeutic use, Cannabinoid Receptor Agonists/therapeutic use, Consensus, Cross-Over Studies, Double-Blind Method, Factor Analysis, Statistical, Humans, Male, Muscarinic Antagonists/therapeutic use, Pain Measurement/standards, Randomized Controlled Trials as Topic, Reproducibility of Results
20.
Neural Netw ; 139: 358-370, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33901772

ABSTRACT

As a major method for relation extraction, distantly supervised relation extraction (DSRE) suffers from the noisy label problem and the class imbalance problem (these two problems are also common in many other NLP tasks, e.g., text classification). However, there appears to be no existing research in DSRE or other NLP tasks that can simultaneously solve both problems, which is a significant gap in related research. In this paper, we propose a loss function that is robust to noisy labels and efficient for imbalanced-class datasets. More specifically, we first quantify the negative impacts of the noisy label and class imbalance problems. We then construct a loss function that minimizes these negative impacts through a linear programming method. To the best of our knowledge, this is the first attempt to address the noisy label problem and class imbalance problem simultaneously. We evaluated the constructed loss function on a distantly labeled dataset, our artificially noised dataset, the human-annotated DocRED dataset, and an artificially noised version of CoNLL 2003. Experimental results indicate that a DNN model adopting the constructed loss function can outperform models that adopt state-of-the-art noisy-label-robust or negative-sample-robust loss functions.
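
The paper derives its loss via linear programming over the quantified impacts; as a generic stand-in, the sketch below combines two well-known ingredients the problem involves — inverse-frequency class weights for imbalance and a bounded MAE-style term that damps gradients from mislabeled examples. This is an illustration of the problem setting, not the authors' constructed loss.

```python
import torch
import torch.nn.functional as F

def robust_balanced_loss(logits, targets, class_counts, mae_weight=0.5):
    """Illustrative stand-in: class weights for imbalance plus a bounded
    MAE term for label-noise robustness. The paper instead derives its
    loss by linear programming over quantified noise/imbalance impacts."""
    weights = class_counts.sum() / (len(class_counts) * class_counts.float())
    ce = F.cross_entropy(logits, targets, weight=weights)
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(targets, num_classes=logits.size(1)).float()
    mae = (probs - one_hot).abs().sum(dim=1).mean()   # bounded in [0, 2]
    return (1 - mae_weight) * ce + mae_weight * mae

logits = torch.randn(8, 3, requires_grad=True)
targets = torch.randint(0, 3, (8,))
counts = torch.tensor([500, 100, 20])                 # imbalanced class counts
loss = robust_balanced_loss(logits, targets, counts)
loss.backward()
print(loss.item())
```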


Subject(s)
Supervised Machine Learning, Datasets as Topic/standards, Signal-To-Noise Ratio