Results 1 - 20 of 901
1.
J Chem Inf Model ; 64(13): 5041-5051, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38907989

ABSTRACT

Proteins interact through their interfaces, and dysfunction of protein-protein interactions (PPIs) has been associated with various diseases. Therefore, investigating the properties of drug-modulated PPIs and interface-targeting drugs is critical. Here, we present a large curated data set of drug-like molecules in protein interfaces. We further introduce DiPPI (Drugs in Protein-Protein Interfaces), a two-module web site that facilitates the search for such molecules and their properties and supports the use of our data set in drug repurposing studies. In the interface module of the web site, we present several properties of interfaces, such as amino acid properties, hotspots, evolutionary conservation of drug-binding amino acids, and post-translational modifications of these residues. On the drug-like molecule side, we list drug-like small molecules and FDA-approved drugs from various databases and highlight those that bind to interfaces. We further clustered the drugs based on their molecular fingerprints to confine the search for an alternative drug to a smaller space. Drug properties, including Lipinski's rules and various molecular descriptors, are also calculated and made available on the web site to guide the selection of drug molecules. Our data set contains 534,203 interfaces for 98,632 protein structures, of which 55,135 are found to bind a drug-like molecule. A total of 2214 drug-like molecules are deposited on our web site, 335 of which are FDA-approved. DiPPI provides users with an easy-to-follow scheme for drug repurposing studies through its well-curated and clustered interface and drug data and is freely available at http://interactome.ku.edu.tr:8501.
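
Illustrative sketch (not the DiPPI code): the fingerprint-based clustering step described above can be approximated with RDKit. The fingerprint type (Morgan, radius 2, 2048 bits), the Tanimoto-distance cutoff of 0.3, and the example SMILES strings are all assumptions made for illustration.

# Hedged sketch of fingerprint-based drug clustering (not the DiPPI pipeline).
# Assumptions: Morgan fingerprints (radius 2, 2048 bits), Butina clustering
# with a 0.3 Tanimoto-distance cutoff, placeholder SMILES strings.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from rdkit.ML.Cluster import Butina

smiles = ["CC(=O)Oc1ccccc1C(=O)O", "CN1C=NC2=C1C(=O)N(C)C(=O)N2C", "CCO"]  # placeholders
mols = [Chem.MolFromSmiles(s) for s in smiles]
fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=2048) for m in mols]

# Butina clustering expects the condensed lower-triangle distance matrix.
dists = []
for i in range(1, len(fps)):
    sims = DataStructs.BulkTanimotoSimilarity(fps[i], fps[:i])
    dists.extend(1.0 - s for s in sims)

clusters = Butina.ClusterData(dists, len(fps), 0.3, isDistData=True)
print(clusters)  # tuples of molecule indices, one tuple per cluster

Butina clustering is a common choice for grouping compounds by fingerprint similarity; the search for an alternative drug can then be confined to the cluster containing the original compound.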


Subject(s)
Proteins; Proteins/chemistry; Proteins/metabolism; Pharmaceutical Preparations/chemistry; Pharmaceutical Preparations/metabolism; Protein Binding; Drug Repositioning; Databases, Protein; Humans; Data Curation; Protein Interaction Mapping/methods
2.
Database (Oxford) ; 2024, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38865432

ABSTRACT

Alzheimer's disease (AD) is a widespread neurodegenerative disease characterized by progressive dementia. Currently, there are only seven Food and Drug Administration-approved drugs for the treatment of AD, which merely offer temporary relief from symptom deterioration without reversing the underlying disease process. The identification of inhibitors capable of interacting with proteins associated with AD plays a pivotal role in the development of effective therapeutic interventions. However, a vast number of such inhibitors are dispersed throughout numerous published articles, making it difficult for researchers to explore potential drug candidates for AD. In light of this, we have manually compiled inhibitors targeting proteins associated with AD and constructed a comprehensive database known as IPAD-DB (Inhibitors of Proteins associated with Alzheimer's Disease Database). The curated inhibitors within this database encompass a diverse range of compounds, including natural compounds, synthetic compounds, drugs, natural extracts and nano-inhibitors. To date, the database has compiled >4800 entries, each representing a relationship between an inhibitor and its target protein. IPAD-DB offers a user-friendly interface that facilitates browsing, searching and downloading of its records. We firmly believe that IPAD-DB represents a valuable resource for screening potential AD drug candidates and investigating the underlying mechanisms of this debilitating disease. Access to IPAD-DB is freely available at http://www.lamee.cn/ipad-db/ and is compatible with all major web browsers. Database URL: http://www.lamee.cn/ipad-db/.


Subject(s)
Alzheimer Disease; Alzheimer Disease/drug therapy; Alzheimer Disease/metabolism; Humans; Databases, Protein; Data Curation/methods; User-Computer Interface
3.
PLoS One ; 19(6): e0301171, 2024.
Article in English | MEDLINE | ID: mdl-38875230

ABSTRACT

Data curators play an important role in assessing data quality and take actions that may ultimately lead to better, more valuable data products. This study explores the curation practices of data curators working within US-based data repositories. We performed a survey in January 2021 to benchmark the levels of curation performed by repositories and assess the perceived value and impact of curation on the data sharing process. Our analysis included 95 responses from 59 unique data repositories. Respondents were primarily professionals working within repositories, and the survey examined curation performed within a repository setting. A majority (72.6%) of respondents reported that "data-level" curation was performed by their repository, and around half reported that their repository took steps to ensure the interoperability and reproducibility of its datasets. The curation actions most frequently reported included checking for duplicate files, reviewing documentation, reviewing metadata, minting persistent identifiers, and checking for corrupt/broken files. The curation action rated most "value-add" across generalist, institutional, and disciplinary repository respondents was reviewing and enhancing documentation. Respondents reported a high perceived impact of curation by their repositories on specific data sharing outcomes, including the usability, findability, understandability, and accessibility of deposited datasets; respondents associated with disciplinary repositories tended to perceive higher impact on most outcomes. Most survey participants strongly agreed that data curation by the repository adds value to the data sharing process and that it outweighs the effort and cost. We found some differences between institutional and disciplinary repositories, both in the reported frequency of specific curation actions and in the perceived impact of data curation. Interestingly, we also found variation in the perceptions of those working within the same repository regarding the level and frequency of curation actions performed, which exemplifies the complexity of repository curation work. Our results suggest data curation may be better understood in terms of specific curation actions and outcomes than in terms of broadly defined curation levels, and that more research is needed to understand the resource implications of performing these activities. We share these results to provide a more nuanced view of curation and of how curation impacts the broader data lifecycle and data sharing behaviors.


Subject(s)
Data Curation; Humans; Surveys and Questionnaires; United States; Information Dissemination; Data Accuracy; Databases, Factual; Reproducibility of Results
4.
J Am Med Inform Assoc ; 31(7): 1463-1470, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38722233

ABSTRACT

OBJECTIVE: ModelDB (https://modeldb.science) is a discovery platform for computational neuroscience, containing over 1850 published model codes with standardized metadata. These codes were mainly supplied through unsolicited submissions by model authors, an approach that is inherently limited: for example, we estimate we have captured only around one-third of NEURON models, the most common model type in ModelDB. To more completely characterize the state of computational neuroscience modeling work, we aim to identify works containing results derived from computational neuroscience approaches, together with their standardized associated metadata (eg, cell types, research topics). MATERIALS AND METHODS: Known computational neuroscience work from ModelDB and neuroscience work retrieved from PubMed were included in our study. After pre-screening with SPECTER2 (a free document embedding method), GPT-3.5 and GPT-4 were used to identify likely computational neuroscience work and relevant metadata. RESULTS: SPECTER2, GPT-4, and GPT-3.5 demonstrated varying but high accuracy in identifying computational neuroscience work. GPT-4 achieved 96.9% accuracy, and GPT-3.5 improved from 54.2% to 85.5% through instruction-tuning and Chain of Thought prompting. GPT-4 also showed high potential in identifying relevant metadata annotations. DISCUSSION: Accuracy in identification and extraction might be further improved by resolving ambiguity about what counts as a computational element, including more information from papers (eg, the Methods section), improving prompts, etc. CONCLUSION: Natural language processing and large language model techniques can be added to ModelDB to facilitate further model discovery and will contribute to a more standardized and comprehensive framework for establishing domain-specific resources.
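
A minimal sketch of the embedding-based pre-screening idea, under assumptions: it uses the publicly released allenai/specter2_base encoder directly through Hugging Face transformers with CLS-token embeddings and a cosine-similarity cutoff against known positive abstracts. The article's actual pipeline (including any adapter loading and the threshold value) may differ.

# Hedged sketch of document-embedding pre-screening (not the authors' code).
# Assumptions: SPECTER2 base encoder, CLS-token embeddings, cosine similarity
# against known computational-neuroscience abstracts, 0.8 cutoff.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("allenai/specter2_base")
enc = AutoModel.from_pretrained("allenai/specter2_base")

def embed(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = enc(**batch)
    return out.last_hidden_state[:, 0, :]  # CLS-token embedding per document

known = embed(["Title and abstract of a known NEURON modeling paper ..."])
candidates = embed(["Candidate abstract 1 ...", "Candidate abstract 2 ..."])
sims = torch.nn.functional.cosine_similarity(
    candidates.unsqueeze(1), known.unsqueeze(0), dim=-1)   # (n_cand, n_known)
keep = sims.max(dim=1).values > 0.8  # threshold is an assumption; tune on labeled data
print(keep)  # candidates flagged for downstream LLM classification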


Subject(s)
Computational Biology; Neurosciences; Computational Biology/methods; Humans; Metadata; Data Curation/methods; Models, Neurological; Data Mining/methods; Databases, Factual
5.
Database (Oxford) ; 2024, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38748636

ABSTRACT

Breast cancer is notorious for its high mortality and heterogeneity, which result in different therapeutic responses. Classical biomarkers have been identified and successfully applied commercially to predict the outcome of breast cancer patients. With the development of sequencing techniques, a growing number of biomarkers, including non-coding RNAs, have been reported as prognostic markers for breast cancer. However, there are currently no databases dedicated to the curation and characterization of prognostic markers for breast cancer. Therefore, we constructed a curated database of prognostic markers of breast cancer (PMBC). PMBC consists of 1070 markers covering mRNAs, lncRNAs, miRNAs and circRNAs. These markers are enriched in various cancer- and epithelial-related functions, including mitogen-activated protein kinase signaling. We mapped the prognostic markers onto the ceRNA network from starBase. The lncRNA NEAT1 competes with 11 RNAs, including lncRNAs and mRNAs. The majority of the ceRNAs of ABAT belong to pseudogenes. Topology analysis of the ceRNA network reveals that known prognostic RNAs have higher closeness than random RNAs. Among all the biomarkers, prognostic lncRNAs have a higher degree, while prognostic mRNAs have significantly higher closeness, than random RNAs. These results indicate that the lncRNAs play important roles in maintaining the interactions between lncRNAs and their ceRNAs, which might be used as a characteristic to prioritize prognostic lncRNAs based on the ceRNA network. PMBC provides a user-friendly interface with detailed information about individual prognostic markers, which will facilitate the precision treatment of breast cancer. PMBC is available at the following URL: http://www.pmbreastcancer.com/.
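
The closeness comparison described above can be re-created schematically with networkx. This is a generic permutation-test formulation, not the PMBC analysis; the edge list, the prognostic node set, and the permutation count are placeholders.

# Schematic version of the ceRNA-network topology comparison (not the PMBC code).
# Placeholder edges; prognostic node set and permutation count are assumptions.
import random
import networkx as nx

edges = [("NEAT1", "mRNA1"), ("NEAT1", "mRNA2"), ("lncRNA2", "mRNA1"), ("mRNA2", "lncRNA2")]
G = nx.Graph(edges)
prognostic = {"NEAT1"}

closeness = nx.closeness_centrality(G)
observed = sum(closeness[n] for n in prognostic) / len(prognostic)

# Permutation test: compare against random node sets of the same size.
null = []
for _ in range(1000):
    sample = random.sample(list(G.nodes), len(prognostic))
    null.append(sum(closeness[n] for n in sample) / len(sample))
pval = sum(m >= observed for m in null) / len(null)
print(f"mean closeness of prognostic RNAs: {observed:.3f}, permutation p = {pval:.3f}")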


Subject(s)
Biomarkers, Tumor; Breast Neoplasms; Databases, Genetic; Humans; Breast Neoplasms/genetics; Breast Neoplasms/metabolism; Female; Biomarkers, Tumor/genetics; Prognosis; RNA, Long Noncoding/genetics; Gene Regulatory Networks; Data Curation/methods; RNA, Messenger/genetics; RNA, Messenger/metabolism; Gene Expression Regulation, Neoplastic
6.
Database (Oxford) ; 2024, 2024 May 28.
Article in English | MEDLINE | ID: mdl-38805753

ABSTRACT

While biomedical relation extraction (bioRE) datasets have been instrumental in the development of methods to support the biocuration of single variants from texts, no datasets are currently available for the extraction of digenic or even oligogenic variant relations, despite reports in the literature that epistatic effects between combinations of variants in different loci (or genes) are important for understanding disease etiologies. This work presents the creation of a unique dataset of oligogenic variant combinations, geared to train tools that help in the curation of scientific literature. To overcome the hurdles associated with the number of unlabelled instances and the cost of expertise, active learning (AL) was used to optimize the annotation, providing assistance in finding the most informative subset of samples to label. By pre-annotating 85 full-text articles containing the relevant relations from the Oligogenic Diseases Database (OLIDA) with PubTator, text fragments featuring potential digenic variant combinations, i.e. gene-variant-gene-variant, were extracted. The resulting text fragments were annotated with ALAMBIC, an AL-based annotation platform. The resulting dataset, called DUVEL, was used to fine-tune four state-of-the-art biomedical language models: BiomedBERT, BiomedBERT-large, BioLinkBERT and BioM-BERT. More than 500,000 text fragments were considered for annotation, finally resulting in a dataset with 8442 fragments, 794 of them positive instances, covering 95% of the original annotated articles. When applied to gene-variant pair detection, BiomedBERT-large achieves the highest F1 score (0.84) after fine-tuning, demonstrating significant improvement compared to the non-fine-tuned model and underlining the relevance of the DUVEL dataset. This study shows how AL can play an important role in the creation of bioRE datasets relevant to biomedical curation applications. DUVEL provides a unique biomedical corpus focusing on 4-ary relations between two genes and two variants. It is made freely available for research on GitHub and Hugging Face. Database URL: https://huggingface.co/datasets/cnachteg/duvel or https://doi.org/10.57967/hf/1571.
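
Because DUVEL is distributed through Hugging Face, a fine-tuning run along the lines described can be sketched with the datasets and transformers libraries. The column names ("text", "label"), the available splits, and the hyperparameters below are assumptions; the dataset card at the URL above documents the actual schema.

# Hedged sketch of fine-tuning a biomedical encoder on DUVEL (binary relation
# classification). Field names, splits, and hyperparameters are assumptions;
# check https://huggingface.co/datasets/cnachteg/duvel for the real schema.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext"
ds = load_dataset("cnachteg/duvel")
tok = AutoTokenizer.from_pretrained(checkpoint)

def tokenize(batch):
    return tok(batch["text"], truncation=True, padding="max_length", max_length=256)

ds = ds.map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

args = TrainingArguments(output_dir="duvel-ft", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
Trainer(model=model, args=args, train_dataset=ds["train"],
        eval_dataset=ds.get("validation")).train()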


Subject(s)
Supervised Machine Learning; Humans; Data Mining/methods; Data Curation/methods; Databases, Genetic
7.
BMC Bioinformatics ; 25(1): 184, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38724907

ABSTRACT

BACKGROUND: Major advances in sequencing technologies and in the sharing of data and metadata in science have resulted in a wealth of publicly available datasets. However, working with, and especially curating, public omics datasets remains challenging despite these efforts. While a growing number of initiatives aim to re-use previous results, they present limitations that often lead to the need for further in-house curation and processing. RESULTS: Here, we present the Omics Dataset Curation Toolkit (OMD Curation Toolkit), a Python 3 package designed to accompany and guide the researcher during the curation of metadata and fastq files from public omics datasets. This workflow provides a standardized framework with multiple capabilities (collection, control check, treatment and integration) to facilitate the arduous task of curating public sequencing data projects. While centered on the European Nucleotide Archive (ENA), the majority of the provided tools are generic and can be used to curate datasets from other sources. CONCLUSIONS: The toolkit thus offers valuable support for the in-house curation previously needed to re-use public omics data. Owing to its workflow structure and capabilities, it can be easily adopted and will benefit investigators developing novel omics meta-analyses based on sequencing data.
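
For the ENA-centred collection step, metadata retrieval typically goes through the public ENA Portal API. The sketch below calls the filereport endpoint directly rather than using the OMD Curation Toolkit itself; the project accession and field list are placeholders.

# Direct use of the ENA Portal API to collect run metadata for a project
# (illustrates the kind of collection step the toolkit wraps; the accession
# and field list are placeholders).
import csv
import io
import requests

url = "https://www.ebi.ac.uk/ena/portal/api/filereport"
params = {
    "accession": "PRJEB12345",          # placeholder project accession
    "result": "read_run",
    "fields": "run_accession,sample_accession,fastq_ftp,fastq_md5",
    "format": "tsv",
}
resp = requests.get(url, params=params, timeout=60)
resp.raise_for_status()

rows = list(csv.DictReader(io.StringIO(resp.text), delimiter="\t"))
for row in rows:
    print(row["run_accession"], row["fastq_ftp"])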


Subject(s)
Data Curation; Software; Workflow; Data Curation/methods; Metadata; Databases, Genetic; Genomics/methods; Computational Biology/methods
8.
Database (Oxford) ; 2024, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38788333

ABSTRACT

Multiple sclerosis (MS) is the most common inflammatory demyelinating disease of the central nervous system. 'Omics' technologies (genomics, transcriptomics, proteomics) and associated drug information have begun reshaping our understanding of multiple sclerosis. However, these data are scattered across numerous references, making them challenging to use fully. We manually mined and compiled these data into the Multiple Sclerosis Gene Database (MSGD), which we intend to keep updating in the future. We screened 5485 publications and constructed the current version of MSGD. MSGD comprises 6255 entries, including 3274 variant entries, 1175 RNA entries, 418 protein entries, 313 knockout entries, 612 drug entries and 463 high-throughput entries. Each entry contains detailed information, such as species, disease type, detailed gene descriptions (such as official gene symbols), and original references. MSGD is freely accessible and provides a user-friendly web interface. Users can easily search for genes of interest, view their expression patterns and detailed information, manage gene sets and submit new MS-gene associations through the platform. The primary principle behind MSGD's design is to provide an exploratory platform that minimizes filtration and interpretation barriers while ensuring a highly accessible presentation of the data. This initiative is expected to significantly assist researchers in deciphering gene mechanisms and in improving the prevention, diagnosis and treatment of MS. Database URL: http://bio-bigdata.hrbmu.edu.cn/MSGD.


Subject(s)
Databases, Genetic; Multiple Sclerosis; Proteomics; Transcriptome; Multiple Sclerosis/genetics; Humans; Proteomics/methods; Transcriptome/genetics; Data Curation/methods; Genomics/methods
9.
San Salvador; MINSAL; Apr. 3, 2024. 23 p. illus.
Non-conventional in Spanish | BISSAL, LILACS | ID: biblio-1553574

ABSTRACT

The objective of these guidelines is to standardize the activities to be carried out by the personnel responsible for the disposal of administrative documents, in order to make the procedure efficient and reduce storage costs. All MINSAL offices and health establishments that generate institutional records of primary (administrative) value are subject to this procedure, which begins with the identification of documentation that has already completed the retention period established in the TPCD and concludes with the destruction of that documentation.


Subject(s)
Records; Data Curation; El Salvador
10.
BMC Med Imaging ; 24(1): 83, 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38589793

ABSTRACT

This research focuses on the segmentation and classification of leukocytes, a crucial task in medical image analysis for diagnosing various diseases. The leukocyte dataset comprises four classes of images: monocytes, lymphocytes, eosinophils, and neutrophils. Leukocyte segmentation is achieved through image processing techniques, including background subtraction, noise removal, and contouring. To isolate the leukocytes, background, erythrocyte, and leukocyte masks are created from the blood cell images. Isolated leukocytes are then subjected to data augmentation, including brightness and contrast adjustment, flipping, and random shearing, to improve the generalizability of the CNN model. A deep Convolutional Neural Network (CNN) model is employed on the augmented dataset for effective feature extraction and classification. The deep CNN model consists of four convolutional blocks comprising eleven convolutional layers, eight batch normalization layers, eight Rectified Linear Unit (ReLU) layers, and four dropout layers to capture increasingly complex patterns. For this research, a publicly available dataset from Kaggle consisting of 12,444 images of the four leukocyte types was used to conduct the experiments. The results showcase the robustness of the proposed framework, which achieves impressive performance metrics with an accuracy of 97.98% and a precision of 97.97%. These outcomes affirm the efficacy of the devised segmentation and classification approach in accurately identifying and categorizing leukocytes. The combination of an advanced CNN architecture and meticulous pre-processing steps establishes a foundation for future developments in the field of medical image analysis.
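
A four-block CNN of the kind described can be outlined in PyTorch. The sketch follows the stated ingredients (eleven convolutional layers in four blocks, batch normalization, ReLU, dropout, four output classes), but the channel widths, kernel sizes, and 128x128 input are assumptions rather than the authors' exact architecture.

# Illustrative four-block CNN for 4-class leukocyte classification (PyTorch).
# Channel widths, kernel sizes, and input size are assumptions; the paper's
# model (11 conv, 8 BN, 8 ReLU, 4 dropout layers) may differ in detail.
import torch
import torch.nn as nn

def conv_block(c_in, c_out, n_convs, p_drop=0.25):
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(c_in if i == 0 else c_out, c_out, 3, padding=1),
                   nn.BatchNorm2d(c_out), nn.ReLU(inplace=True)]
    layers += [nn.MaxPool2d(2), nn.Dropout(p_drop)]
    return nn.Sequential(*layers)

model = nn.Sequential(
    conv_block(3, 32, 2), conv_block(32, 64, 3),     # 2 + 3 + 3 + 3 = 11 conv layers
    conv_block(64, 128, 3), conv_block(128, 256, 3),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(256, 4),  # monocyte, lymphocyte, eosinophil, neutrophil
)

x = torch.randn(8, 3, 128, 128)  # batch of augmented leukocyte crops
print(model(x).shape)            # torch.Size([8, 4])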


Subject(s)
Deep Learning; Humans; Data Curation; Leukocytes; Neural Networks, Computer; Blood Cells; Image Processing, Computer-Assisted/methods
11.
Int J Comput Assist Radiol Surg ; 19(6): 1093-1101, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38573565

ABSTRACT

PURPOSE: In medical research, deep learning models rely on high-quality annotated data, the production of which is often laborious and time-consuming. This is particularly true for detection tasks, where bounding box annotations are required: the need to adjust two corners makes the process inherently frame-by-frame. Given the scarcity of experts' time, efficient annotation methods suitable for clinicians are needed. METHODS: We propose an on-the-fly method for live video annotation to enhance annotation efficiency. In this approach, a continuous single-point annotation is maintained by keeping the cursor on the object in a live video, mitigating the need for the tedious pausing and repetitive navigation inherent in traditional annotation methods. This novel annotation paradigm inherits the point annotation's ability to generate pseudo-labels using a point-to-box teacher model. We empirically evaluate this approach by developing a dataset and comparing on-the-fly annotation time against the traditional annotation method. RESULTS: Using our method, annotation was 3.2× faster than with the traditional technique. We achieved a mean improvement of 6.51 ± 0.98 AP@50 over the conventional method at equivalent annotation budgets on the developed dataset. CONCLUSION: Without bells and whistles, our approach offers a significant speed-up in annotation tasks. It can be easily implemented on any annotation platform to accelerate the integration of deep learning in video-based medical research.
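
A deliberately simplified stand-in for the point-to-box step is sketched below: each tracked cursor point is expanded into a pseudo-box using a fixed class-conditional prior size. The paper instead uses a learned point-to-box teacher model, and the class names and prior sizes here are hypothetical.

# Simplified stand-in for point-to-box pseudo-labeling (NOT the paper's learned
# teacher model): each per-frame cursor point is expanded into a box using a
# fixed, class-conditional prior size. Classes and prior sizes are placeholders.
from dataclasses import dataclass

@dataclass
class PointLabel:
    frame: int
    x: float          # cursor position, normalized to [0, 1]
    y: float
    cls: str

PRIOR_WH = {"forceps": (0.20, 0.10), "needle": (0.08, 0.06)}  # hypothetical classes

def point_to_pseudo_box(p: PointLabel):
    w, h = PRIOR_WH[p.cls]
    # Clamp the box to the image so boxes near the borders stay valid.
    x0, y0 = max(0.0, p.x - w / 2), max(0.0, p.y - h / 2)
    x1, y1 = min(1.0, p.x + w / 2), min(1.0, p.y + h / 2)
    return (p.frame, p.cls, x0, y0, x1, y1)

track = [PointLabel(0, 0.52, 0.40, "forceps"), PointLabel(1, 0.55, 0.42, "forceps")]
print([point_to_pseudo_box(p) for p in track])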


Subject(s)
Deep Learning; Video Recording; Video Recording/methods; Humans; Data Curation/methods
12.
PLoS One ; 19(4): e0301772, 2024.
Article in English | MEDLINE | ID: mdl-38662657

ABSTRACT

In recent years, with the trend toward open science, there have been many efforts to share research data on the internet. To promote research data sharing, data curation is essential for making the data interpretable and reusable. In research fields such as the life sciences, earth sciences, and social sciences, tasks and procedures have already been developed to implement efficient data curation that meets the needs and customs of individual research fields. However, promoting open science requires not only data sharing within research fields but also interdisciplinary data sharing. For this purpose, knowledge of data curation across research fields is surveyed, analyzed, and organized as an ontology in this paper. For the survey, existing vocabularies and procedures were collected and compared, and interviews were conducted with data curators at research institutes in different fields to clarify commonalities and differences in data curation across those fields. It turned out that the granularity of the tasks and procedures that constitute the building blocks of data curation is not formalized; without a method to overcome this gap, it will be challenging to promote the interdisciplinary reuse of research data. Based on this analysis, an ontology of the data curation process is proposed to describe data curation processes in different fields universally. It is expressed in OWL and shown to be logically valid and consistent. The ontology successfully represents the data curation activities captured in the interviews as processes in the different fields. It is also helpful for identifying the functions of systems that support the data curation process. This study contributes to building a knowledge framework for an interdisciplinary understanding of data curation activities in different fields.
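
The logical validation mentioned above is typically performed by loading the OWL file into a description-logic reasoner. The sketch below uses owlready2 with its bundled HermiT reasoner; the ontology filename is a placeholder, and the article does not state which reasoner was used.

# Checking an OWL ontology for logical consistency with owlready2 (filename is
# a placeholder; the reasoner the authors used is not stated in the abstract).
# Requires Java, since owlready2 ships the HermiT reasoner as a JAR.
from owlready2 import get_ontology, sync_reasoner, default_world

onto = get_ontology("data_curation_process.owl").load()
with onto:
    # Raises OwlReadyInconsistentOntologyError if the ontology is inconsistent.
    sync_reasoner()

# Classes inferred to be unsatisfiable are reported as inconsistent classes.
unsatisfiable = list(default_world.inconsistent_classes())
print("consistent; unsatisfiable classes:", unsatisfiable)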


Subject(s)
Data Curation; Information Dissemination; Data Curation/methods; Information Dissemination/methods; Humans; Knowledge; Internet
13.
Mol Genet Metab ; 142(1): 108362, 2024 May.
Article in English | MEDLINE | ID: mdl-38452609

ABSTRACT

Cerebral creatine deficiency syndromes (CCDS) are inherited metabolic phenotypes of creatine synthesis and transport. There are two enzyme deficiencies involved in the synthesis of creatine: guanidinoacetate methyltransferase (GAMT), encoded by GAMT, and arginine-glycine amidinotransferase (AGAT), encoded by GATM. After synthesis, creatine is taken up into all organs by a sodium-dependent membrane-bound creatine transporter (CRTR), encoded by SLC6A8. Creatine uptake is very important, especially in organs with high energy demands such as the brain and muscle. To classify the pathogenicity of variants in GAMT, GATM, and SLC6A8, we developed the CCDS Variant Curation Expert Panel (VCEP) in 2018, supported by The Clinical Genome Resource (ClinGen), a National Institutes of Health (NIH)-funded resource. We developed disease-specific variant classification guidelines for GAMT-, GATM-, and SLC6A8-related CCDS, adapted from the American College of Medical Genetics/Association for Molecular Pathology (ACMG/AMP) variant interpretation guidelines. We applied these specific variant classification guidelines to 30 pilot variants in each of the three genes that have variants associated with CCDS. Our CCDS VCEP was approved by the ClinGen Sequence Variant Interpretation Working Group (SVI WG) and Clinical Domain Oversight Committee in July 2022. We curated 181 variants, including 72 variants in GAMT, 45 variants in GATM, and 64 variants in SLC6A8, and submitted these classifications to ClinVar, a public variant database supported by the National Center for Biotechnology Information. Missense variants were the most common variant type in all three genes. We submitted 32 new variants and reclassified 34 variants with conflicting interpretations. We report specific phenotype (PP4) using a points system based on urine and plasma guanidinoacetate and creatine levels, brain magnetic resonance spectroscopy (MRS) creatine level, and enzyme activity or creatine uptake in fibroblasts, at the PP4, PP4_Moderate, and PP4_Strong levels. Our CCDS VCEP is one of the first panels applying disease-specific variant classification algorithms to an X-linked disease. The availability of these guidelines and classifications can guide molecular genetics and genomic laboratories and health care providers in assessing the molecular diagnosis of individuals with a CCDS phenotype.


Subject(s)
Amidinotransferases; Amidinotransferases/deficiency; Amino Acid Metabolism, Inborn Errors; Creatine; Creatine/deficiency; Guanidinoacetate N-Methyltransferase; Intellectual Disability; Language Development Disorders; Movement Disorders/congenital; Nerve Tissue Proteins; Plasma Membrane Neurotransmitter Transport Proteins; Plasma Membrane Neurotransmitter Transport Proteins/deficiency; Speech Disorders; Humans; Guanidinoacetate N-Methyltransferase/deficiency; Guanidinoacetate N-Methyltransferase/genetics; Creatine/metabolism; Plasma Membrane Neurotransmitter Transport Proteins/genetics; Amidinotransferases/genetics; Amidinotransferases/metabolism; Mental Retardation, X-Linked/genetics; Mental Retardation, X-Linked/diagnosis; Mutation; Brain Diseases, Metabolic, Inborn/genetics; Brain Diseases, Metabolic, Inborn/diagnosis; Phenotype; Data Curation; Developmental Disabilities
14.
Brief Bioinform ; 25(2), 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38436563

ABSTRACT

The proliferation of single-cell RNA-seq data has greatly enhanced our ability to comprehend the intricate nature of diverse tissues. However, accurately annotating cell types in such data, especially when handling multiple reference datasets and identifying novel cell types, remains a significant challenge. To address these issues, we introduce Single Cell annotation based on Distance metric learning and Optimal Transport (scDOT), an innovative cell-type annotation method adept at integrating multiple reference datasets and uncovering previously unseen cell types. scDOT introduces two key innovations. First, by incorporating distance metric learning and optimal transport, it presents a novel optimization framework. This framework effectively learns the predictive power of each reference dataset for new query data and simultaneously establishes a probabilistic mapping between cells in the query data and reference-defined cell types. Second, scDOT develops an interpretable scoring system based on the acquired probabilistic mapping, enabling the precise identification of previously unseen cell types within the data. To rigorously assess scDOT's capabilities, we systematically evaluate its performance using two diverse collections of benchmark datasets encompassing various tissues, sequencing technologies and diverse cell types. Our experimental results consistently affirm the superior performance of scDOT in cell-type annotation and in the identification of previously unseen cell types. These advancements provide researchers with a potent tool for precise cell-type annotation, ultimately enriching our understanding of complex biological tissues.
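
The optimal-transport mapping at the core of scDOT can be illustrated with the POT library. This is a generic sketch, not scDOT itself: it maps query cells to reference cell-type centroids with entropic OT and flags low-confidence cells as candidate novel types; the distance metric, regularization strength, and novelty threshold are assumptions.

# Generic entropic-OT mapping of query cells to reference cell types (an
# illustration of the idea, not scDOT). Metric, epsilon, and the novelty
# threshold are assumptions.
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(0)
query = rng.normal(size=(200, 50))        # query cells x genes (placeholder)
centroids = rng.normal(size=(5, 50))      # reference cell-type centroids

M = ot.dist(query, centroids, metric="euclidean")  # cost matrix
a = np.full(200, 1 / 200)                 # uniform mass on query cells
b = np.full(5, 1 / 5)                     # uniform mass on cell types
P = ot.sinkhorn(a, b, M / M.max(), reg=0.05)       # transport plan (200 x 5)

probs = P / P.sum(axis=1, keepdims=True)  # per-cell mapping probabilities
labels = probs.argmax(axis=1)
novel = probs.max(axis=1) < 0.5           # low confidence: candidate novel types
print(labels[:10], novel.sum())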


Subject(s)
Data Curation; Single-Cell Gene Expression Analysis; Humans; Benchmarking; Learning; Research Personnel
15.
Methods Mol Biol ; 2779: 369-394, 2024.
Article in English | MEDLINE | ID: mdl-38526795

ABSTRACT

Clinical studies are conducted to better understand the pathological mechanism of diseases and to find biomarkers associated with disease activity, drug response, or outcome prediction. Mass cytometry (MC) is a high-throughput single-cell technology that measures hundreds of cells per second with more than 40 markers per cell. Thus, it is a suitable tool for immune monitoring and biomarker discovery studies. Working in translational and clinical settings requires a careful experimental design to minimize, monitor, and correct the variations introduced during sample collection, preparation, acquisition, and analysis. In this review, we will focus on these important aspects of MC-related experiments and data curation in the context of translational clinical research projects.


Subject(s)
Data Curation; Research Design; Flow Cytometry; Biomarkers/analysis; Proteomics; Single-Cell Analysis
16.
Database (Oxford) ; 2024, 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38537198

ABSTRACT

Curation of biomedical knowledge into systems biology diagrammatic or computational models is essential for studying complex biological processes. However, systems-level curation is a laborious manual process, especially given the ever-increasing growth of the domain literature. New findings demonstrating elaborate relationships between multiple molecules, pathways and cells have to be represented in a format suitable for systems biology applications. Importantly, curation should capture the complexity of molecular interactions in such a format, together with annotations of the involved elements, and support stable identifiers and versioning. This challenge calls for novel collaborative tools and platforms that improve the quality and the output of the curation process. In particular, community-based curation, an important source of curated knowledge, requires support for role management, reviewing features and versioning. Here, we present Biological Knowledge Curation (BioKC), a web-based collaborative platform for the curation and annotation of biomedical knowledge following the standard data model of the Systems Biology Markup Language (SBML). BioKC offers a graphical user interface for the curation of complex molecular interactions and their annotation with stable identifiers and supporting sentences. With support for collaborative curation and review, it allows the construction of building blocks for systems biology diagrams and computational models. These building blocks can be published under stable identifiers, versioned, and used as annotations, supporting knowledge building for modelling activities.
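
A curated building block of the kind BioKC produces can be serialized with python-libsbml, following the SBML data model the platform targets. The sketch below builds a minimal species-reaction fragment; all identifiers are placeholders and the snippet is illustrative, not BioKC output.

# Minimal SBML Level 3 fragment built with python-libsbml (illustrates the
# SBML data model BioKC targets; IDs are placeholders, not BioKC output).
import libsbml

doc = libsbml.SBMLDocument(3, 2)
model = doc.createModel()
model.setId("curated_block")

comp = model.createCompartment()
comp.setId("cell")
comp.setConstant(True)

for sid in ("A", "B"):
    sp = model.createSpecies()           # species need all required L3 attributes
    sp.setId(sid)
    sp.setCompartment("cell")
    sp.setConstant(False)
    sp.setHasOnlySubstanceUnits(False)
    sp.setBoundaryCondition(False)

rxn = model.createReaction()
rxn.setId("A_to_B")
rxn.setReversible(False)
sr = rxn.createReactant(); sr.setSpecies("A"); sr.setConstant(True)
pr = rxn.createProduct(); pr.setSpecies("B"); pr.setConstant(True)

print(libsbml.writeSBMLToString(doc))    # serialized building block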


Subject(s)
Software; Systems Biology; Data Curation
20.
Appl Clin Inform ; 15(1): 111-118, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38325408

ABSTRACT

BACKGROUND: Observational research has shown its potential to complement experimental research and clinical trials through the secondary use of treatment data from hospital care processes. It can also be applied to better understand pediatric drug utilization and thereby establish safer drug therapy. Clinical documentation processes often limit data quality in pediatric medical records, requiring data curation steps whose effort is mostly underestimated. OBJECTIVES: The objectives of this study were to transform and curate data from a departmental electronic medical record into an observational research database. In particular, we aim to identify data quality problems, illustrate reasons for such problems, and describe the systematic data curation process established to create high-quality data for observational research. METHODS: Data were extracted from an electronic medical record used by four wards of a German university children's hospital from April 2012 to June 2020. A four-step data preparation, mapping, and curation process was established. The data quality of the generated dataset was assessed, first following an established 3 × 3 Data Quality Assessment guideline and second by comparing a sample subset of the database with an existing gold standard. RESULTS: The generated dataset consists of 770,158 medication dispensations associated with 89,955 different drug exposures from 21,285 clinical encounters. A total of 6,840 different narrative drug therapy descriptions were mapped to 1,139 standard terms for drug exposures. Regarding the quality criterion of correctness, the database was consistent and had a high overall agreement with our gold standard. CONCLUSION: Despite large amounts of free-text descriptions and contextual knowledge implicitly included in the electronic medical record, we were able to identify relevant data quality issues and to establish a semi-automated data curation process leading to a high-quality observational research database. Because of inconsistent dosage information in the original documentation, this database is limited to a drug utilization database without detailed dosage information.
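
A mapping step like the narrative-to-standard-term curation described can be sketched with a normalization function and a lookup table. This is a generic illustration, not the authors' semi-automated pipeline; the normalization rules and table entries are hypothetical.

# Generic sketch of mapping free-text drug descriptions to standard terms
# (not the authors' pipeline; normalization rules and the lookup table are
# hypothetical examples).
import re
from typing import Optional

TERM_MAP = {  # curated narrative -> standard term (placeholder entries)
    "paracetamol supp": "acetaminophen",
    "paracetamol saft": "acetaminophen",
    "ibuprofen saft": "ibuprofen",
}

def normalize(text: str) -> str:
    text = text.lower().strip()
    text = re.sub(r"\b\d+(\.\d+)?\s*(mg|ml|%)", "", text)  # drop dose strengths
    return re.sub(r"\s+", " ", text).strip()               # collapse whitespace

def map_drug(narrative: str) -> Optional[str]:
    # None means unresolved; such entries are routed to manual review.
    return TERM_MAP.get(normalize(narrative))

print(map_drug("  Paracetamol 250 mg Supp"))  # -> acetaminophen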


Subject(s)
Data Curation; Electronic Health Records; Humans; Child; Documentation; Databases, Factual; Data Accuracy