Results 1 - 20 of 48
1.
J Med Internet Res ; 26: e56614, 2024 May 31.
Article in English | MEDLINE | ID: mdl-38819879

ABSTRACT

BACKGROUND: Efficient data exchange and health care interoperability are impeded by medical records often being in nonstandardized or unstructured natural language format. Advanced language models, such as large language models (LLMs), may help overcome current challenges in information exchange. OBJECTIVE: This study aims to evaluate the capability of LLMs in transforming and transferring health care data to support interoperability. METHODS: Using data from the Medical Information Mart for Intensive Care III and the UK Biobank, the study conducted 3 experiments. Experiment 1 assessed the accuracy of transforming structured laboratory results into an unstructured format. Experiment 2 explored the conversion of diagnostic codes between the ICD-9-CM (International Classification of Diseases, Ninth Revision, Clinical Modification) and Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) coding frameworks using a traditional mapping table and a text-based approach facilitated by the LLM ChatGPT. Experiment 3 focused on extracting targeted information from unstructured records containing comprehensive clinical information (discharge notes). RESULTS: The text-based approach showed high conversion accuracy in transforming laboratory results (experiment 1) and enhanced consistency in diagnostic code conversion, particularly for frequently used diagnostic names, compared with the traditional mapping approach (experiment 2). In experiment 3, the LLM showed a positive predictive value of 87.2% in extracting generic drug names. CONCLUSIONS: This study highlights the potential role of LLMs in significantly improving health care data interoperability, demonstrated by their high accuracy and efficiency in data transformation and exchange. LLMs hold vast potential for enhancing medical data exchange without complex standardization of medical terms and data structures.
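As a concrete illustration of the text-based conversion in experiment 2, the following Python sketch shows how an LLM could be prompted to map an ICD-9-CM code to SNOMED-CT. This is a hedged sketch, not the authors' implementation: call_llm is a placeholder for whichever chat-completion client is available, and the prompt wording and JSON reply format are assumptions.

import json

def call_llm(prompt: str) -> str:
    # Placeholder: send the prompt to an LLM (e.g., ChatGPT) and return its reply.
    raise NotImplementedError("wire up an LLM client here")

def icd9_to_snomed(icd9_code: str, diagnosis_text: str) -> dict:
    # Request structured JSON so the mapping can be validated downstream.
    prompt = (
        "Map the following ICD-9-CM diagnosis to SNOMED-CT.\n"
        f"ICD-9-CM code: {icd9_code}\n"
        f"Diagnosis text: {diagnosis_text}\n"
        'Reply only with JSON: {"snomed_ct_id": "...", "snomed_ct_term": "..."}'
    )
    return json.loads(call_llm(prompt))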


Subjects
Health Information Exchange, Humans, Health Information Exchange/standards, Health Information Interoperability, Electronic Health Records, Natural Language Processing, Systematized Nomenclature of Medicine
2.
Genes Dis ; 11(3): 100979, 2024 May.
Article in English | MEDLINE | ID: mdl-38299197

ABSTRACT

Metabolomics, as a research field and a set of techniques, is the study of the complete set of small molecules in biological samples. Metabolomics is emerging as a powerful tool for precision medicine in general. In particular, integration of the microbiome and metabolome has revealed the mechanisms and functionality of the microbiome in human health and disease. However, metabolomics data are very complicated, and preprocessing/pretreatment and normalization procedures are usually required before statistical analysis. In this review article, we comprehensively review the methods used to preprocess and pretreat metabolomics data, including MS-based and NMR-based data preprocessing; handling of zero and/or missing values and detection of outliers; and data normalization, centering, scaling, and transformation. We discuss the advantages and limitations of each method. The choice of a suitable preprocessing method is determined by the biological hypothesis, the characteristics of the data set, and the selected statistical analysis method. We then provide a perspective on their applications in microbiome and metabolome research.
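For orientation, the sketch below strings together a few of the reviewed steps (missing-value imputation, log transformation, centering, and unit-variance scaling) on a samples-by-metabolites intensity matrix. The half-minimum imputation rule is one common convention, assumed here rather than prescribed by the review.

import numpy as np

def preprocess(X: np.ndarray) -> np.ndarray:
    # X: samples x metabolites matrix of positive intensities, NaN = missing.
    X = X.copy()
    col_min = np.nanmin(X, axis=0)
    for j in range(X.shape[1]):
        # Half-minimum imputation: replace missing values per metabolite.
        X[np.isnan(X[:, j]), j] = col_min[j] / 2.0
    X = np.log2(X)                  # variance-stabilizing log transformation
    X -= X.mean(axis=0)             # mean centering
    X /= X.std(axis=0, ddof=1)      # unit-variance (auto) scaling
    return X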

3.
Stud Health Technol Inform ; 309: 133-134, 2023 Oct 20.
Article in English | MEDLINE | ID: mdl-37869823

ABSTRACT

Within the HORIZON 2020 project ORCHESTRA, patient data from numerous clinical studies in Europe related to COVID-19 were harmonized to create new knowledge on the disease. In this article, we describe the ecosystem that was established for the management of data collected and contributed by project partners. Study protocol elements were mapped to interoperability standards to establish a common terminology, which served as the basis for identifying common concepts used across several studies. Harmonized data were used to perform analysis directly on a central database, and through federated analysis when data were not permitted to leave the local server(s). This ecosystem facilitates answering research questions and generating new knowledge for the scientific community.


Subjects
Data Management, Humans, Databases, Factual, Europe
4.
Microb Ecol ; 86(4): 2790-2801, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37563275

ABSTRACT

High-throughput, multiplexed-amplicon sequencing has become a core tool for understanding environmental microbiomes. As researchers have widely adopted sequencing, many open-source analysis pipelines have been developed to compare microbiomes using compositional analysis frameworks. However, there is increasing evidence that compositional analyses do not provide the information necessary to accurately interpret many community assembly processes. This is especially true when there are large gradients that drive distinct community assembly processes. Recently, sequencing has been combined with quantitative PCR (qPCR), among other sources of total quantitation, to generate "quantitative sequencing" (QSeq) data. QSeq more accurately estimates the true abundance of taxa, is a more reliable basis for inferring correlation, and, ultimately, can be more reliably related to environmental data to infer community assembly processes. In this paper, we use a combination of published data sets, synthesis, and empirical modeling to offer guidance on the contexts in which QSeq is advantageous. As little as 5% variation in total abundance among experimental groups resulted in more accurate inference by QSeq than by compositional methods. Compositional methods for differential abundance and correlation unreliably detected patterns in abundance and covariance when there was greater than 20% variation in total abundance among experimental groups. Whether QSeq performs better for beta diversity analysis depends on the question being asked and the analytic strategy (e.g., which distance metric is used); for many questions and methods, QSeq and compositional analysis are equivalent for beta diversity analysis. QSeq is especially useful for taxon-specific analysis and should be the default transformation and analysis approach for answering taxon-specific questions of amplicon sequence data. Publicly available bioinformatics pipelines should incorporate support for QSeq transformation and analysis.
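The QSeq transformation itself is a one-line rescaling; a minimal Python sketch, assuming per-sample read counts and an independent total-abundance measurement such as qPCR 16S copy number:

import numpy as np

def qseq_transform(counts: np.ndarray, totals: np.ndarray) -> np.ndarray:
    # counts: samples x taxa read counts; totals: per-sample total quantitation.
    rel = counts / counts.sum(axis=1, keepdims=True)  # compositional fractions
    return rel * totals[:, None]                      # estimated absolute abundances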


Subjects
Bacteria, Microbiota, Bacteria/genetics, Population Density, Microbiota/genetics, Sequence Analysis, DNA, High-Throughput Nucleotide Sequencing/methods
5.
Metabolites ; 13(7)2023 Jul 12.
Article in English | MEDLINE | ID: mdl-37512549

ABSTRACT

In recent years, the FAIR guiding principles and the broader concept of open science have grown in importance in academic research, especially as funding entities have aggressively promoted public sharing of research products. Key to public research sharing is the deposition of datasets into online data repositories, but it can be a chore to transform messy, unstructured data into the forms required by these repositories. To help generate Metabolomics Workbench depositions, we have developed the MESSES (Metadata from Experimental SpreadSheets Extraction System) software package, implemented in the Python 3 programming language and supported on Linux, Windows, and Mac operating systems. MESSES helps transform tabular data from multiple sources into the Metabolomics Workbench-specific deposition format. The package provides three commands, extract, validate, and convert, that implement a natural data transformation workflow. Moreover, MESSES facilitates richer metadata capture than is typically attempted by manual efforts. The source code and extensive documentation are hosted on GitHub and are also available on the Python Package Index for easy installation.

6.
J Med Internet Res ; 25: e45651, 2023 07 17.
Article in English | MEDLINE | ID: mdl-37459170

ABSTRACT

BACKGROUND: Reference intervals (RIs) play an important role in clinical decision-making. However, because of the time, labor, and financial costs involved in establishing RIs by direct means, indirect methods based on big data previously obtained from clinical laboratories are receiving increasing attention. Different indirect techniques combined with different data transformation methods and outlier removal procedures may yield different calculated RIs, yet systematic evaluations of this are scarce. OBJECTIVE: This study used data derived from direct methods as reference standards and evaluated the accuracy of combinations of data transformation, outlier removal, and indirect techniques in establishing complete blood count (CBC) RIs for large-scale data. METHODS: The CBC data of populations aged ≥18 years undergoing physical examination from January 2010 to December 2011 were retrieved from the First Affiliated Hospital of China Medical University in northern China. After exclusion of repeated individuals, we applied the parametric, nonparametric, Hoffmann, Bhattacharya, and truncation points and Kolmogorov-Smirnov distance (kosmic) indirect methods, combined with log or Box-Cox transformation and Reed-Dixon, Tukey, or iterative mean (3SD) outlier removal, to derive the RIs of 8 CBC parameters, and compared the results with those previously established by the direct method. Furthermore, bias ratios (BRs) were calculated to assess which combination of indirect technique, data transformation pattern, and outlier removal method is preferable. RESULTS: Raw data showed that the degrees of skewness of the white blood cell (WBC) count, platelet (PLT) count, mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), and mean corpuscular volume (MCV) were much more pronounced than those of the other CBC parameters. After log or Box-Cox transformation combined with Tukey or iterative mean (3SD) processing, the distributions of these data were close to Gaussian. Tukey-based outlier removal yielded the maximum number of outliers. The lower-limit bias of WBC (male), PLT (male), hemoglobin (HGB; male), MCH (male/female), and MCV (female) was greater than that of the corresponding upper limit for more than half of the 30 indirect methods. The computationally preferable choices for CBC parameters were inconsistent between males and females. The RIs of MCHC established by the direct method for females were narrow; here, the kosmic method was markedly superior, in contrast to the RI calculation of CBC parameters with high |BR| qualification rates for males. Among the top 10 methodologies for the WBC count, PLT count, HGB, MCV, and MCHC with high |BR| qualification rates among males, the Bhattacharya, Hoffmann, and parametric methods were superior to the other 2 indirect methods. CONCLUSIONS: Compared with results derived by the direct method, outlier removal methods and indirect techniques markedly influence the final RIs, whereas data transformation has negligible effects except for obviously skewed data. Specifically, the outlier removal efficiency of the Tukey and iterative mean (3SD) methods is almost equivalent. Furthermore, the choice of indirect technique depends more on the characteristics of the studied analyte itself. This study provides scientific evidence for clinical laboratories to use their previous data sets to establish RIs.
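As a minimal sketch (not the authors' pipeline) of one of the evaluated combinations, the following Python function derives a reference interval using the Box-Cox transformation, Tukey fence outlier removal, and nonparametric 2.5th-97.5th percentiles:

import numpy as np
from scipy import stats
from scipy.special import inv_boxcox

def indirect_ri(values: np.ndarray):
    # values: positive laboratory results pooled from routine data.
    transformed, lam = stats.boxcox(values)       # normalize skewed data
    q1, q3 = np.percentile(transformed, [25, 75])
    iqr = q3 - q1
    keep = (transformed >= q1 - 1.5 * iqr) & (transformed <= q3 + 1.5 * iqr)
    lo, hi = np.percentile(transformed[keep], [2.5, 97.5])
    # Back-transform the limits to the original measurement scale.
    return float(inv_boxcox(lo, lam)), float(inv_boxcox(hi, lam))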


Subjects
Big Data, Blood Cell Count, Adolescent, Adult, Female, Humans, Male, China, Leukocyte Count, Reference Values, Clinical Decision-Making
7.
Phys Med Biol ; 68(12)2023 06 12.
Article in English | MEDLINE | ID: mdl-37192630

ABSTRACT

Objective. Denoising models based on supervised learning have been proposed for medical imaging. However, their clinical availability in digital tomosynthesis (DT) imaging is limited by the large amount of training data needed to provide acceptable image quality and by the difficulty of minimizing a loss. Reinforcement learning (RL) can provide the optimal policy, which maximizes a reward, with a small amount of training data. In this study, we present a denoising model based on multi-agent RL for DT imaging to improve the performance of machine learning-based denoising models. Approach. The proposed multi-agent RL network consists of a shared sub-network, a value sub-network with a reward map convolution (RMC) technique, and a policy sub-network with a convolutional gated recurrent unit (convGRU), implementing feature extraction, reward calculation, and action execution, respectively. The agents of the proposed network are assigned to each image pixel. The wavelet and Anscombe transformations were applied to DT images to deliver precise noise features during network training. Network training used DT images obtained from three-dimensional digital chest phantoms constructed from clinical CT images. The performance of the proposed denoising model was evaluated in terms of signal-to-noise ratio (SNR), structural similarity (SSIM), and peak signal-to-noise ratio (PSNR). Main results. Compared with supervised learning, the proposed denoising model improved the SNRs of the output DT images by 20.64% while maintaining similar SSIMs and PSNRs. In addition, the SNRs of the output DT images with the wavelet and Anscombe transformations were 25.88% and 42.95% higher, respectively, than those for supervised learning. Significance. The denoising model based on multi-agent RL can provide high-quality DT images, and the proposed method enables performance improvement of machine learning-based denoising models.
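The Anscombe transformation mentioned above has a closed form; it maps approximately Poisson-distributed counts to near-Gaussian, near-unit-variance values so a denoiser can work in a signal-independent noise domain. A minimal sketch:

import numpy as np

def anscombe(x: np.ndarray) -> np.ndarray:
    return 2.0 * np.sqrt(x + 3.0 / 8.0)

def inverse_anscombe(y: np.ndarray) -> np.ndarray:
    # Simple algebraic inverse; exact unbiased inverses also exist.
    return (y / 2.0) ** 2 - 3.0 / 8.0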


Subjects
Magnetic Resonance Imaging, Tomography, X-Ray Computed, Radiography, Magnetic Resonance Imaging/methods, Phantoms, Imaging, Signal-To-Noise Ratio, Image Processing, Computer-Assisted/methods, Algorithms
8.
Stud Health Technol Inform ; 302: 43-47, 2023 May 18.
Article in English | MEDLINE | ID: mdl-37203606

ABSTRACT

FHIR is a widely accepted interoperability standard for exchanging medical data, but transforming data from primary health information systems into FHIR is usually challenging and requires advanced technical skills and infrastructure. There is a critical need for low-cost solutions, and using Mirth Connect as an open-source tool provides this opportunity. We developed a reference implementation to transform data from CSV (the most common data format) into FHIR resources using Mirth Connect, without any advanced technical resources or programming skills. This reference implementation was tested successfully for both quality and performance, and it enables healthcare providers to reproduce and improve the implemented approach for transforming raw data into FHIR resources. To ensure replicability, the channel, mapping, and templates used are publicly available on GitHub (https://github.com/alkarkoukly/CSV-FHIR-Transformer).
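The paper's implementation uses Mirth Connect channels rather than custom code, but the underlying mapping idea can be sketched language-neutrally. Below is an illustrative Python version of one CSV row becoming one FHIR Patient resource; the CSV column names are hypothetical.

import csv
import json

def row_to_patient(row: dict) -> dict:
    # Map hypothetical CSV columns onto the FHIR Patient resource structure.
    return {
        "resourceType": "Patient",
        "identifier": [{"value": row["patient_id"]}],
        "name": [{"family": row["last_name"], "given": [row["first_name"]]}],
        "gender": row["gender"].lower(),
        "birthDate": row["birth_date"],  # FHIR expects YYYY-MM-DD
    }

with open("patients.csv", newline="") as f:
    resources = [row_to_patient(row) for row in csv.DictReader(f)]
print(json.dumps(resources[0], indent=2))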


Subjects
Health Information Systems, Software, Electronic Health Records, Health Level Seven
9.
J Med Imaging (Bellingham) ; 10(6): 061103, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37125408

ABSTRACT

Purpose: Although there are several options for improving the generalizability of learned models, a data instance-based approach is desirable when stable data acquisition conditions cannot be guaranteed. Despite the wide use of data transformation methods to reduce discrepancies between data domains, detailed analyses explaining the performance of these methods are lacking. Approach: This study compares several data transformation methods on the tuberculosis detection task with multi-institutional chest x-ray (CXR) data. Five data transformations were implemented: normalization, standardization with and without lung masking, and multi-frequency-based (MFB) standardization with and without lung masking. A tuberculosis detection network was trained on a reference dataset, and data from six other sites were used for network performance comparison. To analyze data harmonization performance, we extracted radiomic features, calculated the Mahalanobis distance, and visualized the features with a dimensionality reduction technique. Through similar methods, deep features of the trained networks were also analyzed to examine the models' responses to data from the various sites. Results: Across the numerical assessments, MFB standardization with lung masking provided the highest network performance on the non-reference datasets. The radiomic and deep feature analyses showed that features of the multi-site CXRs after MFB with lung masking were well homogenized to the reference data, whereas the other transformations showed limited performance. Conclusions: Conventional normalization and standardization were suboptimal in minimizing feature differences among sites. Our study emphasizes the strengths of MFB standardization with lung masking in terms of network performance and feature homogenization.
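A sketch of the masked standardization compared above: statistics are computed only inside the lung mask so that background and collimation pixels do not skew the normalization (the MFB variant additionally applies this per frequency band).

import numpy as np

def standardize_with_mask(img: np.ndarray, mask: np.ndarray) -> np.ndarray:
    vals = img[mask > 0]                              # lung pixels only
    return (img - vals.mean()) / (vals.std() + 1e-8)  # z-score from masked stats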

10.
J Med Internet Res ; 25: e42822, 2023 03 08.
Article in English | MEDLINE | ID: mdl-36884270

ABSTRACT

BACKGROUND: Sharing health data is challenging because of several technical, ethical, and regulatory issues. The Findable, Accessible, Interoperable, and Reusable (FAIR) guiding principles have been conceptualized to enable data interoperability. Many studies provide implementation guidelines, assessment metrics, and software to achieve FAIR-compliant data, especially for health data sets. Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) is a health data content modeling and exchange standard. OBJECTIVE: Our goal was to devise a new methodology to extract, transform, and load existing health data sets into HL7 FHIR repositories in line with FAIR principles, develop a Data Curation Tool to implement the methodology, and evaluate it on health data sets from 2 different but complementary institutions. We aimed to increase the level of compliance with FAIR principles of existing health data sets through standardization and facilitate health data sharing by eliminating the associated technical barriers. METHODS: Our approach automatically processes the capabilities of a given FHIR end point and directs the user while configuring mappings according to the rules enforced by FHIR profile definitions. Code system mappings can be configured for terminology translations through automatic use of FHIR resources. The validity of the created FHIR resources can be automatically checked, and the software does not allow invalid resources to be persisted. At each stage of our data transformation methodology, we used particular FHIR-based techniques so that the resulting data set could be evaluated as FAIR. We performed a data-centric evaluation of our methodology on health data sets from 2 different institutions. RESULTS: Through an intuitive graphical user interface, users are prompted to configure the mappings into FHIR resource types with respect to the restrictions of selected profiles. Once the mappings are developed, our approach can syntactically and semantically transform existing health data sets into HL7 FHIR without loss of data utility according to our privacy-concerned criteria. In addition to the mapped resource types, behind the scenes, we create additional FHIR resources to satisfy several FAIR criteria. According to the data maturity indicators and evaluation methods of the FAIR Data Maturity Model, we achieved the maximum level (level 5) for being Findable, Accessible, and Interoperable and level 3 for being Reusable. CONCLUSIONS: We developed and extensively evaluated our data transformation approach to unlock the value of existing health data residing in disparate data silos to make them available for sharing according to the FAIR principles. We showed that our method can successfully transform existing health data sets into HL7 FHIR without loss of data utility, and the result is FAIR in terms of the FAIR Data Maturity Model. We support institutional migration to HL7 FHIR, which not only leads to FAIR data sharing but also eases the integration with different research networks.


Subjects
Electronic Health Records, Software, Humans, Software Design, Health Level Seven, Information Dissemination
11.
J Hazard Mater Adv ; 9: 100220, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36818682

ABSTRACT

Despite the requirement for data to be normally distributed, with variance independent of the mean, some studies of plastic litter, including COVID-19 face masks, have not tested these assumptions before embarking on analyses using parametric statistics. Investigation of new data and secondary analyses of published literature data indicate that face mask densities are not normally distributed and that variances are not independent of mean densities. Consequently, it is necessary either to use nonparametric analyses or to transform data prior to undertaking parametric approaches. For the new data set, spatial and temporal variance functions indicate that, according to Taylor's Power Law, the fourth-root transformation offers most promise for stabilizing variance about the mean.
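Taylor's Power Law states that variance scales with the mean as a power law, V = a * M^b, and the variance-stabilizing power transform is then x^(1 - b/2) (log for b near 2). Below is a sketch of estimating b from grouped counts; a slope near 1.5 would correspond to the fourth-root transform (exponent 0.25) recommended above.

import numpy as np

def taylor_exponent(groups) -> float:
    # groups: list of 1-D arrays of counts per spatial/temporal group.
    means = np.array([g.mean() for g in groups])
    variances = np.array([g.var(ddof=1) for g in groups])
    b, _ = np.polyfit(np.log(means), np.log(variances), 1)  # slope = b
    return float(b)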

12.
Methods Mol Biol ; 2638: 173-189, 2023.
Article in English | MEDLINE | ID: mdl-36781642

ABSTRACT

KASP is commonly used to genotype bi-allelic SNPs and In/Dels, and the standard protocol works well when both alleles are nearly equally prevalent in the DNA template. To detect rare alleles in bulked samples or to distinguish more than three genotypes, such as tri-allelic loci or mutations across orthologous genes in polyploids, adjustments to the protocol and/or data analysis are required. In this chapter, we present modified protocols for these non-traditional applications, including reaction conditions that enhance the fluorophore signal from rare alleles, resulting in increased KASP assay sensitivity. We also describe alternative KASP data analysis approaches that increase statistical certainty of genotyping calls. Furthermore, this increased assay sensitivity enables high-throughput genotyping using KASP, as samples can be pooled and tested in a single reaction. For example, rare alleles can be detected in mixed seed pools when present in ratios as low as 1 in 200. The assay modifications presented here expand the options available for complex genotyping, and retain KASP's advantages of being cheap, fast, and accurate.


Subjects
Nucleic Acid Amplification Techniques, Polyploidy, Humans, Genotype, Alleles, Polymerase Chain Reaction/methods, Polymorphism, Single Nucleotide
13.
Spectrochim Acta A Mol Biomol Spectrosc ; 290: 122311, 2023 Apr 05.
Article in English | MEDLINE | ID: mdl-36608516

ABSTRACT

In this study, reflectance spectroscopy was used to achieve rapid, non-destructive detection of amylase activity and moisture content in rice. Because rice husk interferes with spectral measurements, spectral data transformation was used to remove the husk interference. Reflectance spectra of rice were transformed by direct standardization, a convolutional autoencoder network, and kernel regression (KR). Random frog and elliptical envelope were then adopted to select effective wavelengths, and partial least squares regression (PLSR) and support vector regression were used to establish analysis models. KR provided the optimal transformation: PLSR on the effective wavelengths of the transformed spectra achieved test coefficients of determination of 0.6987 and 0.8317 and test root-mean-square errors of 0.3359 and 2.2239, respectively. These results were better than those from the raw rice spectra and close to those from the husked rice spectra. Integrating moisture content into the regression model for amylase activity further improved the results. Thus, the proposed method can accurately detect amylase activity and moisture content in rice.
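For readers unfamiliar with the modeling step, a minimal PLSR sketch using scikit-learn is given below; the shapes, random placeholder data, and component count are illustrative, not the paper's settings.

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)
X_train, y_train = rng.random((80, 200)), rng.random(80)  # spectra, reference values
X_test, y_test = rng.random((20, 200)), rng.random(20)

pls = PLSRegression(n_components=10).fit(X_train, y_train)
y_pred = pls.predict(X_test).ravel()
print("R2:", r2_score(y_test, y_pred))
print("RMSE:", mean_squared_error(y_test, y_pred) ** 0.5)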


Subjects
Oryza, Oryza/chemistry, Spectroscopy, Near-Infrared/methods, Least-Squares Analysis, Amylases
14.
Front Psychol ; 13: 976724, 2022.
Article in English | MEDLINE | ID: mdl-36483722

ABSTRACT

Integrated science, technology, engineering, and mathematics education embedding project-based learning (i-STEM PjBL) still faces challenges, and its educational values have not been fully revealed; exploring them was the aim of this study. Participants were 48 freshmen from a senior high school, including 27 male and 21 female students. An open-ended questionnaire and student interviews were administered after the i-STEM PjBL. The qualitative data were converted into quantitative data by counting the occurrence frequencies of the codes. The results, based on integrating and comparing the open-ended questionnaire and interview outcomes, showed that i-STEM PjBL provided students with positive educational values (including learning acquisition, performance, and perception), but there were also learning challenges in the process. Learning acquisition focused on the basic structure and components of a robot, principles of robot motion, hull structure, principles of sailboat navigation, and skills for designing and assembling sailboats. Learning performance meant that students were satisfied with their hands-on performance and confident in their ability to perform better in similar disciplines, but did not learn programming well. Learning perception indicated that students found the i-STEM PjBL materials interesting, felt they could acquire knowledge and skills from various fields, considered PjBL helpful for completing their work and the principles helpful in practice, but found the programming design learning materials insufficient. Learning challenges indicated that students were unfamiliar with the use of tools and hands-on operation, and also felt challenged by programming. Students' feedback can be taken as a reference to modify and improve i-STEM PjBL and its materials in the future.

15.
Article in English | MEDLINE | ID: mdl-36497545

ABSTRACT

Mapping the spatial distribution of soil contaminants at contaminated sites is the basis of risk assessment. Hotspots can cause strongly skewed distributions of raw contaminant concentrations in soil, which can consequently require suitable normalization prior to interpolation. In this study, three normalization methods (normal score, Johnson, and Box-Cox transformation) were applied to the concentrations of two low-molecular-weight (LMW) PAHs (acenaphthene (Ace) and naphthalene (Nap)) and two high-molecular-weight (HMW) PAHs (benzo(a)pyrene (BaP) and benzo(b)fluoranthene (BbF)) in soils of a typical coking plant in North China. The estimation accuracy of ordinary kriging with the different normalization methods for mapping soil LMW and HMW PAH distributions was compared. All transformed data passed the Kolmogorov-Smirnov test, indicating that all three transformation methods achieved normality of the raw data. Compared with Box-Cox-ordinary kriging, normal score- and Johnson-ordinary kriging had higher estimation accuracy for the four soil PAH distributions. In cross-validation, smaller root-mean-square error (RMSE) and mean error (ME) values were observed for normal score-ordinary kriging for both LMW and HMW PAHs compared with Johnson- and Box-Cox-ordinary kriging. Thus, the normal score transformation is suitable for alleviating the impact of hotspots on the estimation accuracy of the four selected soil PAH distributions at this coking plant. These findings can help reduce uncertainty in spatial interpolation at PAH-contaminated sites.
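The normal score transformation favored here is rank-based: values are mapped to standard normal quantiles, which forces a Gaussian marginal distribution no matter how strongly hotspots skew the raw concentrations. A minimal sketch:

import numpy as np
from scipy import stats

def normal_score(x: np.ndarray) -> np.ndarray:
    ranks = stats.rankdata(x)          # 1..n, ties averaged
    p = (ranks - 0.5) / len(x)         # empirical plotting positions in (0, 1)
    return stats.norm.ppf(p)           # standard normal quantiles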


Subjects
Coke, Polycyclic Aromatic Hydrocarbons, Soil Pollutants, Soil, Soil Pollutants/analysis, Molecular Weight, Environmental Monitoring/methods, Polycyclic Aromatic Hydrocarbons/analysis, Plants, China
16.
Comput Struct Biotechnol J ; 20: 6149-6162, 2022.
Article in English | MEDLINE | ID: mdl-36420153

ABSTRACT

The etiology of neuropsychiatric disorders involves complex biological processes at different omics layers, such as genomics, transcriptomics, epigenetics, proteomics, and metabolomics. The advent of high-throughput technology, as well as the availability of large open-source datasets, has ushered in a new era in systems biology, necessitating the integration of various types of omics data. The complexity of biological mechanisms, the limitations of integrative strategies, and the heterogeneity of multi-omics data have all presented significant challenges to computational scientists. In comparison to early and late integration, intermediate integration may transform each data type into an appropriate intermediate representation using various data transformation techniques, allowing it to capture more of the complementary information contained in each omics layer and to highlight new interactions across layers. Here, we review multi-modal intermediate integrative techniques based on component analysis, matrix factorization, similarity networks, multiple kernel learning, Bayesian networks, artificial neural networks, and graph transformation, as well as their applications in neuropsychiatric domains. We describe advancements in these approaches and compare the strengths and weaknesses of each method examined. We believe that our findings will aid researchers in their understanding of the transformation and integration of multi-omics data in neuropsychiatric disorders.
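A toy sketch of the intermediate-integration idea, assuming matched samples across omics blocks: each block is first reduced to its own latent representation (here via PCA, standing in for the factorization methods reviewed), and the representations, rather than raw features, are concatenated for downstream models.

import numpy as np
from sklearn.decomposition import PCA

def intermediate_integrate(omics_blocks, n_components=10):
    # omics_blocks: list of (samples x features) arrays with identical sample order.
    reps = [PCA(n_components=n_components).fit_transform(X) for X in omics_blocks]
    return np.hstack(reps)  # joint intermediate representation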

17.
Int J Mol Sci ; 23(18)2022 Sep 08.
Article in English | MEDLINE | ID: mdl-36142316

ABSTRACT

The number of patients diagnosed with cancer continues to rise and has nearly doubled in 20 years. Predicting cancer occurrence therefore has a significant impact on reducing medical costs, and detecting cancer early can increase survival rates. In the data preprocessing step, individual genome data are used as input and embedded in character units so that they can be consumed by deep learning. In the deep learning network schema, a character-based network learns the correlations among individual feature data and predicts cancer occurrence. To evaluate the objective reliability of the proposed method, various networks published in other studies were compared using the TCGA dataset; the proposed method obtained excellent results in terms of accuracy, sensitivity, and specificity, demonstrating the effectiveness of deep learning networks in predicting cancer occurrence from individual whole-genome data. The confusion matrix results further support the validity of the proposed model for predicting cancer from an individual's whole-genome data. In addition, the AUC (area under the ROC curve), used as a performance evaluation index of diagnostic efficiency, was 90% or more, indicating good classification results. The objectives of this study were to analyze whole-genome patterns using individual genome data for 12 cancers as input, without separately using reference genome sequence data from normal individuals. Several mutation types, including SNV, DEL, and INS, were applied.


Subjects
Deep Learning, Neoplasms, Humans, Neoplasms/genetics, ROC Curve, Reproducibility of Results
18.
J Biomed Semantics ; 13(1): 9, 2022 03 15.
Article in English | MEDLINE | ID: mdl-35292119

ABSTRACT

BACKGROUND: The European Platform on Rare Disease Registration (EU RD Platform) aims to address the fragmentation of European rare disease (RD) patient data, scattered among hundreds of independent and non-coordinating registries, by establishing standards for integration and interoperability. The first practical output of this effort was a set of 16 Common Data Elements (CDEs) that should be implemented by all RD registries. Interoperability, however, requires decisions beyond data elements, including data models, formats, and semantics. Within the European Joint Programme on Rare Diseases (EJP RD), we aim to further the goals of the EU RD Platform by generating reusable RD semantic model templates that follow the FAIR Data Principles. RESULTS: Through a team-based iterative approach, we created semantically grounded models to represent each of the CDEs, using the SemanticScience Integrated Ontology as the core framework for representing the entities and their relationships. Within that framework, we mapped the concepts represented in the CDEs, and their possible values, into domain ontologies such as the Orphanet Rare Disease Ontology, Human Phenotype Ontology, and National Cancer Institute Thesaurus. Finally, we created an exemplar, reusable ETL pipeline that we will deploy over these non-coordinating data repositories to assist them in creating model-compliant FAIR data without requiring site-specific coding or expertise in Linked Data or FAIR. CONCLUSIONS: Within the EJP RD project, we determined that creating reusable, expert-designed templates reduced or eliminated the requirement for our participating biomedical domain experts and rare disease data hosts to understand OWL semantics. This enabled them to publish highly expressive FAIR data using tools and approaches that were already familiar to them.


Subjects
Common Data Elements, Rare Diseases, Humans, Registries, Semantics, Workflow
19.
JMIR Med Inform ; 10(2): e34932, 2022 Feb 10.
Article in English | MEDLINE | ID: mdl-35142637

ABSTRACT

BACKGROUND: Health care data are fragmenting as patients seek care from diverse sources, and patient care is negatively impacted by these disparate health records. Machine learning (ML) offers a disruptive force in its ability to inform and improve patient care and outcomes. However, the differences among individuals' health records, combined with the lack of health data standards and with systemic issues that render the data unreliable and fail to create a single view of each patient, create challenges for ML. Although these problems exist throughout health care, they are especially prevalent within maternal health and exacerbate the maternal morbidity and mortality crisis in the United States. OBJECTIVE: This study aims to demonstrate that patient records extracted from the electronic health records (EHRs) of a large tertiary health care system can be made actionable for the goal of effectively using ML to identify maternal cardiovascular risk before evidence of diagnosis or intervention within the patient's record. Maternal patient records were extracted from the EHRs of a large tertiary health care system and made into patient-specific, complete data sets through a systematic method. METHODS: We outline the effort required to define the specifications of the computational systems, the data set, and access to relevant systems, while ensuring that data security, privacy laws, and policies were met. Data acquisition included the concatenation, anonymization, and normalization of health data across multiple EHRs in preparation for their use by a proprietary risk stratification algorithm designed to establish patient-specific baselines and identify cardiovascular risk based on deviations from those baselines to inform early interventions. RESULTS: Patient records can be made actionable for the goal of effectively using ML, specifically to identify cardiovascular risk in pregnant patients. CONCLUSIONS: Upon acquiring data, including their concatenation, anonymization, and normalization across multiple EHRs, the use of an ML-based tool can provide early identification of cardiovascular risk in pregnant patients.
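One anonymization step mentioned above, replacing direct identifiers with salted one-way hashes so records from multiple EHRs can still be linked per patient, can be sketched as follows; the salt handling is illustrative only, not the authors' protocol.

import hashlib

def pseudonymize(patient_id: str, salt: str) -> str:
    # The same (salt, id) pair always yields the same pseudonym, enabling linkage.
    return hashlib.sha256((salt + patient_id).encode()).hexdigest()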

20.
J Gen Intern Med ; 37(2): 308-317, 2022 02.
Article in English | MEDLINE | ID: mdl-34505983

ABSTRACT

BACKGROUND: Meta-analysis is increasingly used to synthesize proportions (e.g., disease prevalence). It can be implemented with widely used two-step methods or one-step methods, such as generalized linear mixed models (GLMMs). Existing simulation studies have shown that GLMMs outperform the two-step methods in some settings. It is, however, unclear whether these simulation settings are common in the real world. We aim to compare the real-world performance of various meta-analysis methods for synthesizing proportions. METHODS: We extracted datasets of proportions from the Cochrane Library and applied 12 two-step and one-step methods to each dataset. We used Spearman's ρ and the Bland-Altman plot to assess their results' correlation and agreement. The GLMM with the logit link was chosen as the reference method. We calculated the absolute difference and fold change (ratio of estimates) of the overall proportion estimates produced by each method vs. the reference method. RESULTS: We obtained a total of 43,644 datasets. The various methods generally had high correlations (ρ > 0.9) and agreements. GLMMs had computational issues more frequently than the two-step methods. However, the two-step methods generally produced large absolute differences from the GLMM with the logit link for small total sample sizes (< 50) and crude event rates within 10-20% and 90-95%, and large fold changes for small total event counts (< 10) and low crude event rates (< 20%). CONCLUSIONS: Although different methods produced similar overall proportion estimates in most datasets, one-step methods should be considered in the presence of small total event counts or sample sizes and very low or high event rates.
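For orientation, the reference method, a GLMM with the logit link, is conventionally written as the following random-intercept logistic model (a standard formulation, not quoted from the paper):

\[
  y_i \sim \mathrm{Binomial}(n_i, p_i), \qquad
  \operatorname{logit}(p_i) = \mu + u_i, \qquad
  u_i \sim \mathcal{N}(0, \tau^2),
\]
\[
  \hat{p} = \operatorname{logit}^{-1}(\hat{\mu}) = \frac{e^{\hat{\mu}}}{1 + e^{\hat{\mu}}},
\]

where y_i is the event count out of n_i in study i and \tau^2 captures between-study heterogeneity.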


Subjects
Research Design, Computer Simulation, Humans, Linear Models, Sample Size