Results 1 - 20 of 32
1.
Zhongguo Yi Liao Qi Xie Za Zhi ; 48(4): 373-379, 2024 Jul 30.
Article in Chinese | MEDLINE | ID: mdl-39155248

ABSTRACT

Sleep disordered breathing (SDB) is a common sleep disorder with an increasing prevalence. The current gold standard for diagnosing SDB is polysomnography (PSG), but existing PSG techniques have limitations, such as long manual interpretation times, a lack of data quality control, and insufficient monitoring of gas metabolism and hemodynamics. There is therefore an urgent need in China's sleep clinics for a new intelligent PSG system with data quality control, gas metabolism assessment, and hemodynamic monitoring capabilities. On the hardware side, the new system records traditional parameters such as nasal airflow, blood oxygen saturation, electrocardiography (ECG), electroencephalography (EEG), electromyography (EMG), and electrooculography (EOG), and adds modules for gas metabolism assessment via end-tidal CO2 and O2 concentrations and for hemodynamic assessment via impedance cardiography. On the software side, deep learning methods are employed to develop intelligent data quality control and diagnostic techniques. The goal is to provide detailed sleep quality assessments that effectively assist doctors in evaluating the sleep quality of SDB patients.


Subjects
Electrocardiography, Electroencephalography, Polysomnography, Humans, Sleep Apnea Syndromes/diagnosis, Electromyography, Electrooculography, Sleep, Software, Hemodynamics
2.
Stud Health Technol Inform ; 316: 1328-1332, 2024 Aug 22.
Article in English | MEDLINE | ID: mdl-39176627

ABSTRACT

This paper explores the challenges and lessons learned in mapping HL7 v2 messages, structured using a custom schema, to openEHR for the Medical Data Integration Center (MeDIC) of the University Hospital Schleswig-Holstein (UKSH). Missing timestamps in observations, missing units of measurement, inconsistent decimal separators, and unexpected datatypes were identified as critical issues in this process. These anomalies highlight the difficulty of automating the transformation of HL7 v2 data to any standard, particularly openEHR, using off-the-shelf tools. Addressing them is crucial for enhancing data interoperability, supporting evidence-based research, and optimizing clinical decision-making. Implementing proper data quality measures and governance will unlock the potential of integrated clinical data, empowering clinicians and researchers and fostering a robust healthcare ecosystem.
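
As an illustrative sketch only (not MeDIC's actual pipeline), the anomaly types listed above can be expressed as a small record validator; the field names and the comma-as-decimal-separator convention are assumptions for the example:

```python
import re

def normalize_numeric(value: str) -> float | None:
    """Normalize a numeric string that may use ',' or '.' as decimal separator.
    Returns None when the value cannot be interpreted as a number."""
    v = value.strip().replace(" ", "")
    # Treat a single comma as a decimal separator (e.g. German-style "3,7")
    if re.fullmatch(r"-?\d+,\d+", v):
        v = v.replace(",", ".")
    try:
        return float(v)
    except ValueError:
        return None

def check_observation(obs: dict) -> list[str]:
    """Flag the anomaly types described above for one OBX-like record."""
    issues = []
    if not obs.get("timestamp"):
        issues.append("missing timestamp")
    if obs.get("unit") in (None, ""):
        issues.append("missing unit of measurement")
    if normalize_numeric(str(obs.get("value", ""))) is None:
        issues.append("unexpected datatype / unparseable value")
    return issues

print(check_observation({"value": "3,7", "unit": "mmol/l", "timestamp": "20240101T1200"}))  # []
print(check_observation({"value": "n/a", "unit": ""}))  # all three issue types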


Subjects
Health Level Seven, Electronic Health Records, Health Information Interoperability, Germany, Systems Integration, Humans, Medical Record Linkage/methods
3.
Article in English | MEDLINE | ID: mdl-39013167

ABSTRACT

Mass spectrometry is broadly employed to study complex molecular mechanisms in various biological and environmental fields, enabling 'omics' research such as proteomics, metabolomics, and lipidomics. As study cohorts grow larger and more complex, with dozens to hundreds of samples, robust quality control (QC) through automated software tools becomes paramount to ensure the integrity, quality, and validity of scientific conclusions from downstream analyses and to minimize the waste of resources. Since existing QC tools are mostly dedicated to proteomics, automated solutions supporting metabolomics are needed. To address this need, we developed PeakQC, a software tool for automated QC of MS data that is independent of omics molecular type (i.e., omics-agnostic). It allows automated extraction and inspection of peak metrics of precursor ions (e.g., errors in mass, retention time, and arrival time) and supports various instrumentations and acquisition types, from direct-infusion experiments to workflows with liquid chromatography and/or ion mobility spectrometry front-end separations, with or without fragmentation spectra from data-dependent or data-independent acquisition. Diagnostic plots for fragmentation spectra are also generated. Here, we describe and illustrate PeakQC's functionalities using representative data sets, demonstrating its utility as a valuable tool for enhancing the quality and reliability of omics mass spectrometry analyses.
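
One of the peak metrics mentioned, precursor mass error, is straightforward to compute; a minimal sketch with illustrative values (not PeakQC's implementation or API):

```python
import numpy as np

def mass_error_ppm(observed_mz, theoretical_mz):
    """Precursor mass error in parts per million."""
    observed_mz = np.asarray(observed_mz, dtype=float)
    theoretical_mz = np.asarray(theoretical_mz, dtype=float)
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

# Example: flag precursors whose mass error exceeds an assumed 5 ppm tolerance
obs = [500.2513, 750.3801, 1000.5120]
theo = [500.2501, 750.3795, 1000.4998]
err = mass_error_ppm(obs, theo)
print(err)                 # ppm error per precursor
print(np.abs(err) > 5.0)   # QC flags: [False, False, True]
```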

4.
Sensors (Basel) ; 24(4)2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38400233

ABSTRACT

The unconsolidated near surface and large daily temperature variations in the desert environment degrade vertical seismic profiling (VSP) data, making rigorous quality control necessary. Distributed acoustic sensing (DAS) VSP data are often benchmarked against geophone surveys as a gold standard. This study showcases a new simulation-based way to assess the quality of DAS VSP acquired in the desert without geophone data. The depth uncertainty of the DAS channels in the wellbore is assessed by calibrating against formation depth based on the concept of conservation of energy flux. Using a 1D velocity model derived from checkshot data, we simulate both DAS and geophone VSP data via an elastic pseudo-spectral finite difference method and estimate the source and receiver signatures using matching filters. The field geophone data show high amplitude variations between channels that cannot be replicated in the simulation. In contrast, the DAS simulation shows high visual similarity with the field DAS first-arrival waveforms, and the simulated source and receiver signatures are visually indistinguishable from those of the field DAS data. Since, under perfect conditions, the receiver signatures should be invariant with depth, we propose a new DAS data quality control metric based on local variations of the receiver signatures that does not require geophone measurements.
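
A hedged sketch of what such a metric might look like, under the stated assumption that receiver signatures are depth-invariant; the windowing and normalization choices here are ours, not the authors':

```python
import numpy as np

def signature_variation(signatures: np.ndarray, half_window: int = 5) -> np.ndarray:
    """Local variation of estimated receiver signatures along the fiber.

    signatures: (n_channels, n_samples) array. Returns one score per channel:
    the normalized RMS misfit between a channel's signature and the median
    signature of its depth neighbors. Under ideal coupling the signatures are
    depth-invariant, so high scores flag suspect channels."""
    n = signatures.shape[0]
    scores = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        ref = np.median(signatures[lo:hi], axis=0)
        scores[i] = np.linalg.norm(signatures[i] - ref) / (np.linalg.norm(ref) + 1e-12)
    return scores
```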

5.
Environ Sci Technol ; 57(46): 18058-18066, 2023 Nov 21.
Article in English | MEDLINE | ID: mdl-37582237

ABSTRACT

Machine learning (ML) techniques promise to revolutionize environmental research and management, but collecting the necessary volumes of high-quality data remains challenging. Environmental sensors are often deployed under harsh conditions, requiring labor-intensive quality assurance and control (QAQC) processes. The need for manual QAQC is a major impediment to the scalability of these sensor networks. However, existing techniques for automated QAQC make strong assumptions about noise profiles in the data they filter that do not necessarily hold for broadly deployed environmental sensors. Toward the goal of increasing the volume of high-quality environmental data, we introduce an ML-assisted QAQC methodology that is robust to data with low signal-to-noise ratios. Our approach embeds sensor measurements into a dynamical feature space and trains a binary classification algorithm (a support vector machine) to detect deviation from expected process dynamics, indicating that a sensor has become compromised and requires maintenance. This strategy enables automated detection of a wide variety of nonphysical signals. We apply the methodology to three novel data sets produced by 136 low-cost environmental sensors (stream level, drinking water pH, and drinking water electroconductivity) deployed by our group across 250,000 km² in Michigan, USA. The proposed methodology achieved accuracy scores of up to 0.97 and consistently outperformed state-of-the-art anomaly detection techniques.
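
The core idea, embedding measurements into a dynamical feature space and training a binary SVM to detect departures from expected dynamics, can be sketched as follows on synthetic data; the embedding dimension, lag, and kernel are assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def delay_embed(x: np.ndarray, dim: int = 8, lag: int = 1) -> np.ndarray:
    """Map a univariate sensor series into delay vectors of `dim`
    lagged samples (a simple dynamical feature space)."""
    n = len(x) - (dim - 1) * lag
    return np.column_stack([x[i * lag : i * lag + n] for i in range(dim)])

rng = np.random.default_rng(0)
t = np.linspace(0, 40 * np.pi, 4000)
clean = np.sin(t) + 0.05 * rng.normal(size=t.size)   # expected process dynamics
faulty = rng.normal(size=t.size)                     # nonphysical noise signal

X = np.vstack([delay_embed(clean), delay_embed(faulty)])
y = np.concatenate([np.zeros(len(X) // 2), np.ones(len(X) // 2)])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf")).fit(X, y)

test = rng.normal(size=2000)                         # fresh compromised-sensor trace
print("fraction flagged:", clf.predict(delay_embed(test)).mean())
```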


Subjects
Drinking Water, Machine Learning, Algorithms, Michigan
6.
Ther Innov Regul Sci ; 57(6): 1217-1228, 2023 11.
Article in English | MEDLINE | ID: mdl-37450198

ABSTRACT

Monitoring of clinical trials is a fundamental process required by regulatory agencies. It assures a center's compliance with the required regulations and the trial protocol. Traditionally, monitoring teams relied on extensive on-site visits and source data verification; however, this is costly and the outcome is limited. Central statistical monitoring (CSM) is therefore an additional approach, recently embraced by the International Council for Harmonisation (ICH), that detects problematic or erroneous data using visualizations and statistical control measures. Existing implementations have primarily focused on detecting inlier and outlier data; other approaches include principal component analysis and examination of data distributions. Here we focus on comparisons of individual centers to the grand mean for different model types and assumptions for common data types, such as binomial, ordinal, and continuous response variables. We implement multiple comparisons of single centers to the grand mean of all centers; this approach is also available for the various non-normal data types that are abundant in clinical trials. Furthermore, using confidence intervals, equivalence to the grand mean can be assessed. In a Monte Carlo simulation study, the applied statistical approaches were investigated for their ability to control the type I error rate and for their power under balanced and unbalanced designs, which are common in registry data and clinical trials. Data from the German Multiple Sclerosis Registry (GMSR), including proportions of missing data, adverse events, and disease severity scores, were used to verify the results on real-world data (RWD).
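
For binomial endpoints, a much-simplified sketch of a center-versus-grand-mean comparison with confidence intervals (unlike the paper's approach, no multiplicity adjustment or model-based variance; the margin is an assumed equivalence bound):

```python
import numpy as np
from scipy import stats

def center_vs_grand_mean(events, totals, alpha=0.05, margin=0.10):
    """Compare each center's event proportion to the grand mean of all
    centers via Wald confidence intervals on the difference.

    Returns, per center, the CI for (p_center - grand_mean) and two flags:
    'outlying' (CI excludes 0) and 'equivalent' (CI within +/- margin)."""
    events, totals = np.asarray(events, float), np.asarray(totals, float)
    p = events / totals
    grand = events.sum() / totals.sum()
    se = np.sqrt(p * (1 - p) / totals + grand * (1 - grand) / totals.sum())
    z = stats.norm.ppf(1 - alpha / 2)
    lo, hi = (p - grand) - z * se, (p - grand) + z * se
    return [{"diff_ci": (l, h), "outlying": bool(l > 0 or h < 0),
             "equivalent": bool(-margin < l and h < margin)}
            for l, h in zip(lo, hi)]

# Missing-data proportions at five hypothetical centers
print(center_vs_grand_mean(events=[5, 40, 12, 9, 7], totals=[100, 120, 110, 95, 100]))
```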


Subjects
Multiple Sclerosis, Humans, Multiple Sclerosis/drug therapy, Computer Simulation
7.
BMC Bioinformatics ; 24(1): 77, 2023 Mar 03.
Article in English | MEDLINE | ID: mdl-36869285

ABSTRACT

BACKGROUND: Data archiving and distribution are essential to scientific rigor and reproducibility of research. The National Center for Biotechnology Information's Database of Genotypes and Phenotypes (dbGaP) is a public repository for scientific data sharing. To support curation of thousands of complex data sets, dbGaP has detailed submission instructions that investigators must follow when archiving their data. RESULTS: We developed dbGaPCheckup, an R package that implements a series of check, awareness, reporting, and utility functions to support data integrity and proper formatting of the subject phenotype data set and data dictionary prior to dbGaP submission. For example, dbGaPCheckup ensures that the data dictionary contains all fields required by dbGaP, plus additional fields required by dbGaPCheckup; that the number and names of variables match between the data set and the data dictionary; that there are no duplicated variable names or descriptions; that observed data values are not more extreme than the logical minimum and maximum values stated in the data dictionary; and more. The package also includes functions that implement a series of minor, scalable fixes when errors are detected (e.g., a function to reorder the variables in the data dictionary to match the order listed in the data set). Finally, we include reporting functions that produce graphical and textual descriptives of the data to further reduce the likelihood of data integrity issues. The dbGaPCheckup R package is available on CRAN ( https://CRAN.R-project.org/package=dbGaPCheckup ) and developed on GitHub ( https://github.com/lwheinsberg/dbGaPCheckup ). CONCLUSION: dbGaPCheckup is an innovative, assistive, and time-saving tool that fills an important gap for researchers by making dbGaP submission of large and complex data sets less error-prone.
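
dbGaPCheckup itself is an R package; purely as a language-neutral illustration of the kinds of checks it describes (not its API), here is a Python sketch with assumed dictionary columns VARNAME, MIN, and MAX:

```python
import pandas as pd

def check_submission(data: pd.DataFrame, dictionary: pd.DataFrame) -> list[str]:
    """Minimal versions of three checks of the kind described above: name/order
    agreement, duplicated variables, and logical min/max bounds."""
    problems = []
    if list(data.columns) != list(dictionary["VARNAME"]):
        problems.append("variable names/order differ between data set and dictionary")
    dup = dictionary["VARNAME"][dictionary["VARNAME"].duplicated()]
    if not dup.empty:
        problems.append(f"duplicated variable names: {sorted(set(dup))}")
    for _, row in dictionary.dropna(subset=["MIN", "MAX"]).iterrows():
        col = row["VARNAME"]
        if col in data and pd.api.types.is_numeric_dtype(data[col]):
            bad = data[col].dropna()
            bad = bad[(bad < row["MIN"]) | (bad > row["MAX"])]
            if len(bad):
                problems.append(f"{col}: {len(bad)} values outside [{row['MIN']}, {row['MAX']}]")
    return problems
```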


Subjects
Biotechnology, Information Dissemination, Reproducibility of Results, Databases, Factual, Phenotype
8.
Front Plant Sci ; 14: 1077196, 2023.
Article in English | MEDLINE | ID: mdl-36760650

ABSTRACT

Variety testing is an indispensable step in the process of creating improved new varieties, from breeding to adoption. The performance of varieties can be compared and evaluated based on multi-trait data from multi-location variety tests over multiple years. Although high-throughput phenotyping platforms have been used for observing some specific traits, manual phenotyping is still widely used, and the efficient management of large amounts of data remains a significant problem for crop variety testing. This study reports a variety test platform (VTP) created to manage the whole workflow for the standardization and data quality improvement of crop variety testing. Through the VTP, the phenotype data of varieties can be integrated and reused based on standardized data elements and datasets. Moreover, information support and automated functions for the whole testing workflow help users conduct tests efficiently through functions such as test design, data acquisition and processing, and statistical analyses. The VTP has been applied to regional variety tests covering more than seven thousand locations across the whole country, generating a standardized and authoritative phenotypic database covering five crops. In addition, the VTP can be deployed on private or publicly available high-performance computing nodes, so that test management and data analysis can be done conveniently through a web-based interface or a mobile application. In this way, the system can provide variety test management services to more small and medium-sized breeding organizations while ensuring the mutual independence and security of test data. The application of the VTP shows that the platform makes variety testing more efficient and can generate a reliable database suitable for meta-analysis in multi-omics breeding and variety development projects.

9.
Methods Mol Biol ; 2426: 267-302, 2023.
Article in English | MEDLINE | ID: mdl-36308693

ABSTRACT

Protein post-translational modifications (PTMs) are essential elements of cellular communication. Variations in their abundance can affect cellular pathways, leading to cellular disorders and diseases. A widely used method for revealing PTM-mediated regulatory networks is label-free quantitation (LFQ) by high-resolution mass spectrometry. The raw data from such experiments are generally interpreted using dedicated software, such as MaxQuant, MassChroQ, or Proline, which provide data matrices of quantified intensities for each identified modified peptide. Statistical analyses are then necessary (1) to ensure that the quantified data are of sufficient quality and reproducibility, and (2) to highlight the modified peptides that are differentially abundant between the biological conditions under study. The objective of this chapter is therefore to provide a complete data analysis pipeline for analyzing the quantified values of modified peptides in the presence of two or more biological conditions using the R software. We illustrate our pipeline starting from MaxQuant outputs for the analysis of A549-ACE2 cells infected by SARS-CoV-2 at different time points, freely available on PRIDE (PXD020019).
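
The chapter's pipeline is in R; as a compact sketch of step (2) alone (Welch t-tests with Benjamini-Hochberg correction on synthetic intensities; normalization and missing-value handling deliberately omitted), here in Python for consistency with the other examples:

```python
import numpy as np
from scipy import stats

def differential_abundance(intensities_a, intensities_b, alpha=0.05):
    """Two-condition analysis of modified-peptide LFQ intensities:
    log2 transform, Welch t-test per peptide (rows are peptides,
    columns are replicates), then Benjamini-Hochberg adjustment."""
    a, b = np.log2(intensities_a), np.log2(intensities_b)
    _, p = stats.ttest_ind(a, b, axis=1, equal_var=False)
    order = np.argsort(p)
    ranked = p[order] * len(p) / (np.arange(len(p)) + 1)     # BH step-up
    adj = np.minimum.accumulate(ranked[::-1])[::-1]          # enforce monotonicity
    p_adj = np.empty_like(p)
    p_adj[order] = np.minimum(adj, 1.0)
    log2fc = a.mean(axis=1) - b.mean(axis=1)
    return log2fc, p_adj, p_adj < alpha

rng = np.random.default_rng(0)
a = rng.lognormal(20, 1, size=(100, 6))                      # 100 peptides x 6 replicates
scale = np.where(rng.random(100) < 0.1, 16.0, 1.0)[:, None]  # ~10% truly more abundant in B
b = rng.lognormal(20, 1, size=(100, 6)) * scale
fc, padj, sig = differential_abundance(a, b)
print(int(sig.sum()), "peptides flagged")
```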


Subjects
COVID-19, Proteomics, Humans, Proteomics/methods, SARS-CoV-2, Protein Processing, Post-Translational, Software, Peptides/metabolism
10.
Mar Pollut Bull ; 185(Pt A): 114181, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36308819

ABSTRACT

Assessing the status of marine pollution at regional and sub-regional scales requires comparable and harmonized data provided by multiple institutions located in several countries. Standardized data management and quality control are crucial for supporting a coherent evaluation of marine pollution. Taking the Eastern Mediterranean Sea as a case study, we propose an approach to improve the quality control procedures used for sediment pollution data, thus supporting a harmonized environmental assessment. The regional ranges of contaminant concentrations in sediments were identified through an in-depth literature review, and the lowest measured concentrations were evaluated to determine the "background concentrations" of chemical substances not yet targeted in the Mediterranean Sea. In addition, to verify the suitability of the approach for validating large data collections provided by multiple sources, the determined ranges were used to validate a regional dataset available through the EMODnet data infrastructure.
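
A minimal sketch of the resulting validation step; the substances and range values below are hypothetical placeholders for the literature-derived regional ranges:

```python
# Hypothetical regional ranges (mg/kg dry weight), standing in for the
# literature-derived values used in the study
REGIONAL_RANGES = {"Hg": (0.01, 1.5), "Pb": (5.0, 250.0), "Cd": (0.05, 2.0)}

def validate_record(substance: str, value: float) -> str:
    """Assign a simple QC label by comparing a measured sediment
    concentration to the regional range for that substance."""
    if substance not in REGIONAL_RANGES:
        return "no regional range available"
    lo, hi = REGIONAL_RANGES[substance]
    if value < lo:
        return "below regional background - check detection limit/units"
    if value > hi:
        return "above regional maximum - flag for expert review"
    return "within regional range"

print(validate_record("Pb", 310.0))  # above regional maximum - flag for expert review
```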


Subjects
Polycyclic Aromatic Hydrocarbons, Trace Elements, Water Pollutants, Chemical, Polycyclic Aromatic Hydrocarbons/analysis, Geologic Sediments/chemistry, Environmental Monitoring/methods, Water Pollutants, Chemical/analysis, Data Collection, Quality Control
11.
JMIR Med Inform ; 10(4): e36481, 2022 Apr 13.
Article in English | MEDLINE | ID: mdl-35416792

ABSTRACT

BACKGROUND: With the advent of data-intensive science, a full integration of big data science and health care will bring a cross-field revolution to the medical community in China. Big data represents not only a technology but also a resource and a method. It is regarded as an important strategic resource at both the national and the institutional level, and great importance has therefore been attached to building big data platforms for health care. OBJECTIVE: We aimed to develop and implement a big data platform for a large hospital that overcomes the difficulties of integrating, calculating, storing, and governing multisource heterogeneous data in a standardized way, while ensuring health care data security. METHODS: The project to build a big data platform at West China Hospital of Sichuan University was launched in 2017. The platform has extracted, integrated, and governed data from different departments and sections of the hospital dating back to January 2008. A master-slave mode was implemented to realize real-time integration of massive multisource heterogeneous data, and an environment was built that separates heterogeneous data storage from calculation processes. A business-based metadata model was refined for data quality control, and a standardized health care data governance system and a closed-loop data security ecology were established. RESULTS: After 3 years of design, development, and testing, the platform was formally brought online in November 2020. It has formed a massive multidimensional data resource database, with more than 12.49 million patients, 75.67 million visits, and 8475 data variables. Newly generated data, along with hospital operations data, are entered into the platform in real time. Since its launch, the platform has supported more than 20 major projects and provided data services, storage, and computing power to many scientific teams, facilitating a shift in the data support model, from conventional manual extraction to self-service retrieval (which has reached 8561 retrievals per month). CONCLUSIONS: The platform combines operational data from all departments and sections of the hospital into a massive, high-dimensional, high-quality health care database that allows electronic medical records to be used effectively and taps the value of data to support clinical services, scientific research, and operations management. By effectively governing massive multidimensional data gathered from multiple sources, the West China Hospital of Sichuan University big data platform provides highly available data assets and thus has high application value in the health care field, enabling simpler and more efficient use of electronic medical record data for real-world research.

12.
Sensors (Basel) ; 22(4)2022 Feb 18.
Article in English | MEDLINE | ID: mdl-35214486

ABSTRACT

The rapid evolution of sensors and communication technologies has led to the production and transfer of massive data streams from vehicles, either within their electronic units or to the outside world over the internet. The "outside world", in most cases, consists of third-party applications, such as fleet or traffic management control centers, which utilize vehicular data for reporting and monitoring. To meet their needs, such applications typically require the exchange and processing of vast amounts of data, which can be handled by so-called big data technologies. The purpose of this study is to present a hybrid platform for data collection, storage, and analysis, enhanced with quality control actions. The collected data arrive in various formats from different vehicle sensors and are stored in the platform continuously. The stored data must then be checked to validate their quality; to do so, actions such as missing-value checks, format checks, and range checks must be carried out. The results of the quality control functions are presented herein, and useful conclusions are drawn on how to avoid data quality problems that could affect further analysis and use of the data, e.g., for training artificial intelligence models.
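
A sketch of how such per-record checks might be combined; the signal names, expected types, and plausibility bounds are hypothetical, not the platform's actual configuration:

```python
from dataclasses import dataclass

@dataclass
class SignalSpec:
    fmt: type      # expected Python type after parsing
    lo: float      # plausible minimum
    hi: float      # plausible maximum

# Hypothetical specs for a few vehicle telemetry signals
SPECS = {"speed_kmh": SignalSpec(float, 0, 250),
         "engine_rpm": SignalSpec(float, 0, 8000),
         "fuel_pct": SignalSpec(float, 0, 100)}

def qc_record(record: dict) -> dict:
    """Run missing-value, format, and range checks on one telemetry record."""
    report = {}
    for name, spec in SPECS.items():
        if name not in record or record[name] is None:
            report[name] = "missing"
        elif not isinstance(record[name], spec.fmt):
            report[name] = f"format: expected {spec.fmt.__name__}"
        elif not (spec.lo <= record[name] <= spec.hi):
            report[name] = f"range: {record[name]} outside [{spec.lo}, {spec.hi}]"
        else:
            report[name] = "ok"
    return report

print(qc_record({"speed_kmh": 312.0, "engine_rpm": "n/a"}))
# -> range flag, format flag, and a missing fuel_pct
```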

13.
Article in Chinese | WPRIM (Western Pacific) | ID: wpr-996019

ABSTRACT

Quality management and control of single diseases is a means of continuously improving medical quality and safety by building a set of quality control indicators and evaluation systems around the whole process of disease diagnosis and treatment. In actual single disease management, reporting for each disease involves data from various systems, such as electronic medical records, making data integration difficult; traditional manual reporting is also time-consuming, and data accuracy cannot be guaranteed. In the course of hospital informatization, one hospital designed an intelligent, fully closed-loop single disease management platform based on its hospital information system, integrating the hospital's existing human and information data resources. The platform combines functions for intranet reporting of single diseases, in-depth capture of reporting elements, management of single disease quality indicators, and real-time intelligent control, in order to make disease management more refined and intelligent and thus steadily improve medical quality and safety.

14.
Comput Methods Programs Biomed ; 211: 106394, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34560604

ABSTRACT

BACKGROUND AND OBJECTIVE: As a response to the ongoing COVID-19 pandemic, several prediction models in the existing literature were rapidly developed with the aim of providing evidence-based guidance. However, none of these COVID-19 prediction models have been found to be reliable. Models are commonly assessed to have a risk of bias, often due to insufficient reporting, use of non-representative data, and lack of large-scale external validation. In this paper, we present the Observational Health Data Sciences and Informatics (OHDSI) analytics pipeline for patient-level prediction modeling as a standardized approach for rapid yet reliable development and validation of prediction models. We demonstrate how our analytics pipeline and open-source software tools can be used to answer important prediction questions while limiting potential causes of bias (e.g., by validating phenotypes, specifying the target population, performing large-scale external validation, and publicly providing all analytical source code). METHODS: We show step-by-step how to implement the analytics pipeline for the question: 'In patients hospitalized with COVID-19, what is the risk of death 0 to 30 days after hospitalization?'. We develop models using six different machine learning methods in a USA claims database containing over 20,000 COVID-19 hospitalizations and externally validate the models using data containing over 45,000 COVID-19 hospitalizations from South Korea, Spain, and the USA. RESULTS: Our open-source software tools enabled us to go efficiently end-to-end from problem design to reliable model development and evaluation. When predicting death in patients hospitalized with COVID-19, AdaBoost, random forest, gradient boosting machine, and decision tree yielded similar or lower internal and external validation discrimination performance compared to L1-regularized logistic regression, whereas the MLP neural network consistently resulted in lower discrimination. L1-regularized logistic regression models were well calibrated. CONCLUSION: Our results show that following the OHDSI analytics pipeline for patient-level prediction modeling can enable the rapid development of reliable prediction models. The OHDSI software tools and pipeline are open source and available to researchers from all around the world.
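
OHDSI's pipeline is implemented in its own open-source tools; as a schematic illustration of the internal-plus-external validation pattern only (synthetic stand-in data, scikit-learn instead of the OHDSI stack, arbitrary hyperparameters):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 20))                    # stand-in for patient covariates
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=5000) > 1.5).astype(int)
X_ext = rng.normal(size=(2000, 20))                # stand-in external database
y_ext = (X_ext[:, 0] + 0.5 * X_ext[:, 1] + rng.normal(size=2000) > 1.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)
# L1-regularized logistic regression, the best-calibrated model family in the paper
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X_tr, y_tr)

print("internal AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
print("external AUC:", roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1]))
```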


Subjects
COVID-19, Pandemics, Humans, Logistic Models, Machine Learning, SARS-CoV-2
15.
Mycorrhiza ; 31(6): 671-683, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34508280

ABSTRACT

Nearly 150 years of research has accumulated large amounts of data on mycorrhizal association types in plants. However, this important resource includes unreliably allocated traits for some species. An audit of six commonly used data sources revealed a high degree of consistency in the mycorrhizal status of most species, genera, and families of vascular plants, but some records contradicted the majority of other data (~10% of data overall). Careful analysis of contradictory records using rigorous definitions of association types revealed that the majority were diagnosis errors, often stemming from references predating modern knowledge of mycorrhiza types. Other errors are linked to inadequate microscopic examination of roots or to plants with complex root anatomy, such as phi thickenings or beaded roots. Errors consistently occurred at much lower frequencies than correct records but have accumulated in uncorrected databases. This results in less accurate knowledge about dominant plants in some ecosystems, because they were sampled more often. Errors have also propagated from one database to another over decades when data were amalgamated without checking their suitability. Due to these errors, it is often incorrect to designate plants reported to have inconsistent mycorrhizas as "facultatively mycorrhizal". Updated protocols for resolving conflicting mycorrhizal data are provided here, based on the standard morphological definitions of association types that are the foundations of mycorrhizal science. This analysis also identifies the need for adequate training and mentoring of researchers to maintain the quality of mycorrhizal research.


Subjects
Magnoliopsida, Mycorrhizae, Databases, Factual, Ecosystem, Plants
16.
BMC Res Notes ; 14(1): 366, 2021 Sep 20.
Article in English | MEDLINE | ID: mdl-34544495

ABSTRACT

OBJECTIVE: Among the different methods for profiling genome-wide patterns of transcription factor binding and histone modifications in cells and tissues, CUT&RUN has emerged as a more efficient approach, offering a higher signal-to-noise ratio from fewer cells than ChIP-seq. The results from CUT&RUN and related sequence enrichment assays require comprehensive quality control (QC) and comparative analysis of data quality across replicates. While several computational tools exist for read mapping and analysis, systematic reporting of data quality is lacking. Our aims were (1) to compare the use of frozen versus fresh cells for CUT&RUN and (2) to develop an easy-to-use pipeline for assessing data quality. RESULTS: We compared a CUT&RUN workflow on fresh and frozen samples and present an R package called ssvQC for quality control and comparison of data quality derived from CUT&RUN and other enrichment-based sequence data. Using ssvQC, we evaluate results from different CUT&RUN protocols for transcription factors and histone modifications from fresh and frozen tissue samples. Overall, this process facilitates evaluation of data quality across datasets and permits inspection of peak-calling results and replicate analysis across data types. The package ssvQC is readily available at https://github.com/FrietzeLabUVM/ssvQC .
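
ssvQC is an R package; one simple replicate-agreement statistic of the kind such QC compares, the Jaccard index over peak intervals, can be sketched as follows (assumes peaks within each list are non-overlapping and lie on a single chromosome):

```python
def _overlap_len(a: tuple, b: tuple) -> int:
    """Length of the overlap between two (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def jaccard_peaks(peaks_a: list, peaks_b: list) -> float:
    """Replicate agreement as overlapping length / union length of two
    peak sets, each a list of (start, end) intervals."""
    inter = sum(_overlap_len(a, b) for a in peaks_a for b in peaks_b)
    len_a = sum(e - s for s, e in peaks_a)
    len_b = sum(e - s for s, e in peaks_b)
    union = len_a + len_b - inter
    return inter / union if union else 0.0

print(jaccard_peaks([(100, 200), (500, 650)], [(120, 210), (560, 640)]))  # ~0.615
```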


Subjects
Histone Code, Transcription Factors, Chromatin Immunoprecipitation, High-Throughput Nucleotide Sequencing, Quality Control, Workflow
17.
BMC Bioinformatics ; 22(Suppl 6): 396, 2021 Aug 06.
Article in English | MEDLINE | ID: mdl-34362304

ABSTRACT

BACKGROUND: Meiotic recombination is a vital biological process that plays an essential role in the structural and functional dynamics of genomes. Genomes exhibit highly variable recombination profiles along chromosomes, associated with several chromatin states. However, eu-heterochromatin boundaries are neither available nor easily obtained for non-model organisms, especially newly sequenced ones. Hence, we lack the accurate local recombination rates needed to address evolutionary questions. RESULTS: Here, we propose an automated computational tool, based on the Marey map method, that identifies heterochromatin boundaries along chromosomes and estimates local recombination rates. Our method, called BREC (heterochromatin Boundaries and RECombination rate estimates), is not genome-specific and runs even on non-model genomes, as long as genetic and physical maps are available. BREC is based on pure statistics and is data-driven, implying that good input data quality remains a strong requirement; a data pre-processing module (data quality control and cleaning) is therefore provided. Experiments show that BREC handles issues of marker density and distribution. CONCLUSIONS: BREC's heterochromatin boundaries have been validated against cytological equivalents experimentally generated for the fruit fly Drosophila melanogaster genome, for which BREC returns congruent values. BREC's recombination rates have also been compared with previously reported estimates. Based on these promising results, we believe our tool has the potential to help bring data science into the service of genome biology and evolution. We provide BREC as an R package and a Shiny web-based, user-friendly application, yielding a fast, easy-to-use, and broadly accessible resource. The BREC R package is available at the GitHub repository https://github.com/GenomeStructureOrganization .
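
BREC itself is an R package; the underlying Marey map idea, local recombination rate as the slope of genetic position against physical position, can be sketched as follows (the window size and the non-negativity clamp are our assumptions, not BREC's internals):

```python
import numpy as np

def local_recombination_rate(phys_mb: np.ndarray, gen_cm: np.ndarray,
                             window: int = 15) -> np.ndarray:
    """Marey-map-style estimate: local recombination rate (cM/Mb) as the
    slope of genetic vs. physical position over a sliding marker window.
    Markers are assumed sorted by physical position."""
    n = len(phys_mb)
    rates = np.full(n, np.nan)
    half = window // 2
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        slope = np.polyfit(phys_mb[lo:hi], gen_cm[lo:hi], 1)[0]
        rates[i] = max(slope, 0.0)   # negative local slopes are mapping noise
    return rates

# Suppressed recombination (e.g., pericentromeric heterochromatin) shows up
# as rates near zero in the second segment of this toy Marey map
phys = np.linspace(0, 30, 120)                       # Mb
gen = np.piecewise(phys, [phys < 12, phys >= 12],
                   [lambda x: 2.5 * x, lambda x: 30 + 0.1 * (x - 12)])  # cM
print(local_recombination_rate(phys, gen)[:5])       # ~2.5 cM/Mb in the first segment
```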


Subjects
Heterochromatin, Mobile Applications, Animals, Chromosome Mapping, Drosophila melanogaster/genetics, Heterochromatin/genetics, Genetic Recombination
18.
Sensors (Basel) ; 21(10)2021 May 14.
Article in English | MEDLINE | ID: mdl-34069085

ABSTRACT

In seismology, recent decades have witnessed an increased effort to observe all 12 degrees of freedom of seismic ground motion by complementing translational ground motion observations with measurements of strain and rotational motions, aiming at an enhanced probing and understanding of Earth and other planetary bodies. The evolution of optical instrumentation, in particular large-scale ring laser installations such as G-ring and ROMY (ROtational Motion in seismologY), and their geoscientific application have contributed significantly to the emergence of this scientific field. The most advanced large-scale ring laser array currently is ROMY, which is unprecedented in scale and design. As a heterolithic structure, ROMY's ring laser components are subject to optical frequency drifts. Such Sagnac interferometers require new considerations and approaches to data acquisition, processing, and quality assessment compared with conventional mechanical instrumentation. We present an automated approach to assessing the data quality and performance of a ring laser based on characteristics of the interferometric Sagnac signal. The developed scheme is applied to ROMY data to detect compromised operation states and assign quality flags. When ROMY's database becomes publicly accessible, this assessment will provide a quality control feature for data requests.
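
One plausible core for such a scheme, tracking the Sagnac beat frequency in sliding windows and flagging drifts, is sketched below; the sampling rate, nominal beat frequency, and tolerance are assumed values for the example, not ROMY's published parameters:

```python
import numpy as np

def sagnac_quality_flags(signal, fs, f_nominal, win_s=10.0, tol_hz=0.5):
    """Estimate the Sagnac beat frequency in sliding windows from the FFT
    peak and flag windows that drift beyond a tolerance from nominal."""
    n = int(win_s * fs)
    freqs, flags = [], []
    for start in range(0, len(signal) - n + 1, n):
        seg = signal[start:start + n] * np.hanning(n)
        spec = np.abs(np.fft.rfft(seg))
        f_peak = np.fft.rfftfreq(n, 1.0 / fs)[np.argmax(spec)]
        freqs.append(f_peak)
        flags.append("ok" if abs(f_peak - f_nominal) < tol_hz else "flagged")
    return np.array(freqs), flags

fs, f0 = 5000.0, 553.5            # assumed sampling rate and nominal beat frequency
t = np.arange(0, 60.0, 1.0 / fs)
sig = np.sin(2 * np.pi * f0 * t)
half = t.size // 2                # simulate a 1 Hz frequency drift halfway through
sig[half:] = np.sin(2 * np.pi * (f0 + 1.0) * t[half:])
freqs, flags = sagnac_quality_flags(sig, fs, f0)
print(flags)                      # first half "ok", second half "flagged"
```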

19.
Int J Med Inform ; 150: 104454, 2021 06.
Article in English | MEDLINE | ID: mdl-33866231

ABSTRACT

OBJECTIVE: This study compares seven machine learning models developed to predict childhood obesity from age >2 to ≤7 years using electronic health record (EHR) data up to age 2 years. MATERIALS AND METHODS: EHR data from 860,510 patients with 11,194,579 healthcare encounters were obtained from the Children's Hospital of Philadelphia. After applying stringent quality control to remove implausible growth values and including only individuals with all recommended wellness visits by age 7 years, 27,203 (50.78% male) patients remained for model development. Seven machine learning models were developed to predict obesity incidence as defined by the Centers for Disease Control and Prevention (age/sex-adjusted BMI > 95th percentile). Model performance was evaluated by multiple standard classifier metrics, and the differences among the seven models were compared using Cochran's Q test and post-hoc pairwise testing. RESULTS: XGBoost yielded an AUC of 0.81 (0.001), outperforming all other models. It also achieved statistically significantly better performance than all other models on standard classifier metrics (sensitivity fixed at 80%): precision 30.90% (0.22%), F1-score 44.60% (0.26%), accuracy 66.14% (0.41%), and specificity 63.27% (0.41%). DISCUSSION AND CONCLUSION: Early childhood obesity prediction models were developed from the largest cohort reported to date. Relative to prior research, our models generalize to include males and females in a single model and extend the time frame for obesity incidence prediction to 7 years of age. The presented machine learning model development workflow can be adapted to various EHR-based studies and may be valuable for developing other clinical prediction models.
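
A schematic of the "metrics at fixed 80% sensitivity" evaluation on synthetic data; scikit-learn's gradient boosting stands in for XGBoost here, and all numbers are illustrative, not the study's:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in data; the study used EHR-derived features up to age 2
X, y = make_classification(n_samples=20000, n_features=30, weights=[0.85], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]
print("AUC:", roc_auc_score(y_te, scores))

# Fix sensitivity at 80%: choose the threshold as the 20th percentile
# of scores among true positives, then report the remaining metrics.
thr = np.quantile(scores[y_te == 1], 0.20)
pred = scores >= thr
tp = np.sum(pred & (y_te == 1)); fp = np.sum(pred & (y_te == 0))
tn = np.sum(~pred & (y_te == 0)); fn = np.sum(~pred & (y_te == 1))
print("sensitivity:", tp / (tp + fn))           # ~0.80 by construction
print("precision:", tp / (tp + fp), "specificity:", tn / (tn + fp))
```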


Subjects
Electronic Health Records, Pediatric Obesity, Child, Child, Preschool, Cohort Studies, Female, Humans, Incidence, Machine Learning, Male, Pediatric Obesity/epidemiology
20.
Sci Total Environ ; 779: 146381, 2021 Jul 20.
Article in English | MEDLINE | ID: mdl-33743460

ABSTRACT

Low-cost air quality sensor networks have been increasingly used for high-spatial-resolution air quality monitoring in recent years. Ensuring data reliability during continuous operation is critical for these networks. Using particulate matter sensors as an example, this study reports a data quality control method comprising sensor selection, pre-calibration, and online inspection, which was used in developing and operating dense low-cost particle sensor networks in two Chinese cities. First, seven mainstream sensors were tested, and one model of particle sensor was selected for its better linearity and stability. Within a batch of sensors of the same model, responses to the same pollutant concentration differ even though the sensors are factory-calibrated; this systematic variation was corrected and unified through pre-calibration. After field deployment, a data analysis method was established for online inspection of the sensors' working status. Using data from the sensors, it evaluates parameters such as the intraclass correlation coefficient and the normalized root mean square error. These two metrics span a two-dimensional coordinate system in which sensors are classified into four statuses: normal, fluctuation, hotspot, and malfunction. During one month of operation in the two cities, 8 (out of 82) and 10 (out of 59) sensors with suspected malfunctions were screened out for further on-site inspection. Moreover, the sensor networks show potential for identifying illegal emission sources that typically cannot be detected by sparse regulatory air quality monitoring stations.
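
A sketch of the two-metric classification against a co-located reference or network-median series; the ICC estimator shown, the thresholds, and the quadrant-to-status mapping are our assumptions for illustration, not the paper's calibrated values:

```python
import numpy as np

def icc_agreement(x: np.ndarray, y: np.ndarray) -> float:
    """Two-way, absolute-agreement ICC(2,1) for paired series
    (sensor vs. reference), treating timepoints as targets."""
    data = np.column_stack([x, y])
    n, k = data.shape
    grand = data.mean()
    msr = k * data.mean(axis=1).var(ddof=1)    # between-timepoint mean square
    msc = n * data.mean(axis=0).var(ddof=1)    # between-instrument mean square
    resid = data - data.mean(axis=1, keepdims=True) - data.mean(axis=0) + grand
    mse = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def classify_sensor(x, y, icc_min=0.8, nrmse_max=0.3):
    """Place a sensor in one of the four statuses from its ICC and NRMSE
    against the reference series y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    icc = icc_agreement(x, y)
    nrmse = np.sqrt(np.mean((x - y) ** 2)) / y.mean()
    if icc >= icc_min and nrmse <= nrmse_max:
        return "normal"
    if icc >= icc_min:
        return "fluctuation"   # tracks the reference but with amplitude bias
    if nrmse <= nrmse_max:
        return "hotspot"       # close in level but locally decorrelated
    return "malfunction"

rng = np.random.default_rng(3)
ref = 35 + 10 * np.sin(np.linspace(0, 6 * np.pi, 720))    # hourly PM2.5, 30 days
print(classify_sensor(ref + rng.normal(0, 1, 720), ref))  # -> "normal"
```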
