ABSTRACT
Accident investigation reports provide useful knowledge to help companies propose preventive and mitigative measures. However, the information in accident report databases is typically large and complex, contains errors, and includes missing and/or redundant data. In this article, we propose text mining and natural language processing techniques to investigate low-quality accident reports, adopting machine learning (ML) to detect and investigate inconsistencies in them. The methodology was applied to 626 documents collected from a real hydroelectric power company. The initial ML performance revealed data divergences and concerns related to the report structure. The accident database was then restructured into a more suitable form, confirming the initial supposition about the quality of the investigated reports. The proposed approach can be used as a diagnostic tool to improve the design of accident investigation reports so that they provide a more useful source of knowledge to support safety decisions.
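As a minimal illustration of how ML predictions can expose inconsistent reports, the sketch below flags reports whose free-text description is closer to another category than to the one recorded. It is a stand-in under stated assumptions, not the authors' pipeline: it uses a simple bag-of-words nearest-centroid classifier with cosine similarity and leave-one-out centroids, and the functions `tokenize`, `cosine`, `flag_inconsistent` and the example reports are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase free text and split it into alphabetic tokens (illustrative normalization)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def flag_inconsistent(reports):
    """Return indices of reports whose text best matches a different category.

    reports: list of (description, category) pairs (hypothetical format).
    Each report is compared with category centroids built from all *other*
    reports (leave-one-out), so it cannot vote for its own label.
    """
    vectors = [Counter(tokenize(text)) for text, _ in reports]
    flagged = []
    for i, (_, label) in enumerate(reports):
        centroids = defaultdict(Counter)
        for j, (_, other_label) in enumerate(reports):
            if j != i:
                centroids[other_label].update(vectors[j])
        best = max(centroids, key=lambda c: cosine(vectors[i], centroids[c]))
        if best != label:
            flagged.append(i)
    return flagged


# Hypothetical reports; the last one describes a fall but is labeled "electrical".
reports = [
    ("worker fell from ladder while climbing", "fall"),
    ("employee fell from scaffold height", "fall"),
    ("technician fell from ladder", "fall"),
    ("electric shock from exposed cable", "electrical"),
    ("arc flash burned operator at panel", "electrical"),
    ("shock while touching live cable", "electrical"),
    ("operator fell from ladder near panel", "electrical"),
]
print(flag_inconsistent(reports))  # → [6]
```

A report that is the only member of its category has no leave-one-out centroid, so it is always flagged; a real pipeline would also need stop-word handling and a stronger classifier.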
Subject(s)
Accidents , Data Mining , Humans , Data Mining/methods , Machine Learning , Natural Language Processing , Databases, Factual
ABSTRACT
The increasing number of COVID-19 infections brought about by the current pandemic has encouraged the scientific community to analyze seroprevalence in populations to support health policies. In this context, accurate estimation of SARS-CoV-2 antibodies based on antibody test metrics (e.g., specificity and sensitivity) and the study of population characteristics are essential. Here, we propose a Bayesian analysis of IgA and IgG antibody levels to estimate the seroprevalence among health professionals in a city in northeastern Brazil under multiple scenarios of data availability from different information sources: no data available, data related only to test performance, and data from other regions. The study population comprises 432 subjects, with more than 620 collected samples analyzed via IgA/IgG ELISA tests. We conducted the study before and after the start of the vaccination campaign in Brazil. We discuss the importance of aggregating available data from various sources to build informative prior knowledge. Considering prior information from the USA and Europe, the pre-vaccination seroprevalence means are 8.04% and 10.09% for IgG and 7.40% and 9.11% for IgA. For the post-vaccination period, considering a local informative prior, the median IgG seroprevalence is 84.83%, which confirms a sharp increase in seroprevalence after vaccination. Additionally, estimates stratified by sex, age (younger than 30 years, 30 to 49 years, and older than 49 years), and presence of comorbidities are provided for all scenarios.
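A minimal sketch of the kind of Bayesian correction involved: with an imperfect test, the apparent positive rate is p = pi * Se + (1 - pi) * (1 - Sp), so the posterior of the true prevalence pi given x positives out of n tests can be approximated on a grid. This assumes Se and Sp are known constants and a Beta(a, b) prior on pi; the study itself combines IgA/IgG data and priors from several sources, which this sketch does not reproduce. The function `seroprevalence_posterior`, its parameters, and the counts in the example are hypothetical.

```python
import math


def seroprevalence_posterior(positives, n, sensitivity, specificity,
                             prior_a=1.0, prior_b=1.0, grid_size=2001):
    """Grid approximation of the posterior of the true prevalence pi.

    Likelihood: positives ~ Binomial(n, p) with apparent positive rate
    p = pi * Se + (1 - pi) * (1 - Sp); prior: pi ~ Beta(prior_a, prior_b).
    Se and Sp are treated as known constants in this sketch.
    Returns (posterior mean, posterior median).
    """
    eps = 1e-12
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    log_post = []
    for pi in grid:
        p = pi * sensitivity + (1 - pi) * (1 - specificity)
        p = min(max(p, eps), 1 - eps)       # avoid log(0) at the grid ends
        pi_c = min(max(pi, eps), 1 - eps)
        log_post.append(
            positives * math.log(p) + (n - positives) * math.log(1 - p)
            + (prior_a - 1) * math.log(pi_c) + (prior_b - 1) * math.log(1 - pi_c)
        )
    # Normalize in a numerically stable way (subtract the max log-density).
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    weights = [w / total for w in weights]

    mean = sum(pi * w for pi, w in zip(grid, weights))
    cumulative, median = 0.0, grid[-1]
    for pi, w in zip(grid, weights):
        cumulative += w
        if cumulative >= 0.5:
            median = pi
            break
    return mean, median


# Hypothetical data: 30 positives out of 300 tests, Se = 0.90, Sp = 0.98, flat prior.
mean, median = seroprevalence_posterior(30, 300, sensitivity=0.90, specificity=0.98)
print(f"posterior mean {mean:.4f}, median {median:.4f}")
```

With sensitivity and specificity both set to 1, p = pi and the posterior reduces to Beta(a + x, b + n - x), which gives a convenient check of the grid approximation; an informative prior, such as one built from other regions, enters through `prior_a` and `prior_b`.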
Subject(s)
COVID-19 , Vaccines , Adult , Antibodies, Viral , Bayes Theorem , Brazil/epidemiology , COVID-19/epidemiology , COVID-19/prevention & control , Humans , Immunoglobulin A , Immunoglobulin G , SARS-CoV-2 , Seroepidemiologic Studies
ABSTRACT
As SARS-CoV-2 has spread quickly throughout the world, the scientific community has devoted major efforts to better understanding the characteristics of the virus and possible means to prevent, diagnose, and treat COVID-19. One valid approach presented in the literature is to develop an image-based method to support COVID-19 diagnosis using convolutional neural networks (CNN). Because radiological data are limited given the novelty of COVID-19, several methodologies rely on reduced datasets, which may be inadequate and bias the model. Here, we performed an analysis combining six different open databases of chest X-ray images to identify images of infected patients, differentiating COVID-19 and pneumonia from 'no-findings' images. In addition, we discuss the performance of models created from fewer databases, which may imperceptibly overestimate their results. Two CNN-based architectures were created to process images of different sizes (512 × 512, 768 × 768, 1024 × 1024, and 1536 × 1536). Our best model achieved a balanced accuracy (BA) of 87.7% in predicting one of the three classes ('no-findings', 'COVID-19', and 'pneumonia') and a specific balanced precision of 97.0% for the 'COVID-19' class. We also provided binary classification results, with a precision of 91.0% for detecting sick patients (i.e., with COVID-19 or pneumonia) and 98.4% for COVID-19 detection (i.e., differentiating it from 'no-findings' or 'pneumonia'). Although we achieved an unrealistic BA of 97.2% for one specific case, the proposed methodology of using multiple databases yielded better and less inflated results than models trained on specific image datasets. Thus, this framework is a promising low-cost, fast, and noninvasive means to support the diagnosis of COVID-19.
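Balanced accuracy, as commonly defined (for example by scikit-learn's `balanced_accuracy_score`), is the mean of per-class recall, which keeps a majority class such as 'no-findings' from dominating the score. The sketch below computes it with the standard library, together with precision for a binary 'sick' versus 'no-findings' collapse. The helper names and the labels are hypothetical, and the "specific balanced precision" metric above is not reproduced because its exact definition is not stated.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes that occur in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        support = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / support)
    return sum(recalls) / len(recalls)


def precision(y_true, y_pred, positive):
    """Fraction of predictions equal to `positive` that are correct."""
    predicted = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted:
        return 0.0
    return sum(1 for t in predicted if t == positive) / len(predicted)


def to_binary(labels):
    """Collapse the three classes into 'sick' (COVID-19 or pneumonia) vs 'no-findings'."""
    return ["no-findings" if y == "no-findings" else "sick" for y in labels]


# Hypothetical labels for 10 images (6 no-findings, 2 COVID-19, 2 pneumonia).
y_true = ["no-findings"] * 6 + ["COVID-19"] * 2 + ["pneumonia"] * 2
y_pred = ["no-findings"] * 5 + ["COVID-19"] + ["COVID-19"] * 2 + ["pneumonia", "no-findings"]

print("accuracy:", sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true))
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
print("sick precision:", precision(to_binary(y_true), to_binary(y_pred), "sick"))
```

In this toy example plain accuracy is 0.8 while balanced accuracy is 7/9 (about 0.778), because the missed pneumonia case counts for more once each class is weighted equally; the binary 'sick' precision is 3/4.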