Results 1 - 20 of 240
1.
Am J Hum Genet ; 110(12): 2056-2067, 2023 Dec 07.
Article in English | MEDLINE | ID: mdl-38006880

ABSTRACT

Detection of aberrantly spliced genes is an important step in RNA-seq-based rare-disease diagnostics. We recently developed FRASER, a denoising autoencoder-based method that outperformed alternative methods of detecting aberrant splicing. However, because FRASER's three splice metrics are partially redundant and tend to be sensitive to sequencing depth, we introduce here a more robust intron-excision metric, the intron Jaccard index, that combines the alternative donor, alternative acceptor, and intron-retention signal into a single value. Moreover, we optimized model parameters and filter cutoffs by using candidate rare-splice-disrupting variants as independent evidence. On 16,213 GTEx samples, our improved algorithm, FRASER 2.0, typically called 10 times fewer splicing outliers while increasing the proportion of candidate rare-splice-disrupting variants 10-fold and substantially decreasing the effect of sequencing depth on the number of reported outliers. To lower the multiple-testing correction burden, we introduce an option to select the genes to be tested for each sample instead of a transcriptome-wide approach. This option can be particularly useful when prior information, such as candidate variants or genes, is available. Application on 303 rare-disease samples confirmed the relative reduction in the number of outlier calls for a slight loss of sensitivity; FRASER 2.0 recovered 22 out of 26 previously identified pathogenic splicing cases with default cutoffs and 24 when multiple-testing correction was limited to OMIM genes containing rare variants. Altogether, these methodological improvements contribute to more effective RNA-seq-based rare-disease diagnostics by drastically reducing the number of splicing outlier calls per sample at minimal loss of sensitivity.


Subjects
Alternative Splicing, RNA Splicing, Humans, Alternative Splicing/genetics, Introns/genetics, RNA Splicing/genetics, RNA-Seq, Algorithms
2.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38557674

ABSTRACT

Quality control in quantitative proteomics is a persistent challenge, particularly in identifying and managing outliers. Unsupervised learning models, which rely on data structure rather than predefined labels, offer potential solutions. However, without clear labels, their effectiveness might be compromised. Single models are susceptible to the randomness of parameters and initialization, which can result in a high rate of false positives. Ensemble models, on the other hand, have shown capabilities in effectively mitigating the impacts of such randomness and assisting in accurately detecting true outliers. Therefore, we introduced SEAOP, a Python toolbox that utilizes an ensemble mechanism by integrating multi-round data management and a statistics-based decision pipeline with multiple models. Specifically, SEAOP uses multi-round resampling to create diverse sub-data spaces and employs outlier detection methods to identify candidate outliers in each space. Candidates are then aggregated as confirmed outliers via a chi-square test, adhering to a 95% confidence level, to ensure the precision of the unsupervised approaches. Additionally, SEAOP introduces a visualization strategy, specifically designed to intuitively and effectively display the distribution of both outlier and non-outlier samples. Optimal hyperparameter models of SEAOP for outlier detection were identified by using a gradient-simulated standard dataset and Mann-Kendall trend test. The performance of the SEAOP toolbox was evaluated using three experimental datasets, confirming its reliability and accuracy in handling quantitative proteomics.
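
The resample-detect-aggregate idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not SEAOP's implementation or API: the function name `ensemble_outliers`, its parameters, and the sample data are hypothetical, each round uses a simple robust z-score detector, and the final decision is a plain vote-fraction threshold standing in for the chi-square test mentioned in the abstract.

```python
import random
import statistics


def ensemble_outliers(values, rounds=200, frac=0.8, z_cut=3.0, min_vote=0.9, seed=0):
    """Return indices flagged in at least `min_vote` of the rounds they were drawn in.

    Hypothetical helper: a vote threshold replaces the chi-square decision step.
    """
    rng = random.Random(seed)
    n = len(values)
    size = max(3, int(frac * n))   # size of each resampled sub-data space
    drawn = [0] * n                # how often each sample was drawn
    flagged = [0] * n              # how often it was called a candidate outlier
    for _ in range(rounds):
        idx = rng.sample(range(n), size)
        sub = [values[i] for i in idx]
        med = statistics.median(sub)
        mad = statistics.median([abs(v - med) for v in sub])
        for i in idx:
            drawn[i] += 1
            # Robust z-score; 0.6745 rescales the MAD to a standard deviation
            # under normality. A zero MAD gives no usable spread this round.
            if mad > 0 and 0.6745 * abs(values[i] - med) / mad > z_cut:
                flagged[i] += 1
    return [i for i in range(n) if drawn[i] and flagged[i] / drawn[i] >= min_vote]


# Made-up intensities: nine similar samples and one obvious outlier (index 9).
intensities = [10.0, 10.2, 9.9, 10.1, 9.8, 10.3, 10.0, 9.7, 10.2, 25.0]
print(ensemble_outliers(intensities))
```

Aggregating over many resampled subsets means a sample is confirmed only when it is flagged consistently, which is the property the abstract credits with reducing false positives from any single model run.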


Subjects
Data Management, Proteomics, Reproducibility of Results, Quality Control, Statistical Data Interpretation
3.
Proc Natl Acad Sci U S A ; 119(9)2022 03 01.
Article in English | MEDLINE | ID: mdl-35197293

ABSTRACT

Entropic outlier sparsification (EOS) is proposed as a cheap and robust computational strategy for learning in the presence of data anomalies and outliers. EOS builds on the derived analytic solution of the (weighted) expected loss minimization problem subject to Shannon entropy regularization. The identified closed-form solution is proven to impose additional costs that depend linearly on statistics size and are independent of data dimension. The analytic results also explain why mixtures of spherically symmetric Gaussians, used heuristically in many popular data analysis algorithms, represent an optimal and least-biased choice for the nonparametric probability distributions when working with squared Euclidean distances. The performance of EOS is compared to a range of commonly used tools on synthetic problems and on partially mislabeled supervised classification problems from biomedicine. Applying EOS for coinference of data anomalies during learning is shown to allow reaching an accuracy of [Formula: see text] when predicting patient mortality after heart failure, statistically significantly outperforming predictive performance of common learning tools for the same data.
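
To make the closed-form idea concrete, here is a minimal sketch under an assumed objective; the abstract does not give the exact formulation, so this may differ from the paper's. Assume per-sample losses l_i and weights w on the probability simplex chosen to minimize sum_i w_i l_i + eps * sum_i w_i log w_i. The minimizer of that particular problem is the Gibbs (softmax) form w_i proportional to exp(-l_i / eps), which costs one pass over the samples. The names `entropic_weights` and `objective` are hypothetical.

```python
import math


def entropic_weights(losses, eps):
    """Closed-form minimizer of sum(w*l) + eps*sum(w*log w) over the simplex (assumed objective)."""
    shift = min(losses)  # subtracting the minimum avoids underflow and leaves the weights unchanged
    unnorm = [math.exp(-(l - shift) / eps) for l in losses]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def objective(weights, losses, eps):
    """Weighted loss plus eps times negative Shannon entropy."""
    return sum(w * l + eps * w * math.log(w) for w, l in zip(weights, losses) if w > 0)


# Made-up losses: the last sample fits the model badly and is down-weighted.
losses = [0.10, 0.20, 0.15, 5.00]
w = entropic_weights(losses, eps=0.5)
print([round(x, 5) for x in w])
```

Samples with large loss receive weights close to zero, which is how such a scheme can sparsify outliers while the model is being fitted.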

4.
Cytometry A ; 105(8): 580-594, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38995093

ABSTRACT

Senescence is an irreversible arrest of the cell cycle that can be characterized by markers of senescence such as p16, p21, and KI-67. The characterization of different senescence-associated phenotypes requires selection of the most relevant senescence markers to define reliable cytometric methodologies. Mass cytometry (a.k.a. Cytometry by time of flight, CyTOF) can monitor up to 40 different cell markers at the single-cell level and has the potential to integrate multiple senescence and other phenotypic markers to identify senescent cells within a complex tissue such as skeletal muscle, with greater accuracy and scalability than traditional bulk measurements and flow cytometry-based measurements. This article introduces an analysis framework for detecting putative senescent cells based on clustering, outlier detection, and Boolean logic for outliers. Results show that the pipeline can identify putative senescent cells in skeletal muscle with well-established markers such as p21 and potential markers such as GAPDH. It was also found that heterogeneity of putative senescent cells in skeletal muscle can partly be explained by their cell type. Additionally, autophagy-related proteins ATG4A, LRRK2, and GLB1 were identified as important proteins in predicting the putative senescent population, providing insights into the association between autophagy and senescence. It was observed that sex did not affect the proportion of putative senescent cells among total cells. However, age did have an effect, with a higher proportion observed in fibro/adipogenic progenitors (FAPs), satellite cells, M1 and M2 macrophages from old mice. Moreover, putative senescent cells from muscle of old and young mice show different expression levels of senescence-related proteins, with putative senescent cells of old mice having higher levels of p21 and GAPDH, whereas putative senescent cells of young mice had higher levels of IL-6. 
Overall, the analysis framework prioritizes multiple senescence-associated proteins to characterize putative senescent cells sourced from tissue made of different cell types.


Subjects
Biomarkers, Cellular Senescence, Flow Cytometry, Skeletal Muscle, Animals, Cellular Senescence/physiology, Mice, Skeletal Muscle/cytology, Skeletal Muscle/metabolism, Flow Cytometry/methods, Biomarkers/metabolism, Female, Male, Inbred C57BL Mice, Cyclin-Dependent Kinase Inhibitor p21/metabolism, Single-Cell Analysis/methods
5.
Mol Ecol ; 33(17): e17490, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39135406

ABSTRACT

Plant pathogens are constantly under selection pressure for host resistance adaptation. Soybean cyst nematode (SCN, Heterodera glycines) is a major pest of soybean primarily managed through resistant cultivars; however, SCN populations have evolved virulence in response to selection pressures driven by repeated monoculture of the same genetic resistance. Resistance to SCN is mediated by multiple epistatic interactions between Rhg (for resistance to H. glycines) genes. However, the identity of SCN virulence genes that confer the ability to overcome resistance remains unknown. To identify candidate genomic regions showing signatures of selection for increased virulence, we conducted whole genome resequencing of pooled individuals (Pool-Seq) from two pairs of SCN populations adapted on soybeans with Peking-type (rhg1-a, rhg2, and Rhg4) resistance. Population differentiation and principal component analysis-based approaches identified approximately 0.72-0.79 million SNPs, the frequency of which showed potential selection signatures across multiple genomic regions. Chromosomes 3 and 6 between population pairs showed the greatest density of outlier SNPs with high population differentiation. Conducting multiple outlier detection tests to identify overlapping SNPs resulted in a total of 966 significantly differentiated SNPs, of which 285 exon SNPs were mapped to 97 genes. Of these, six genes encoded members of known stylet-secreted effector protein families potentially involved in host defence modulation including venom-allergen-like, annexin, glutathione synthetase, SPRYSEC, chitinase, and CLE effector proteins. Further functional analysis of identified candidate genes will provide new insights into the genetic mechanisms by which SCN overcomes soybean resistance and inform the development of molecular markers for rapidly screening the virulence profile of an SCN-infested field.


Subjects
Disease Resistance, Glycine max, Plant Diseases, Single Nucleotide Polymorphism, Tylenchoidea, Animals, Glycine max/genetics, Glycine max/parasitology, Single Nucleotide Polymorphism/genetics, Virulence/genetics, Plant Diseases/parasitology, Plant Diseases/genetics, Disease Resistance/genetics, Tylenchoidea/genetics, Tylenchoidea/pathogenicity, Genetic Selection, Population Genetics, Whole Genome Sequencing
6.
BMC Med Res Methodol ; 24(1): 89, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38622516

ABSTRACT

BACKGROUND: Outliers, data points that significantly deviate from the norm, can have a substantial impact on statistical inference and provide valuable insights in data analysis. Multiple methods have been developed for outlier detection; however, almost all available approaches fail to consider the spatial dependence and heterogeneity in spatial data. Spatial data have diverse formats and semantics, requiring specialized outlier detection methodology to handle these unique properties. To date, limited research exists on robust spatial outlier detection methods designed specifically under the spatial error model (SEM) structure. METHOD: We propose the Spatial-Θ-Iterative Procedure for Outlier Detection (Spatial-Θ-IPOD), which utilizes a mean-shift vector to identify outliers within the SEM. Our method enables an effective detection of spatial outliers while also providing robust coefficient estimates. To assess the performance of our approach, we conducted extensive simulations and applied it to a real-world empirical study using life expectancy data from multiple countries. RESULTS: Simulation results showed that the masking and JD (Joint Detection) indicators of our Spatial-Θ-IPOD method outperformed several commonly used methods, even in high-dimensional scenarios, demonstrating stable performance. Conversely, the Θ-IPOD method proved to be ineffective in detecting outliers when spatial correlation was present. Moreover, our model successfully provided reliable coefficient estimation alongside outlier detection. The proposed method consistently outperformed other models (both robust and non-robust) in most cases. In the empirical study, our proposed model successfully detected outliers and provided valuable insights in the modeling process. CONCLUSIONS: Our proposed Spatial-Θ-IPOD offers an effective solution for detecting spatial outliers for SEM while providing robust coefficient estimates.
Notably, our approach showcases its relative superiority even in the presence of high leverage points. By successfully identifying outliers, our method enhances the overall understanding of the data and provides valuable insights for further analysis.

7.
BMC Med Inform Decis Mak ; 24(1): 49, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38355504

ABSTRACT

BACKGROUND: Unsupervised clustering and outlier detection are important in medical research to understand the distributional composition of a collective of patients. A number of clustering methods exist, also for high-dimensional data after dimension reduction. Clustering and outlier detection may, however, become less robust or contradictory if multiple high-dimensional data sets per patient exist. Such a scenario is given when the focus is on 3-D data of multiple organs per patient, and a high-dimensional feature matrix per organ is extracted. METHODS: We use principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE) and multiple co-inertia analysis (MCIA) combined with bagplots to study the distribution of multi-organ 3-D data taken by computed tomography scans. After point-set registration of multiple organs from two public data sets, multiple hundred shape features are extracted per organ. While PCA and t-SNE can only be applied to each organ individually, MCIA can project the data of all organs into the same low-dimensional space. RESULTS: MCIA is the only approach, here, with which data of all organs can be projected into the same low-dimensional space. We studied how frequently (i.e., by how many organs) a patient was classified to belong to the inner or outer 50% of the population, or as an outlier. Outliers could only be detected with MCIA and PCA. MCIA and t-SNE were more robust in judging the distributional location of a patient in contrast to PCA. CONCLUSIONS: MCIA is more appropriate and robust in judging the distributional location of a patient in the case of multiple high-dimensional data sets per patient. It is still recommendable to apply PCA or t-SNE in parallel to MCIA to study the location of individual organs.


Subjects
Algorithms, X-Ray Computed Tomography, Humans, Cluster Analysis, Principal Component Analysis
8.
Microsc Microanal ; 2024 Aug 27.
Article in English | MEDLINE | ID: mdl-39189873

ABSTRACT

Atom probe tomography (APT) is commonly used to study solute clustering and precipitation in materials. However, standard techniques used to identify and characterize clusters within atom probe data, such as density-based spatial clustering of applications with noise (DBSCAN), often underperform with respect to small clusters. This is a limitation of density-based cluster identification algorithms, due to their dependence on the parameter Nmin, an arbitrary lower limit placed on detectable cluster sizes. Therefore, this article treats the characterization of clustering in atom probe data as an outlier detection problem, for which k-nearest neighbors, local outlier factor, and learnable unified neighborhood-based anomaly ranking algorithms were tested against a simulated dataset and compared to the standard method. The decision score output of the algorithms was then automatically thresholded by the Karcher mean to remove human bias. Each of the major models tested outperforms DBSCAN for cluster sizes of <25 atoms but underperforms for sizes >30 atoms using simulated data. However, the new combined k-nearest neighbors (k-NN) and DBSCAN method presented was able to perform well at all cluster sizes. The combined k-NN and seven methods are presented as a new approach to identifying clusters in APT.
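
As a rough illustration of the k-NN scoring step only, the sketch below assigns each atom the distance to its k-th nearest neighbor, so atoms inside a dense solute cluster get unusually small scores relative to the dilute matrix. It is an assumption-laden toy: the function name `knn_scores`, the coordinates, and the choice of k are hypothetical, and neither the Karcher-mean thresholding nor the combination with DBSCAN described in the abstract is reproduced.

```python
import math


def knn_scores(points, k=3):
    """Distance from each point to its k-th nearest neighbor (brute force)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Made-up coordinates: four tightly packed atoms followed by four dispersed atoms.
cluster = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, 0.1)]
matrix = [(5.0, 5.0, 5.0), (-5.0, 3.0, 1.0), (4.0, -4.0, 2.0), (-3.0, -5.0, -4.0)]
scores = knn_scores(cluster + matrix, k=3)
print([round(s, 3) for s in scores])
```

Because the score is computed per atom, no minimum cluster size has to be fixed in advance, which is the limitation of density-based methods that the abstract points to. The brute-force search here is quadratic in the number of atoms; a spatial index would be needed for real datasets.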

9.
J Appl Clin Med Phys ; 25(2): e14154, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37683120

ABSTRACT

BACKGROUND: Tolerance limits are defined on pre-treatment patient-specific quality assurance results to identify "out of the norm" dose discrepancies in a plan. An out-of-tolerance plan during measurement can often cause treatment delays, especially if replanning is required. In this study, we aim to develop an outlier detection model to identify out-of-tolerance plans early during the treatment planning phase to mitigate the above-mentioned risks. METHODS: Patient-specific quality assurance results with portal dosimetry for stereotactic body radiotherapy measured between January 2020 and December 2021 were used in this study. Data were divided into thorax and pelvis sites, and gamma passing rates were recorded using 2%/2 mm, 2%/1 mm, and 1%/1 mm gamma criteria. A statistical process control method was used to determine six different site- and criterion-specific tolerance and action limits. Using only the inliers identified with our determined tolerance limits, we trained three different outlier detection models using the plan complexity metrics extracted from each treatment field: robust covariance, isolation forest, and one-class support vector machine. The hyperparameters were optimized using the F1-score calculated from both the inliers and validation outliers' data. RESULTS: 308 pelvis and 200 thorax fields were used in this study. The tolerance (action) limits for 2%/2 mm, 2%/1 mm, and 1%/1 mm gamma criteria in the pelvis site are 99.1% (98.1%), 95.8% (91.1%), and 91.7% (86.1%), respectively. The tolerance (action) limits in the thorax site are 99.0% (98.7%), 97.0% (96.2%), and 91.5% (87.2%). The one-class support vector machine performs the best among all the algorithms. The best performing model in the thorax (pelvis) site achieves a precision of 0.56 (0.54), recall of 1.0 (1.0), and F1-score of 0.72 (0.70) when using the 2%/2 mm (2%/1 mm) criterion.
CONCLUSION: The model will help the planner to identify an out-of-tolerance plan early so that they can refine the plan further during the planning stage without risking late discovery during measurement.
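
The abstract does not say which statistical process control construction was used to derive the limits. As one possibility, the sketch below computes limits from an individuals (I) control chart, where the half-width is 2.660 times the mean moving range. The function name `individuals_chart_limits` and the passing rates are hypothetical, and the resulting numbers are not meant to reproduce the limits reported above.

```python
def individuals_chart_limits(rates):
    """Return (lower limit, center line, upper limit) of an individuals control chart."""
    center = sum(rates) / len(rates)
    # Moving range: absolute difference between consecutive measurements.
    moving_ranges = [abs(b - a) for a, b in zip(rates, rates[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    half_width = 2.660 * mr_bar  # 3 / d2, with d2 = 1.128 for ranges of two points
    # Gamma passing rates cannot exceed 100%, so the upper limit is capped.
    return center - half_width, center, min(100.0, center + half_width)


# Made-up gamma passing rates (%) for one site and one gamma criterion.
passing_rates = [99.5, 99.8, 99.2, 99.9, 99.6, 99.7]
lower, center, upper = individuals_chart_limits(passing_rates)
print(round(lower, 2), round(center, 2), round(upper, 2))
```

In a workflow like the one described, fields whose passing rate falls below the lower limit would be treated as outliers, and only the remaining inliers would be used to train the detection models.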


Subjects
Radiosurgery, Intensity-Modulated Radiotherapy, Humans, Computer-Assisted Radiotherapy Planning/methods, Radiotherapy Dosage, Algorithms, Pelvis, Radiometry/methods, Intensity-Modulated Radiotherapy/methods, Health Care Quality Assurance
10.
Sensors (Basel) ; 24(4)2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38400399

ABSTRACT

There have been numerous studies attempting to overcome the limitations of current autonomous driving technologies. However, there is no doubt that it is challenging to promise integrity of safety regarding urban driving scenarios and dynamic driving environments. Among the reported countermeasures to supplement the uncertain behavior of autonomous vehicles, teleoperation of the vehicle has been introduced to deal with the disengagement of autonomous driving. However, teleoperation can lead the vehicle to unforeseen and hazardous situations from the viewpoint of wireless communication stability. In particular, communication delay outliers that severely deviate from the passive communication delay should be highlighted because they could hamper the cognition of the circumstances monitored by the teleoperator, or the control signal could be contaminated regardless of the teleoperator's intention. In this study, communication delay outliers were detected and classified based on the stochastic approach (passive delays and outliers were estimated as 98.67% and 1.33%, respectively). Results indicate that communication delay outliers can be automatically detected, independently of the real-time quality of wireless communication stability. Moreover, the proposed framework demonstrates resilience against outliers, thereby mitigating potential performance degradation.

11.
Int J Mol Sci ; 25(3)2024 Jan 26.
Article in English | MEDLINE | ID: mdl-38338841

ABSTRACT

Human tear fluid contains numerous compounds, which are present in highly variable amounts owing to the dynamic and multipurpose functions of tears. A better understanding of the level and sources of variance is essential for determining the functions of the different tear components and the limitations of tear samples as a potential biomarker source. In this study, a quantitative proteomic method was used to analyze variations in the tear protein profiles of healthy volunteers. High day-to-day and inter-eye personal variances were observed in the tear volumes, protein content, and composition of the tear samples. Several normalization and outlier exclusion approaches were evaluated to decrease variances. Despite the intrapersonal variances, statistically significant differences and cluster analysis revealed that proteome profile and immunoglobulin composition of tear fluid present personal characteristics. Using correlation analysis, we could identify several correlating protein clusters, mainly related to the source of the proteins. Our study is the first attempt to achieve more insight into the biochemical background of human tears by statistical evaluation of the experimentally observed dynamic behavior of the tear proteome. As a pilot study for determination of personal protein profiles of the tear fluids of individual patients, it contributes to the application of this noninvasively collectible body fluid in personal medicine.


Subjects
Proteome, Proteomics, Humans, Proteome/metabolism, Proteomics/methods, Pilot Projects, Tears/metabolism, Eye Proteins/metabolism, Quality Control
12.
Neuroimage ; 283: 120397, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-37820862

ABSTRACT

Diffusion-weighted MRI (dMRI) is a medical imaging method that can be used to investigate the brain microstructure and structural connections between different brain regions. The method, however, requires relatively complex data processing frameworks and analysis pipelines. Many of these approaches are vulnerable to signal dropout artefacts that can originate from subjects moving their head during the scan. To combat these artefacts and eliminate such outliers, researchers have proposed two approaches: to replace outliers or to downweight outliers during modelling and analysis. With the rising interest in dMRI for clinical research, these types of corrections are increasingly important. Therefore, we set out to investigate the differences between outlier replacement and weighting approaches to help the dMRI community to select the best tool for their data processing pipelines. We evaluated dMRI motion correction registration and single tensor model fit pipelines using Gaussian Process and Spherical Harmonic based replacement approaches and outlier downweighting using highly realistic whole-brain simulations. As a proof of concept, we applied these approaches to dMRI infant data sets that contained varying numbers of dropout artefacts. Based on our results, we concluded that the Gaussian Process based outlier replacement provided similar tensor fit results to Gaussian Process based outlier detection and downweighting. Therefore, if only the least-squares estimate of the single tensor model is of interest, our recommendation is to use outlier replacement. However, outlier downweighting can potentially provide a more accurate estimate of the model precision, which could be relevant for applications such as probabilistic tractography.


Subjects
Algorithms, Diffusion Magnetic Resonance Imaging, Humans, Diffusion Magnetic Resonance Imaging/methods, Brain/diagnostic imaging, Artifacts, Least-Squares Analysis
13.
Cytometry A ; 103(1): 71-81, 2023 01.
Article in English | MEDLINE | ID: mdl-35796000

ABSTRACT

Technical artifacts such as clogging that occur during the data acquisition process of flow cytometry data can cause spurious events and fluorescence intensity shifting that impact the quality of the data and its analysis results. These events should be identified and potentially removed before being passed to the next stage of analysis. flowCut, an R package, automatically detects anomaly events in flow cytometry experiments and flags files for potential review. Its results are on par with manual analysis and it outperforms existing automated approaches.


Subjects
Flow Cytometry, Flow Cytometry/methods, Computational Biology
14.
Epilepsia ; 64(8): 2027-2043, 2023 08.
Article in English | MEDLINE | ID: mdl-37199673

ABSTRACT

OBJECTIVE: We studied the rate dynamics of interictal events occurring over fast-ultradian time scales, as commonly examined in clinics to guide surgical planning in epilepsy. METHODS: Stereo-electroencephalography (SEEG) traces of 35 patients with good surgical outcome (Engel I) were analyzed. For this we developed a general data mining method aimed at clustering the plethora of transient waveform shapes including interictal epileptiform discharges (IEDs) and assessed the temporal fluctuations in the capability of mapping the epileptogenic zone (EZ) of each type of event. RESULTS: We found that the fast-ultradian dynamics of the IED rate may effectively impair the precision of EZ identification, and appear to occur spontaneously, that is, not triggered by or exclusively associated with a particular cognitive task, wakefulness, sleep, seizure occurrence, post-ictal state, or antiepileptic drug withdrawal. Propagation of IEDs from the EZ to the propagation zone (PZ) could explain the observed fast-ultradian fluctuations in a reduced fraction of the analyzed patients, suggesting that other factors like the excitability of the epileptogenic tissue could play a more relevant role. A novel link was found between the fast-ultradian dynamics of the overall rate of polymorphic events and the rate of specific IEDs subtypes. We exploited this feature to estimate in each patient the 5 min interictal epoch for near-optimal EZ and resected-zone (RZ) localization. This approach produces at the population level a better EZ/RZ classification when compared to both (1) the whole time series available in each patient (p = .084 for EZ, p < .001 for RZ, Wilcoxon signed-rank test) and (2) 5 min epochs sampled randomly from the interictal recordings of each patient (p < .05 for EZ, p < .001 for RZ, 105 random samplings). 
SIGNIFICANCE: Our results highlight the relevance of the fast-ultradian IED dynamics in mapping the EZ, and show how this dynamics can be estimated prospectively to inform surgical planning in epilepsy.


Subjects
Drug Resistant Epilepsy, Partial Epilepsies, Epilepsy, Humans, Drug Resistant Epilepsy/surgery, Seizures, Epilepsy/surgery, Electroencephalography/methods, Partial Epilepsies/surgery
15.
BMC Med Res Methodol ; 23(1): 177, 2023 08 01.
Article in English | MEDLINE | ID: mdl-37528402

ABSTRACT

BACKGROUND: Epidemiologic and medical studies often rely on evaluators to obtain measurements of exposures or outcomes for study participants, and valid estimates of associations depend on the quality of data. Even though statistical methods have been proposed to adjust for measurement errors, they often rely on unverifiable assumptions and could lead to biased estimates if those assumptions are violated. Therefore, methods for detecting potential 'outlier' evaluators are needed to improve data quality during the data collection stage. METHODS: In this paper, we propose a two-stage algorithm to detect 'outlier' evaluators whose evaluation results tend to be higher or lower than their counterparts. In the first stage, evaluators' effects are obtained by fitting a regression model. In the second stage, hypothesis tests are performed to detect 'outlier' evaluators, where we consider both the power of each hypothesis test and the false discovery rate (FDR) among all tests. We conduct an extensive simulation study to evaluate the proposed method, and illustrate the method by detecting potential 'outlier' audiologists in the data collection stage for the Audiology Assessment Arm of the Conservation of Hearing Study, an epidemiologic study for examining risk factors of hearing loss in the Nurses' Health Study II. RESULTS: Our simulation study shows that our method not only can detect true 'outlier' evaluators, but also is less likely to falsely reject true 'normal' evaluators. CONCLUSIONS: Our two-stage 'outlier' detection algorithm is a flexible approach that can effectively detect 'outlier' evaluators, and thus data quality can be improved during the data collection stage.
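
The second stage described above tests many evaluators at once while keeping the false discovery rate in check. The abstract does not name the procedure used, so the sketch below assumes the standard Benjamini-Hochberg step-up rule as a stand-in. The function name `benjamini_hochberg` and the p-values are hypothetical; in practice the p-values would come from tests on the evaluator effects estimated in the first-stage regression.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return one boolean per test: True where the null hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Keep the largest rank whose p-value lies under the step-up line.
        if p_values[i] <= fdr * rank / m:
            cutoff_rank = rank
    rejected = set(order[:cutoff_rank])
    return [i in rejected for i in range(m)]


# Made-up p-values, one per evaluator.
evaluator_p = [0.001, 0.20, 0.012, 0.74, 0.03, 0.049]
print(benjamini_hochberg(evaluator_p))
# → [True, False, True, False, False, False]
```

With six tests at a 5% FDR, only the two smallest p-values fall under their rank-specific thresholds (0.05 × 1/6 and 0.05 × 2/6), so two evaluators are flagged even though four p-values are below 0.05.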


Subjects
Algorithms, Data Accuracy, Humans, Computer Simulation, Data Collection, Risk Factors
16.
BMC Med Res Methodol ; 23(1): 125, 2023 05 24.
Article in English | MEDLINE | ID: mdl-37226114

ABSTRACT

BACKGROUND: Cancer registries collect patient-specific information about cancer diseases. The collected information is verified and made available to clinical researchers, physicians, and patients. When processing information, cancer registries verify that the patient-specific records they collect are plausible. This means that the collected information about a particular patient makes medical sense. METHODS: Unsupervised machine learning approaches can detect implausible electronic health records without human guidance. Therefore, this article investigates two unsupervised anomaly detection approaches, a pattern-based approach (FindFPOF) and a compression-based approach (autoencoder), to identify implausible electronic health records in cancer registries. Unlike most existing work that analyzes synthetic anomalies, we compare the performance of both approaches and a baseline (random selection of records) on a real-world dataset. The dataset contains 21,104 electronic health records of patients with breast, colorectal, and prostate tumors. Each record consists of 16 categorical variables describing the disease, the patient, and the diagnostic procedure. The samples identified by FindFPOF, the autoencoder, and a random selection (a total of 785 different records) are evaluated in a real-world scenario by medical domain experts. RESULTS: Both anomaly detection methods are good at detecting implausible electronic health records. First, domain experts identified [Formula: see text] of 300 randomly selected records as implausible. With FindFPOF and the autoencoder, [Formula: see text] of the proposed 300 records in each sample were implausible. This corresponds to a precision of [Formula: see text] for FindFPOF and the autoencoder. Second, for 300 randomly selected records that were labeled by domain experts, the sensitivity of the autoencoder was [Formula: see text] and the sensitivity of FindFPOF was [Formula: see text].
Both anomaly detection methods had a specificity of [Formula: see text]. Third, FindFPOF and the autoencoder suggested samples with a different distribution of values than the overall dataset. For example, both anomaly detection methods suggested a higher proportion of colorectal records, the tumor localization with the highest percentage of implausible records in a randomly selected sample. CONCLUSIONS: Unsupervised anomaly detection can significantly reduce the manual effort of domain experts to find implausible electronic health records in cancer registries. In our experiments, the manual effort was reduced by a factor of approximately 3.5 compared to evaluating a random sample.


Subjects
Colorectal Neoplasms, Physicians, Prostatic Neoplasms, Male, Humans, Electronic Health Records, Registries
17.
Transfus Med ; 33(3): 263-267, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36807938

ABSTRACT

OBJECTIVES: To investigate whether the time to initiate a blood transfusion after an informative laboratory test could feasibly be used by the transfusion medicine service as a metric to monitor for transfusion delays. BACKGROUND: Delayed transfusions may result in patient morbidity and mortality, but no standards for timely transfusion have been developed. Information technology tools could be implemented to identify gaps in the provision of blood and to recognise areas for improvement. MATERIALS AND METHODS: Data were obtained from a children's hospital's data science platform. The time from the release of laboratory results to the initiation of transfusion was calculated, and weekly medians were used for trend analyses. Outlier events were identified using locally estimated scatterplot smoothing and the generalised extreme studentized deviate test. RESULTS: Overall, the number of outlier events in the timing of transfusions based on patients' haemoglobin level and platelet count was small (n = 1 and n = 0 for 139 weeks, respectively). Investigation of these events for adverse clinical outcomes yielded no significant findings. CONCLUSIONS: Herein, we propose that the trends and outlier events could be further investigated and used to make decisions and implement protocols to improve patient care.
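
The metric described in the methods can be sketched roughly as follows: compute the delay from result release to transfusion start, aggregate it into weekly medians, and flag unusual weeks. This is only an illustration under stated assumptions. The event tuples, the function names, and the sample delays are hypothetical, and the outlier rule here is a simple modified z-score based on the median absolute deviation, used as a stand-in for the LOESS smoothing and generalised ESD test named in the abstract.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median


def weekly_median_delays(events):
    """events: iterable of (result_released, transfusion_started) datetimes.

    Returns {(iso_year, iso_week): median delay in minutes}.
    """
    by_week = defaultdict(list)
    for released, started in events:
        iso = released.isocalendar()
        delay_min = (started - released).total_seconds() / 60
        by_week[(iso[0], iso[1])].append(delay_min)
    return {week: median(d) for week, d in sorted(by_week.items())}


def flag_outlier_weeks(weekly, threshold=3.5):
    """Flag weeks whose median delay has a large modified z-score (MAD-based).

    Stand-in rule only; the study used LOESS plus the generalised ESD test.
    """
    values = list(weekly.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []
    return [w for w, v in weekly.items()
            if 0.6745 * abs(v - center) / mad > threshold]


# Hypothetical data: six weeks, three transfusions per week.
base = datetime(2023, 1, 2, 8, 0)  # a Monday, ISO week 1 of 2023
events = []
for i, m in enumerate([55, 60, 65, 58, 62, 300]):
    released = base + timedelta(weeks=i)
    for delay in (m - 5, m, m + 5):
        events.append((released, released + timedelta(minutes=delay)))

weekly = weekly_median_delays(events)
flagged = flag_outlier_weeks(weekly)
print(weekly)
print(flagged)
```

Using weekly medians rather than individual delays keeps a single slow transfusion from dominating the trend, which matches the abstract's choice of aggregation.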


Subjects
Blood Transfusion , Child , Humans , Platelet Count
18.
BMC Health Serv Res ; 23(1): 23, 2023 Jan 10.
Article in English | MEDLINE | ID: mdl-36627627

ABSTRACT

BACKGROUND: Institutions or clinicians (units) are often compared according to a performance indicator such as in-hospital mortality. Several approaches have been proposed for the detection of outlying units, whose performance deviates from the overall performance. METHODS: We provide an overview of three approaches commonly used to monitor institutional performance for outlier detection. These are the common-mean model, the 'Normal-Poisson' random effects model and the 'Logistic' random effects model. For the latter we also propose a visualisation technique. The common-mean model assumes that the underlying true performance of all units is equal and that any observed variation between units is due to chance. Even after applying case-mix adjustment, this assumption is often violated due to overdispersion and a post-hoc correction may need to be applied. The random effects models relax this assumption and explicitly allow the true performance to differ between units, thus offering a more flexible approach. We discuss the strengths and weaknesses of each approach and illustrate their application using audit data from England and Wales on Adult Cardiac Surgery (ACS) and Percutaneous Coronary Intervention (PCI). RESULTS: In general, the overdispersion-corrected common-mean model and the random effects approaches produced similar p-values for the detection of outliers. For the ACS dataset (41 hospitals), three outliers were identified in total but only one was identified by all methods. For the PCI dataset (88 hospitals), seven outliers were identified in total but only two were identified by all methods. The common-mean model uncorrected for overdispersion produced several more outliers. The similar p-values across all three approaches can be attributed to the relatively small between-hospital variance in both datasets, which resulted in only a mild violation of the common-mean assumption; in this situation, the overdispersion correction worked well. CONCLUSION: If the common-mean assumption is likely to hold, all three methods are appropriate for outlier detection and their results should be similar. Random effects methods may be the preferred approach when the common-mean assumption is likely to be violated.
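
As a rough illustration of the common-mean model with a post-hoc overdispersion correction, the sketch below computes a z-score per unit from observed and case-mix-expected event counts, estimates a multiplicative overdispersion factor, and compares p-values before and after correction. The abstract does not give the exact formulas, so the Poisson-style variance, the multiplicative correction, the function names, and the counts are all assumptions; refinements such as winsorizing extreme z-scores before estimating the factor are omitted.

```python
from math import sqrt
from statistics import NormalDist


def common_mean_z(observed, expected):
    """z-score per unit, assuming Poisson-like variance equal to the expected count."""
    return [(o - e) / sqrt(e) for o, e in zip(observed, expected)]


def overdispersion_factor(z):
    """Mean squared z-score; values well above 1 suggest overdispersion."""
    return sum(v * v for v in z) / len(z)


def two_sided_p(z, phi=1.0):
    """Two-sided normal p-values, with z deflated by sqrt(phi) when phi > 1."""
    scale = sqrt(max(phi, 1.0))
    nd = NormalDist()
    return [2 * (1 - nd.cdf(abs(v) / scale)) for v in z]


# Hypothetical counts for five hospitals (not the audit data).
observed = [12, 8, 30, 10, 9]
expected = [10, 10, 15, 10, 10]

z = common_mean_z(observed, expected)
phi = overdispersion_factor(z)
p_raw = two_sided_p(z)
p_corr = two_sided_p(z, phi)

for i, (r, c) in enumerate(zip(p_raw, p_corr)):
    print(f"hospital {i}: uncorrected p={r:.4f}, corrected p={c:.4f}")
```

The correction widens the effective control limits, so fewer units cross a given significance threshold. This mirrors the abstract's observation that the uncorrected common-mean model flagged several more outliers.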


Subjects
Percutaneous Coronary Intervention , Humans , Hospitals , Risk Adjustment , Logistic Models , England
19.
Sensors (Basel) ; 23(3)2023 Jan 28.
Article in English | MEDLINE | ID: mdl-36772479

ABSTRACT

Over the last decade, the widespread adoption of GPS tracking has generated large volumes of data from vehicle location sensors. These sensors typically record multiple variables, such as position, speed, and the angular position of the vehicle, at very short time intervals. However, recorded routes often do not correspond to reality because of interference from structures such as buildings and bridges or because of sensor failures, and the sheer volume of data prevents human experts from identifying genuinely anomalous routes by visual analysis. Detecting such anomalies can reveal faulty sensors, which can then be replaced so that the vehicle is tracked reliably. Because the available sensors are generally reliable, examples of such anomalies are scarce, which makes supervised learning techniques difficult to apply. In this work, we propose unsupervised deep neural network models based on stacked autoencoders to detect anomalous vehicle routes in Santiago de Chile. The results show that the proposed model effectively detects anomalous paths in real data, as validated by an expert user, reaching a performance of 82.1% on average. As future work, we propose incorporating Long Short-Term Memory (LSTM) and attention-based networks to improve the detection of anomalous trajectories.

20.
Sensors (Basel) ; 23(18)2023 Sep 14.
Article in English | MEDLINE | ID: mdl-37765931

ABSTRACT

To reduce the risks and challenges faced by frontline workers in confined workspaces, accurate real-time monitoring of their vital signs is essential for improving safety and productivity and preventing accidents. Machine-learning-based data-driven methods have shown promise in extracting valuable information from complex monitoring data. However, practical industrial settings still struggle with data collection difficulties and the low prediction accuracy of machine learning models caused by the complex work environment. To tackle these challenges, a long short-term memory (LSTM)-based deep stacked sequence-to-sequence autoencoder is proposed for predicting the health status of workers in confined spaces. The first step involves implementing a wireless data acquisition system using edge-cloud platforms. Smart wearable devices are used to collect data from multiple sources, such as temperature, heart rate, and pressure. These data provide insights into the workers' health status within the closed space of a manufacturing factory. Next, a hybrid model combining deep learning and a support vector machine (SVM) is constructed for anomaly detection. The LSTM-based deep stacked sequence-to-sequence autoencoder is designed to learn deep discriminative features from the time-series data by reconstructing the input data, thereby generating fused deep features. These features are then fed into a one-class SVM, enabling accurate recognition of workers' health status. The effectiveness and superiority of the proposed approach are demonstrated through comparisons with other existing approaches.


Subjects
Commerce , Wearable Electronic Devices , Humans , Data Collection , Health Status