Results 1 - 14 of 14
1.
Sensors (Basel) ; 24(1)2023 Dec 25.
Article in English | MEDLINE | ID: mdl-38202979

ABSTRACT

In order to solve low-quality problems such as anomalous and missing values in the condition monitoring data of hydropower units, this paper proposes a monitoring data quality enhancement method based on HDBSCAN-WSGAIN-GP, which improves the quality and usability of the condition monitoring data of hydropower units by combining the advantages of density clustering and a generative adversarial network. First, the monitoring data are grouped by density level using the HDBSCAN clustering method in combination with the working conditions, and the anomalies in the dataset are adaptively detected, recognized and cleaned. Then, exploiting the strength of the WSGAIN-GP model in data filling, the missing values in the cleaned data are generated automatically through unsupervised learning of the features and distribution of real monitoring data. The method is validated on an online monitoring dataset from actual operating units. Comparison experiments show that the silhouette coefficient (SCI) of the HDBSCAN-based anomaly detection model reaches 0.4935, higher than that of the other comparative models, indicating that the proposed model is better at distinguishing valid samples from anomalous ones. The probability density distribution of the data filled by the WSGAIN-GP model is similar to that of the measured data, and the KL divergence, JS divergence and Hellinger distance between the distributions of the filled data and the original data are close to 0. Compared with filling methods such as SGAIN, GAIN and KNN at different missing rates, WSGAIN-GP yields the lowest RMSE at every missing rate tested, which demonstrates that the proposed filling model has good accuracy and generalization.
The research results provide a high-quality data basis for subsequent trend prediction and state warning.
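The distribution-similarity measures cited in this abstract (Kullback-Leibler and Jensen-Shannon divergence, Hellinger distance) can be computed on binned data. The following is a minimal stdlib-only sketch, assuming the measured and filled series are first binned into a shared histogram; the `histogram` helper, the bin count and the `eps` guard are illustrative choices, not details from the paper.

```python
import math

def _normalize(counts):
    total = float(sum(counts))
    return [c / total for c in counts]

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats; eps guards against log of zero-probability bins in Q."""
    p, q = _normalize(p), _normalize(q)
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by ln 2."""
    p, q = _normalize(p), _normalize(q)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def hellinger_distance(p, q):
    """Hellinger distance, bounded in [0, 1]."""
    p, q = _normalize(p), _normalize(q)
    return math.sqrt(0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q)))

def histogram(values, lo, hi, bins):
    """Count values into equal-width bins on [lo, hi]; out-of-range values are clamped."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = max(0, min(int((v - lo) / width), bins - 1))
        counts[idx] += 1
    return counts
```

Values near 0 for all three measures mean the filled data reproduce the measured distribution, which is how the abstract reads its results. Binning both series with the same `lo`, `hi` and `bins` matters: the measures compare bin by bin.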

2.
Microsc Microanal ; 28(1): 109-122, 2022 Feb.
Article in English | MEDLINE | ID: mdl-35177136

ABSTRACT

Hierarchical density-based spatial clustering of applications with noise (HDBSCAN) and uniform manifold approximation and projection (UMAP), two new state-of-the-art algorithms for clustering analysis and dimensionality reduction, respectively, are proposed for the segmentation of core-loss electron energy loss spectroscopy (EELS) spectrum images. The performances of UMAP and HDBSCAN are systematically compared with the other clustering analysis approaches used for EELS in the literature, using a known synthetic dataset, and the new approaches give better results. Furthermore, UMAP and HDBSCAN, as well as the triple combination nonnegative matrix factorization-UMAP-HDBSCAN, are showcased on a real experimental dataset from a core-shell nanoparticle of iron and manganese oxides. The results indicate that using different combinations complementarily may be beneficial in a real-case scenario to attain a complete picture, as different algorithms highlight different aspects of the dataset studied.
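HDBSCAN, the clustering step evaluated here, builds its cluster hierarchy from a "mutual reachability" distance that inflates distances in sparse regions, which is what lets it separate dense clusters from noise. The following brute-force O(n²) sketch of that transformation is illustrative only and is not the implementation used in the paper. Here `min_samples` follows the convention that each point counts as its own first neighbour.

```python
import math

def core_distances(points, min_samples):
    """Distance from each point to its min_samples-th nearest neighbour (itself included)."""
    cores = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        cores.append(dists[min_samples - 1])
    return cores

def mutual_reachability(points, min_samples):
    """Pairwise max(core(a), core(b), d(a, b)), with zeros on the diagonal."""
    core = core_distances(points, min_samples)
    n = len(points)
    return [[0.0 if i == j else max(core[i], core[j], math.dist(points[i], points[j]))
             for j in range(n)] for i in range(n)]
```

HDBSCAN then builds a minimum spanning tree over these distances and condenses the resulting single-linkage hierarchy. An isolated point such as `(10, 10)` among three nearby points gets a large core distance, so every edge to it is long.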

3.
Sensors (Basel) ; 21(10)2021 May 13.
Article in English | MEDLINE | ID: mdl-34068403

ABSTRACT

High-resolution automotive radar sensors play an increasing role in detection, classification and tracking of moving objects in traffic scenes. Clustering is frequently used to group detection points in this context. However, this is a particularly challenging task due to variations in number and density of available data points across different scans. Modified versions of the density-based clustering method DBSCAN have mostly been used so far, while hierarchical approaches are rarely considered. In this article, we explore the applicability of HDBSCAN, a hierarchical DBSCAN variant, for clustering radar measurements. To improve results achieved by its unsupervised version, we propose the use of cluster-level constraints based on aggregated background information from cluster candidates. Further, we propose the application of a distance threshold to avoid selection of small clusters at low hierarchy levels. Based on exemplary traffic scenes from nuScenes, a publicly available autonomous driving data set, we test our constraint-based approach along with other methods, including label-based semi-supervised HDBSCAN. Our experiments demonstrate that cluster-level constraints help to adjust HDBSCAN to the given application context and can therefore achieve considerably better results than the unsupervised method. However, the approach requires carefully selected constraint criteria that can be difficult to choose in constantly changing environments.
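A flat cut of a single-linkage hierarchy is the simplest way to see what a distance threshold does: any two detections closer than `eps` end up in the same cluster, so fragments separated by less than `eps` are never reported on their own. The sketch below shows that simplified analogue in plain Python. It is not the constraint-based HDBSCAN proposed in the article. For HDBSCAN itself, the `hdbscan` package and scikit-learn's `HDBSCAN` class expose a `cluster_selection_epsilon` parameter that plays a comparable role.

```python
import math

def single_linkage_cut(points, eps):
    """Label points so that any two within eps of each other share a cluster."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

    # Relabel roots as consecutive integers in order of first appearance.
    labels, mapping = [], {}
    for i in range(len(points)):
        root = find(i)
        mapping.setdefault(root, len(mapping))
        labels.append(mapping[root])
    return labels
```

With a small `eps`, every radar detection becomes its own cluster. Raising `eps` merges nearby fragments, mirroring how a threshold prevents the selection of small clusters low in the hierarchy.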

4.
Sensors (Basel) ; 20(18)2020 Sep 10.
Article in English | MEDLINE | ID: mdl-32927672

ABSTRACT

In systems connected to smart grids, smart meters with fast and efficient responses are very helpful for detecting anomalies in real time. However, sending data at intervals of a minute or less is uncommon with today's technology because of bottlenecks in the communication network and storage media. Because mitigation cannot be done in real time, we propose prediction techniques using a Deep Neural Network (DNN), Support Vector Regression (SVR) and k-Nearest Neighbors (KNN). In addition, the prediction timestep is chosen per day and wrapped in sliding windows, and clustering using K-means, intersection K-means and HDBSCAN is also evaluated. The goal of the prediction is to determine whether anomalies in electricity usage will occur in the next few weeks, giving users time to check their usage and allowing the utility to judge whether it needs to prepare sufficient supply. To counter the higher latency of the traditional centralized system, we also propose adding an Edge Meter Data Management System (MDMS) layer and a Cloud-MDMS layer, used for inference and training, respectively. In experiments running on a Raspberry Pi, the best solution is the DNN, which has the shortest latency (1.25 ms) and a 159 kB persistent file size at 128 timesteps.
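Wrapping a daily series in sliding windows turns it into supervised (input, target) pairs that a DNN, SVR or KNN regressor can learn from. A minimal sketch follows, assuming a univariate series of daily consumption values and a single target value `horizon` steps after each window. The abstract does not give the exact feature layout, so these names and defaults are hypothetical.

```python
def sliding_windows(series, window, horizon=1):
    """Split a series into (input window, target) pairs.

    Each sample uses `window` consecutive values as features and the value
    `horizon` steps after the end of the window as the target.
    """
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start:start + window])
        y.append(series[start + window + horizon - 1])
    return X, y
```

For the 128-timestep configuration mentioned above, `window=128` would produce 128-value inputs. A longer `horizon` corresponds to predicting further ahead, at the cost of fewer training pairs.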

5.
Heliyon ; 10(6): e27938, 2024 Mar 30.
Article in English | MEDLINE | ID: mdl-38510049

ABSTRACT

The online food delivery service supply chain is a crucial element in achieving sustainable development goals. As the sector prospers, an increasing number of takeaway businesses are drawn to it. As their numbers rise, issues such as profitability resilience, environmental friendliness and fulfillment of social responsibility emerge, posing potential disruptions to the service supply chain. Against this backdrop, we aim to mine the sustainability of takeaway businesses using the triple bottom line. We propose a two-stage approach that combines the Bayesian best-worst method, to derive the weights of the sustainability criteria, with a data mining technique, to cluster takeaway businesses. A case study is then conducted on takeaway businesses on the Ele.me platform in China. The results highlight economic sustainability as the most valued criterion, followed by social and environmental sustainability. The clustering outcomes reveal four distinct levels of sustainability, with stronger performance in social sustainability than in the environmental and economic dimensions. Further discussion explores the relationship between sustainability levels, cuisine categories and business size. This study thus offers an effective approach for advancing sustainability initiatives within the online food delivery service supply chain.

6.
Mol Inform ; 42(8-9): e2300061, 2023 08.
Article in English | MEDLINE | ID: mdl-37212494

ABSTRACT

Dimensionality reduction (DR) techniques are used for various purposes, such as exploratory data analysis. Principal component analysis (PCA) is one of the most popular and commonly employed linear DR techniques. Owing to its linear nature, PCA enables the determination of axes in a low-dimensional space and the calculation of corresponding loading vectors. However, PCA cannot necessarily extract important features of non-linearly distributed data. This study presents a technique aimed at aiding the interpretation of data reduced through non-linear DR methods. In the proposed method, the non-linearly reduced data were clustered using a density-based clustering method. The obtained cluster labels were then classified by random forest (RF) classifiers. The feature importance (FI) of the RF classifiers and Spearman's rank correlation coefficients between the predicted probabilities for the obtained clusters and the original feature values were used to characterize the visualized dimensionally reduced data. The results revealed that the proposed method can provide interpretable FI-based images of the handwritten digits dataset. The proposed method was also applied to the polymer dataset. The study found that incorporating signed FI was advantageous for achieving a meaningful interpretation. Furthermore, Gaussian process regression was used to produce intuitive FI-based heatmaps in a 2-dimensional space for ease of understanding. To enhance the interpretability of the obtained clusters, a feature selection technique called Boruta was applied, and it worked effectively to interpret the clusters using a limited number of commonly important features. The study also suggested that computing FI solely from substructure-based descriptors could further enhance the interpretability of the results.
Finally, automation of the proposed method was investigated: by maximizing a target score based on the quality of both the DR and the clustering, indicative results were obtained automatically for both the handwritten digits and polymer datasets.
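One characterization step described here is Spearman's rank correlation between a cluster's predicted probabilities and each original feature. A stdlib-only sketch of that coefficient follows. It is computed as the Pearson correlation of average ranks, so tied values are handled. In practice `scipy.stats.spearmanr` is the usual choice. The helper names are illustrative.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Applied feature by feature, for example `spearman(feature_column, cluster_probabilities)`, the sign of rho indicates whether high feature values push samples toward or away from a cluster. This is the kind of signed information the abstract reports as useful.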


Subjects
Algorithms , Polymers , Polymers/chemistry
7.
Int J Inj Contr Saf Promot ; 30(2): 270-281, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36608271

ABSTRACT

Identifying black spots effectively and accurately is a pivotal and challenging task for improving road traffic safety. A novel black spot identification model is proposed that integrates GIS-based processing with hierarchical density-based spatial clustering of applications with noise (HDBSCAN). The optimal clustering parameters are determined with an internal validation indicator, the density-based clustering validation index, to minimize the impact of subjectivity in parameter selection. The model is validated on 3536 accident records collected from 1 August to 31 October 2020 in Hangzhou, China, and identifies 39 black spots. The results show that: (1) the black spots contain 75% of all accidents, while their network length accounts for only 23.26% of the total road network length; (2) compared with the conventional density-based spatial clustering of applications with noise (DBSCAN) model and the K-means model, the proposed model achieves the best performance, gathering more accidents per unit road length; (3) an on-site sample survey of 6 of the identified black spots indicates that the proposed model has high recognition accuracy, and these sites are recommended for further investigation.
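The shares in point (1) and the comparison criterion in point (2), accidents gathered per unit road length, reduce to simple ratios. The sketch below is an illustrative reading of those metrics, not code from the study; the function names are hypothetical. With the reported shares, black spots carry roughly 3.2 times the network-average accident density (0.75 / 0.2326 ≈ 3.22).

```python
def accidents_per_unit_length(spots):
    """Total accidents inside identified black spots divided by their total road length.

    `spots` is a list of (accident_count, segment_length_km) pairs.
    """
    total_accidents = sum(a for a, _ in spots)
    total_length = sum(length for _, length in spots)
    return total_accidents / total_length

def concentration_ratio(accident_share, length_share):
    """Accident density inside black spots relative to the whole-network density.

    (acc_spots / acc_total) / (len_spots / len_total) equals
    (acc_spots / len_spots) / (acc_total / len_total).
    """
    return accident_share / length_share
```

Comparing `accidents_per_unit_length` across the HDBSCAN, DBSCAN and K-means outputs is one straightforward way to rank them by the criterion in point (2).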


Subjects
Accidents, Traffic , Geographic Information Systems , Humans , Spatial Analysis , Cluster Analysis , China/epidemiology
8.
Acta Trop ; 233: 106585, 2022 Sep.
Article in English | MEDLINE | ID: mdl-35787418

ABSTRACT

Geometric morphometric analysis was combined with two unsupervised machine learning algorithms, UMAP and HDBSCAN, to visualize morphological differences in wing shape among and within four Anopheles sibling species (An. atroparvus, An. melanoon, An. maculipennis s.s. and An. daciae sp. inq.) of the Maculipennis complex in Northern Italy. Specifically, we evaluated: (1) wing shape variation among and within species; (2) the consistency between groups of An. maculipennis s.s. and An. daciae sp. inq. identified from COI sequences and from wing shape variability; and (3) the spatial and temporal distribution of different morphotypes. UMAP detected at least 13 main patterns of variation in wing shape among the four analyzed species and mapped intraspecific morphological variation. The relationship between the most abundant COI haplotypes of An. daciae sp. inq. and shape ordination/variation was not significant; however, morphological variation within haplotypes was observed. HDBSCAN also recognized distinct clusters of morphotypes within An. daciae sp. inq. (12) and An. maculipennis s.s. (4). All morphotypes shared a similar pattern of variation in the subcostal vein, the anal vein and the radio-medial cross-vein of the wing. By contrast, the marginal part of the wing remained unchanged in all clusters of both species. No significant spatial-temporal difference was observed in the frequency of the identified morphotypes. Our study demonstrates that machine learning algorithms are a useful tool in combination with geometric morphometrics, and suggests deepening the analysis of inter- and intraspecific shape variability to evaluate evolutionary constraints related to wing functionality.
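Geometric morphometric studies usually begin by removing position, size and orientation from landmark configurations (Procrustes superimposition) before any ordination or clustering. The abstract does not detail this step, so treat the following as an assumed, minimal 2D version rather than the authors' workflow. It represents landmarks as complex numbers, scales each configuration to unit centroid size, and rotates the target onto the reference; reflections are not allowed.

```python
import math

def _to_unit_shape(points):
    """Centre a 2D landmark configuration and scale it to unit centroid size."""
    z = [complex(x, y) for x, y in points]
    centroid = sum(z) / len(z)
    z = [p - centroid for p in z]
    size = math.sqrt(sum(abs(p) ** 2 for p in z))
    return [p / size for p in z]

def procrustes_distance(reference, target):
    """Partial Procrustes distance between two 2D configurations with matching landmarks."""
    x = _to_unit_shape(reference)
    y = _to_unit_shape(target)
    # The rotation e^{i*theta} minimising sum |x - e^{i*theta} y|^2 aligns the
    # phase of s = sum(conj(x) * y) with the positive real axis.
    s = sum(xi.conjugate() * yi for xi, yi in zip(x, y))
    rotation = s.conjugate() / abs(s) if abs(s) > 0 else 1
    aligned = [rotation * yi for yi in y]
    return math.sqrt(sum(abs(xi - yi) ** 2 for xi, yi in zip(x, aligned)))
```

A translated, scaled and rotated copy of a wing outline has distance close to 0, while genuinely different shapes do not. The aligned coordinates, not the raw ones, are what would be passed on to UMAP or HDBSCAN.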


Subjects
Anopheles , Animals , Anopheles/genetics , Italy , Unsupervised Machine Learning , Wings, Animal
9.
Integr Biol (Camb) ; 14(8-12): 184-203, 2022 12 30.
Article in English | MEDLINE | ID: mdl-36670549

ABSTRACT

Live cell calcium (Ca2+) imaging is an important tool for recording cellular activity in in vitro and in vivo preclinical studies. In particular, high-resolution microscopy can provide valuable dynamic information at the single-cell level. A major challenge in implementing such imaging schemes is extracting quantitative information in the presence of significant heterogeneity in Ca2+ responses arising from variation in structural arrangement and drug distribution. To fill this gap, we propose time-lapse imaging using spinning disk confocal microscopy together with a machine learning-enabled framework for automated grouping of Ca2+ spiking patterns. Time series analysis is performed to correlate the drug-induced cellular responses with the self-assembly patterns present in multicellular systems. The framework is designed to reduce the large-scale dynamic responses using uniform manifold approximation and projection (UMAP). In particular, we propose hierarchical DBSCAN (HDBSCAN) as suitable for this task because of its reduced number of hyperparameters. We find that UMAP-assisted HDBSCAN outperforms existing approaches in clustering accuracy when segregating Ca2+ spiking patterns. One novelty is the application of non-linear dimension reduction to segregate Ca2+ transients with statistical similarity. The proposed automated pipeline also proved to be a reproducible and fast method requiring minimal user input. The algorithm was used to quantify the effect of cellular arrangement and stimulus level on collective Ca2+ responses induced by a GPCR-targeting drug. The analysis revealed a significant increase in the subpopulation showing sustained oscillation at higher packing density. In contrast to the traditional measurement of rise time and decay ratio from Ca2+ transients, the proposed pipeline was used to classify complex patterns of longer duration and to perform cluster-wise model fitting.
The two-step process has potential implications for deciphering the biophysical mechanisms underlying Ca2+ oscillations in the context of structural arrangement between cells.
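For reference, here is a minimal sketch of one traditional per-transient measurement that this pipeline is contrasted with: rise time. The 10% to 90% convention, the first-sample baseline and the linear interpolation between samples are assumptions made for illustration. They are not definitions taken from the paper.

```python
def rise_time(times, signal, low=0.1, high=0.9):
    """Time for a transient to go from `low` to `high` of its peak amplitude above baseline.

    Baseline is taken as the first sample; crossing times are linearly interpolated.
    """
    baseline = signal[0]
    peak_idx = max(range(len(signal)), key=lambda i: signal[i])
    amplitude = signal[peak_idx] - baseline

    def crossing(level):
        target = baseline + level * amplitude
        for i in range(1, peak_idx + 1):
            if signal[i] >= target:
                # signal[i - 1] < target here, so the interpolation is well defined.
                s0, s1 = signal[i - 1], signal[i]
                t0, t1 = times[i - 1], times[i]
                return t0 + (target - s0) * (t1 - t0) / (s1 - s0)
        raise ValueError("level not reached before the peak")

    return crossing(high) - crossing(low)
```

A single number like this summarizes one transient well but says little about long, multi-peak oscillations. That limitation is the motivation the abstract gives for clustering whole spiking patterns instead.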


Subjects
Calcium , Microscopy, Confocal/methods
10.
Front Cell Neurosci ; 16: 1074304, 2022.
Article in English | MEDLINE | ID: mdl-36589286

ABSTRACT

Introduction: Neurotransmitter release at presynaptic active zones (AZs) requires concerted protein interactions within a dense 3D nano-hemisphere. Within this complex protein meshwork, the (M)unc-13 family member Unc-13 of Drosophila melanogaster is essential for docking of synaptic vesicles and transmitter release. Methods: We employ minos-mediated integration cassette (MiMIC)-based gene editing using GFSTF (EGFP-FlAsH-StrepII-TEV-3xFlag) to endogenously tag all annotated Drosophila Unc-13 isoforms, enabling visualization of endogenous Unc-13 expression within the central and peripheral nervous system. Results and discussion: Electrophysiological characterization using two-electrode voltage clamp (TEVC) reveals that evoked and spontaneous synaptic transmission remain unaffected in unc-13 GFSTF 3rd instar larvae and that acute presynaptic homeostatic potentiation (PHP) can be induced at control levels. Furthermore, multi-color structured-illumination microscopy shows precise co-localization of Unc-13GFSTF, Bruchpilot and GluRIIA receptor subunits within the synaptic mesoscale. Localization microscopy combined with HDBSCAN detects Unc-13GFSTF subclusters that move toward the AZ center during PHP, with unaltered Unc-13GFSTF protein levels.
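Quantifying whether subclusters "move toward the AZ center" comes down to comparing subcluster centroids with a reference center. The sketch below makes two hypothetical assumptions: the AZ center is approximated by the centroid of all clustered localizations of one AZ, and HDBSCAN-style labels are used, with -1 marking noise. The paper's actual definition of the center is not given in the abstract.

```python
import math
from collections import defaultdict

def subcluster_center_distances(points, labels):
    """Distance from each subcluster centroid to the centroid of all clustered localizations.

    Points labelled -1 (noise) are ignored. Returns {label: distance}.
    """
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        if lab != -1:
            groups[lab].append(p)
    clustered = [p for pts in groups.values() for p in pts]
    # Hypothetical AZ centre: mean position of every clustered localization.
    center = tuple(sum(c) / len(clustered) for c in zip(*clustered))
    result = {}
    for lab, pts in groups.items():
        centroid = tuple(sum(c) / len(pts) for c in zip(*pts))
        result[lab] = math.dist(centroid, center)
    return result
```

Averaging these distances per AZ under control and PHP conditions would give one simple readout of the inward shift the abstract reports.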

11.
Front Comput Neurosci ; 15: 657151, 2021.
Article in English | MEDLINE | ID: mdl-34234663

ABSTRACT

Spike sorting classifies spikes (action potentials recorded by physiological electrodes) in order to identify their respective firing units. With improvements in micro-electrode technology, it has been extended to spikes recorded by multi-electrode arrays (MEAs). However, improving classification accuracy while maintaining low time complexity remains difficult. This paper proposes HTsort, a fast and accurate spike sorting approach for high-density multi-electrode arrays. Several improvements are introduced to the traditional pipeline of threshold detection followed by clustering. First, a divide-and-conquer method uses electrode spatial information to achieve pre-clustering. Second, the clustering method HDBSCAN (hierarchical density-based spatial clustering of applications with noise) is used to classify spikes and detect overlapping events (multiple spikes firing simultaneously). Third, a template merging method merges redundant exported templates according to template similarity and the spatial distribution of electrodes. Finally, a template matching method resolves overlapping events. Our approach is validated on simulated data that we constructed and on publicly available data, and compared with other state-of-the-art spike sorters. We found that HTsort has a more favorable trade-off between accuracy and time consumption. Compared with MountainSort and SpykingCircus, time consumption is reduced by at least 40% when the number of electrodes is 64 or fewer. Compared with HerdingSpikes, classification accuracy typically improves by more than 10%. Meanwhile, HTsort is more robust against background noise than the other sorters. Our more sophisticated spike sorter should help neurophysiologists complete spike sorting more quickly and accurately.
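Threshold detection is the first stage of the pipeline HTsort builds on. The abstract does not state its threshold rule. A widely used convention, assumed here, estimates the noise level robustly as median(|x|) / 0.6745 and flags negative-going crossings of -k times that value. A refractory gap keeps one spike from being counted twice. The function name and defaults are hypothetical.

```python
import statistics

def detect_spikes(signal, k=4.0, refractory=10):
    """Return indices where a single-channel signal first drops below -k * sigma.

    sigma is the robust noise estimate median(|x|) / 0.6745. After a detection,
    further samples are ignored for `refractory` samples.
    """
    sigma = statistics.median(abs(x) for x in signal) / 0.6745
    threshold = -k * sigma
    spikes = []
    last = -refractory - 1  # so the first crossing is always accepted
    for i, x in enumerate(signal):
        if x < threshold and i - last > refractory:
            spikes.append(i)
            last = i
    return spikes
```

The median-based estimate is used because large spikes barely move the median, whereas they would inflate a standard deviation. The detected events are what the later pre-clustering and HDBSCAN stages then sort.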

12.
Comput Struct Biotechnol J ; 19: 1302-1311, 2021.
Article in English | MEDLINE | ID: mdl-33738079

ABSTRACT

Local 3D-structural differences in homologous proteins contribute to the functional diversity observed in a superfamily, but have so far received little attention because bioinformatic analysis has usually been carried out at the level of amino acid sequences. We have developed Zebra3D, the first-of-its-kind bioinformatic software for systematic analysis of 3D-alignments of protein families using machine learning. The new tool identifies subfamily-specific regions (SSRs): patterns of local 3D-structure (i.e. single residues, loops, or secondary structure fragments) that are spatially equivalent within families/subfamilies but differ among them, and thus can be associated with functional diversity and function-related conformational plasticity. Bioinformatic analysis of protein superfamilies with Zebra3D can be used to study 3D-determinants of catalytic activity and specific accommodation of ligands, to help prepare focused libraries for directed evolution, to assist development of chimeric enzymes with novel properties by exchanging equivalent regions between homologs, and to characterize plasticity in binding sites. A companion Mustguseal web-server is available to automatically construct a 3D-alignment of functionally diverse proteins, reducing the minimal input required to operate Zebra3D to a single PDB code. The combined Zebra3D + Mustguseal approach provides the opportunity to systematically explore the value of SSRs in superfamilies and to use this information for protein design and drug discovery. The software is available open-access at https://biokinet.belozersky.msu.ru/Zebra3D.

13.
Ultramicroscopy ; 200: 28-38, 2019 05.
Article in English | MEDLINE | ID: mdl-30822614

ABSTRACT

Atom probe tomography (APT) has enabled the direct visualization of solute clusters. However, one of the main analysis methods used by the APT community, the maximum separation method, often suffers from subjective parameter selection and limited applicability. To address the need for more robust and versatile analysis tools, a framework based on hierarchical density-based cluster analysis is implemented. Cluster analysis begins with the HDBSCAN algorithm, which conservatively segments the datasets into regions containing clusters and a matrix or noise region. The stability of each cluster and the probability that an atom belongs to a cluster are quantified. Each clustered region is further analyzed by the DeBaCl algorithm to separate and refine the clusters present in the sub-volumes. Finally, the k-nearest neighbor algorithm may be used to re-assign matrix atoms to clusters based on their probability values. This cluster analysis approach requires four mandatory parameters; however, choosing an appropriate value is essential for only one of them, a rough estimate of the minimum cluster size. The improved performance of the method was evaluated by analyzing four synthetic APT datasets and comparing the outcome with that of the commonly used maximum separation method. Code and data are made available through GitHub.
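The optional final step, k-nearest-neighbor re-assignment of matrix atoms, can be sketched as a majority vote among each matrix atom's nearest neighbors. The abstract does not say exactly how probability values enter the re-assignment. As a hypothetical stand-in, only clustered neighbors with membership probability of at least `min_prob` vote, and a label is assigned only when it collects at least `quorum` votes, so isolated matrix atoms stay in the matrix. Labels follow the HDBSCAN convention of -1 for matrix/noise.

```python
import math
from collections import Counter

def reassign_matrix_atoms(points, labels, probs, k=5, quorum=3, min_prob=0.5):
    """Return new labels where some matrix atoms (-1) adopt a neighbouring cluster's label."""
    new_labels = list(labels)
    for i, p in enumerate(points):
        if labels[i] != -1:
            continue
        # The k nearest other atoms, whether clustered or not.
        neighbours = sorted((j for j in range(len(points)) if j != i),
                            key=lambda j: math.dist(p, points[j]))[:k]
        # Votes come from the original labels, so the result does not depend on order.
        votes = Counter(labels[j] for j in neighbours
                        if labels[j] != -1 and probs[j] >= min_prob)
        if votes:
            label, count = votes.most_common(1)[0]
            if count >= quorum:
                new_labels[i] = label
    return new_labels
```

A brute-force neighbor search keeps the sketch short; for real APT datasets with millions of atoms, a spatial index such as a k-d tree would be needed.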

14.
Protein Sci ; 27(1): 62-75, 2018 01.
Article in English | MEDLINE | ID: mdl-28799290

ABSTRACT

Correlated motion analysis provides a method for understanding communication between, and dynamic similarities of, biopolymer residues and domains. The typical equal-time correlation matrices, frequently visualized with pseudo-colorings or heat maps, quickly convey large regions of highly correlated motion but hide more subtle similarities of motion. Here we propose a complementary method for visualizing correlations within proteins (or general biopolymers) that quickly conveys intuition about which residues have similar dynamic behavior. To group residues, we use the recently developed non-parametric clustering algorithm HDBSCAN. Although the method can be used to group residues using correlation as a similarity matrix (the most straightforward and intuitive approach), it can also be used more generally to determine groups of residues that have similar dynamic properties. We term these latter groups "Dynamic Domains", as they are based not on spatial closeness but on closeness in the column space of a correlation matrix. We provide examples of this method across three human proteins of varying size and function: the NF-κB essential modulator, the clotting promoter thrombin, and the mismatch repair protein (dimer) complex MutS-alpha. Although the examples presented here come from all-atom molecular dynamics simulations, this visualization technique can also be applied to correlation matrices built from any ensemble of conformations from experiment or computation.
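"Closeness in the column space of a correlation matrix" means residues are compared by their whole correlation profiles rather than by a single pairwise value. The sketch below takes Euclidean distance between columns as one assumed choice of metric and uses a toy trajectory of one scalar per residue per frame (rows are frames). It is an illustration, not the authors' code.

```python
import math

def correlation_matrix(traj):
    """Pearson correlation between the columns of traj (rows = frames, columns = residues)."""
    n_frames, n_res = len(traj), len(traj[0])
    cols = [[traj[f][r] for f in range(n_frames)] for r in range(n_res)]
    means = [sum(c) / n_frames for c in cols]
    centered = [[v - m for v in c] for c, m in zip(cols, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(n_res)] for i in range(n_res)]

def column_space_distances(corr):
    """Euclidean distance between every pair of columns of the correlation matrix."""
    n = len(corr)
    columns = [[corr[k][i] for k in range(n)] for i in range(n)]
    return [[math.dist(columns[i], columns[j]) for j in range(n)] for i in range(n)]
```

A distance matrix like this can be passed to an HDBSCAN implementation that accepts precomputed distances (for example `metric="precomputed"` in scikit-learn's `HDBSCAN`). Residues with matching correlation profiles then fall into the same "Dynamic Domain" even if they are far apart in space.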


Subjects
Algorithms , Molecular Dynamics Simulation , Motion (Physics) , Proteins/chemistry , Software , Proteins/genetics