Results 1 - 20 of 49
1.
Brief Bioinform ; 25(1)2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38168839

ABSTRACT

Cell clustering is typically the initial step in single-cell RNA sequencing (scRNA-seq) analyses. The performance of clustering considerably impacts the validity and reproducibility of cell identification. A variety of clustering algorithms have been developed for scRNA-seq data. These algorithms generate cell label sets that assign each cell to a cluster; however, different algorithms usually yield different label sets, which can introduce variation into cell-type identification. Currently, the performance of these algorithms has not been systematically evaluated in single-cell transcriptome studies. Herein, we performed a critical assessment of seven state-of-the-art clustering algorithms, including four deep learning-based clustering algorithms and the commonly used methods Seurat, Cosine-based Tanimoto similarity-refined graph for community detection using Leiden's algorithm (CosTaL) and Single-cell consensus clustering (SC3). We used diverse evaluation indices based on 10 different scRNA-seq benchmarks to systematically evaluate clustering performance. Our results show that CosTaL, Seurat, Deep Embedding for Single-cell Clustering (DESC) and SC3 consistently outperformed the Single-Cell Clustering Assessment Framework and scDeepCluster on nine effectiveness scores. Notably, CosTaL and DESC demonstrated superior performance in clustering specific cell types. The performance of the single-cell Variational Inference tools varied across datasets, suggesting sensitivity to certain dataset characteristics. DESC also exhibited promising results for cell subtype identification and capturing cellular heterogeneity. In addition, SC3 requires more memory and computes more slowly than the other algorithms on the same dataset. In sum, this study provides useful guidance for selecting appropriate clustering methods in scRNA-seq data analysis.
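
A minimal sketch of the kind of evaluation described here, scoring two hypothetical label sets against a reference annotation with scikit-learn's adjusted Rand index and normalized mutual information; the labels are toy stand-ins, not data from the study:

```python
# Compare clustering label sets against a curated reference annotation.
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

reference = [0, 0, 1, 1, 2, 2, 2]   # curated cell-type labels (toy)
method_a  = [0, 0, 1, 1, 2, 2, 1]   # e.g. a graph-based clustering
method_b  = [1, 1, 0, 0, 0, 2, 2]   # e.g. a consensus clustering

for name, labels in [("method_a", method_a), ("method_b", method_b)]:
    ari = adjusted_rand_score(reference, labels)
    nmi = normalized_mutual_info_score(reference, labels)
    print(f"{name}: ARI={ari:.3f}  NMI={nmi:.3f}")
```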


Subjects
Single-Cell Analysis , Transcriptome , Sequence Analysis, RNA/methods , Reproducibility of Results , Single-Cell Analysis/methods , Algorithms , Cluster Analysis , Gene Expression Profiling/methods
2.
Molecules ; 29(16)2024 Aug 17.
Article in English | MEDLINE | ID: mdl-39202980

ABSTRACT

This study conducts an in-depth analysis of clustering small molecules using spectral geometry and deep learning techniques. We applied a spectral geometric approach to convert molecular structures into triangulated meshes and used the Laplace-Beltrami operator to derive significant geometric features. By examining the eigenvectors of these operators, we captured the intrinsic geometric properties of the molecules, aiding their classification and clustering. The research utilized four deep learning methods: Deep Belief Network, Convolutional Autoencoder, Variational Autoencoder, and Adversarial Autoencoder, each paired with k-means clustering at different cluster sizes. Clustering quality was evaluated using the Calinski-Harabasz and Davies-Bouldin indices, Silhouette Score, and standard deviation. Nonparametric tests were used to assess the impact of topological descriptors on clustering outcomes. Our results show that the DBN + k-means combination is the most effective, particularly at lower cluster counts, demonstrating significant sensitivity to structural variations. This study highlights the potential of integrating spectral geometry with deep learning for precise and efficient molecular clustering.
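
The pipeline described above can be approximated in a few lines; the sketch below substitutes a k-nearest-neighbor graph Laplacian for the paper's Laplace-Beltrami operator on triangulated meshes (an assumption for brevity), then pairs the low-frequency eigenvectors with k-means and the three quality indices named in the abstract:

```python
# Spectral features from a graph Laplacian, clustered with k-means.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import KMeans
from sklearn.metrics import (calinski_harabasz_score,
                             davies_bouldin_score, silhouette_score)

rng = np.random.default_rng(0)
points = rng.normal(size=(200, 3))          # stand-in for mesh vertices

adj = kneighbors_graph(points, n_neighbors=8, mode="connectivity")
adj = 0.5 * (adj + adj.T)                   # symmetrize the k-NN graph
lap = np.diag(np.asarray(adj.sum(axis=1)).ravel()) - adj.toarray()

eigvals, eigvecs = np.linalg.eigh(lap)
features = eigvecs[:, 1:9]                  # low-frequency eigenvectors as features

for k in (2, 4, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)
    print(k, calinski_harabasz_score(features, labels),
          davies_bouldin_score(features, labels),
          silhouette_score(features, labels))
```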

3.
Sensors (Basel) ; 23(22)2023 Nov 07.
Article in English | MEDLINE | ID: mdl-38005415

ABSTRACT

Vessel detection and tracking are of utmost importance to river traffic. Efficient detection and tracking technology offers an effective solution to challenges related to river traffic safety and congestion. Traditional image-based object detection and tracking algorithms suffer from target ID switching, difficult feature extraction, and reduced robustness due to occlusion, target overlap, and changes in brightness and contrast. To detect and track vessels more accurately, a vessel detection and tracking algorithm based on the LiDAR point cloud is proposed. For vessel detection, statistical filtering is integrated into the Euclidean clustering algorithm to mitigate the effect of ripples on detection. Detection accuracy improved by 3.3% to 8.3% compared with three conventional algorithms. For vessel tracking, L-shape fitting of detected vessels improves tracking efficiency, and a simple and efficient tracking algorithm is presented. Compared with three traditional tracking algorithms, the method improved multiple object tracking accuracy (MOTA) and reduced ID switches and missed detections. The results demonstrate that LiDAR point cloud-based vessel detection can significantly enhance the accuracy of vessel detection and tracking.
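
A hedged sketch of the detection stage: a statistical outlier filter (analogous to the ripple-suppression step) followed by Euclidean-radius clustering, with scikit-learn's DBSCAN standing in for the Euclidean clustering and all thresholds chosen for illustration:

```python
# Statistical outlier removal followed by Euclidean-radius clustering.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import DBSCAN

def statistical_filter(points, k=10, std_ratio=2.0):
    """Drop points whose mean distance to k neighbors is anomalously large."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(points)
    dists, _ = nn.kneighbors(points)
    mean_d = dists[:, 1:].mean(axis=1)      # skip the self-distance column
    keep = mean_d < mean_d.mean() + std_ratio * mean_d.std()
    return points[keep]

rng = np.random.default_rng(1)
cloud = np.vstack([rng.normal(loc=(10, 10, 0), scale=0.5, size=(300, 3)),
                   rng.normal(loc=(30, 12, 0), scale=0.5, size=(300, 3)),
                   rng.uniform(0, 50, size=(200, 3))])   # two "vessels" + scatter

filtered = statistical_filter(cloud)
labels = DBSCAN(eps=1.5, min_samples=5).fit_predict(filtered)   # -1 = noise
print("clusters:", len(set(labels)) - (1 if -1 in labels else 0))
```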

4.
Sensors (Basel) ; 23(13)2023 Jul 03.
Article in English | MEDLINE | ID: mdl-37447967

ABSTRACT

Autonomous vehicles (AVs) rely on advanced sensory systems, such as Light Detection and Ranging (LiDAR), to function seamlessly in intricate and dynamic environments. LiDAR produces highly accurate 3D point clouds, which are vital for the detection, classification, and tracking of multiple targets. A systematic review and classification of various clustering and Multi-Target Tracking (MTT) techniques are necessary due to the inherent challenges posed by LiDAR data, such as density, noise, and varying sampling rates. In this study, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology was employed to examine the challenges and advancements in MTT techniques and clustering for LiDAR point clouds within the context of autonomous driving. Searches were conducted in major databases such as IEEE Xplore, ScienceDirect, SpringerLink, ACM Digital Library, and Google Scholar, utilizing customized search strategies. We identified and critically reviewed 76 relevant studies based on rigorous screening and evaluation processes, assessing their methodological quality, data handling adequacy, and reporting compliance. As a result of this comprehensive review and classification, we provide a detailed overview of current challenges, research gaps, and advancements in clustering and MTT techniques for LiDAR point clouds, thus contributing to the field of autonomous driving. Researchers and practitioners in autonomous driving will benefit from this study, which was conducted with systematic transparency and reproducibility.


Subjects
Autonomous Vehicles , Evidence Gaps , Reproducibility of Results , Cluster Analysis , Databases, Factual
5.
Int J Mol Sci ; 24(20)2023 Oct 19.
Article in English | MEDLINE | ID: mdl-37895035

ABSTRACT

The genetic architecture of ischemic stroke (IS), one of the leading causes of death worldwide, is complex and underexplored. The traditional approach to associative gene mapping is the genome-wide association study (GWAS), which tests individual single-nucleotide polymorphisms (SNPs) across the genomes of case and control groups. The purpose of this research is to develop an alternative approach in which groups of SNPs are examined rather than individual ones. We proposed a new workflow, validated it, and applied it to real data. It consists of three key stages: grouping SNPs into clusters, inferring haplotypes within the clusters, and testing the haplotypes for association with the phenotype. To group SNPs, we applied the clustering algorithms DBSCAN and HDBSCAN to linkage disequilibrium (LD) matrices representing pairwise r² values between all genotyped SNPs. These clustering algorithms had never before been applied to genotype data as part of an associative study workflow. In total, 883,908 SNPs and insertion/deletion polymorphisms from people of European ancestry (4929 cases and 652 controls) were processed. The subsequent testing of the frequencies of haplotypes restored in the SNP clusters revealed dozens of genes associated with IS and suggested a complex role for protocadherin molecules in IS. The developed workflow was validated using a simulated dataset of similar ancestry and the same sample sizes. The results of classic GWASs are also provided and discussed. The clustering algorithms considered here can be applied to genotype data to identify genomic loci associated with different qualitative traits, using the workflow presented in this research.
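
The first stage of the workflow can be sketched as follows, with scikit-learn's DBSCAN run on a precomputed 1 − r² distance matrix; the r² values, eps and min_samples here are illustrative, not values from the study:

```python
# Cluster SNPs from a linkage-disequilibrium (LD) matrix.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
n_snps = 100
r2 = rng.uniform(0, 1, size=(n_snps, n_snps))
r2 = (r2 + r2.T) / 2                 # symmetric pairwise r^2 (toy values)
np.fill_diagonal(r2, 1.0)

distance = 1.0 - r2                  # high LD -> small distance
labels = DBSCAN(eps=0.3, min_samples=5, metric="precomputed").fit_predict(distance)
# Each non-noise label marks a cluster of SNPs in high LD, within which
# haplotypes would be inferred and tested for association in later stages.
print("SNP clusters:", sorted(set(labels) - {-1}))
```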


Subjects
Genome-Wide Association Study , Ischemic Stroke , Humans , Genome-Wide Association Study/methods , Genotype , Linkage Disequilibrium , Haplotypes/genetics , Polymorphism, Single Nucleotide , Genomics , Cluster Analysis
6.
Sensors (Basel) ; 22(18)2022 Sep 08.
Article in English | MEDLINE | ID: mdl-36146136

ABSTRACT

Using implicit responses to determine consumers' reactions to different stimuli is becoming a popular approach, but research is still needed to understand the outputs of the different technologies used to collect data. In the present research, electroencephalography (EEG) responses and self-reported liking and emotions were collected for different stimuli (odor, taste, and flavor samples) to better understand sweetness perception. Artificial intelligence analytics were used to classify the implicit responses, identifying decision trees that discriminate the stimuli by activated sensory system (odor/taste/flavor) and by nature of the stimuli ('sweet' vs. 'non-sweet' odors; 'sweet-taste', 'sweet-flavor', and 'non-sweet flavor'; and 'sweet stimuli' vs. 'non-sweet stimuli'). Significant differences were found among self-reported liking of the stimuli and the emotions they elicited, but no clear relationship was identified between explicit and implicit data. The present research contributes valuable data to EEG-linked research and EEG data analysis, although much remains unknown about how to properly exploit implicit measurement technologies and their data.
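
A small sketch of the classification pattern described (a decision tree discriminating 'sweet' from 'non-sweet' stimuli); the band-power features, labels and tree depth are hypothetical stand-ins for the study's EEG features:

```python
# Decision tree over toy EEG band-power features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 4))        # e.g. alpha/beta/theta/gamma power (toy)
y = rng.integers(0, 2, size=120)     # 0 = non-sweet, 1 = sweet (random toy labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
# Random labels give near-chance accuracy; real features would not.
print("test accuracy:", tree.score(X_te, y_te))
print(export_text(tree, feature_names=["alpha", "beta", "theta", "gamma"]))
```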


Subjects
Odorants , Taste , Artificial Intelligence , Decision Trees , Electroencephalography , Humans , Odorants/analysis , Perception , Taste/physiology
7.
Cytometry A ; 99(2): 133-144, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33476090

ABSTRACT

Automated clustering workflows are increasingly used for the analysis of high-parameter flow cytometry data. This trend calls for algorithms that can quickly process tens of millions of data points, compare results across subjects or time points, and provide easily actionable interpretations of the results. To this end, we created Tailor, a model-based clustering algorithm specialized for flow cytometry data. Our approach leverages a phenotype-aware binning scheme to provide a coarse model of the data, which is then refined using a multivariate Gaussian mixture model. We benchmark Tailor using a simulation study and two flow cytometry data sets, and show that the results are robust to moderate departures from normality and inter-sample variation. Moreover, Tailor provides automated, non-overlapping annotations of its clusters, which facilitates interpretation of results and downstream analysis. Tailor is released as an R package, and the source code is publicly available at www.github.com/matei-ionita/Tailor.
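
A minimal sketch of the refinement stage, fitting a multivariate Gaussian mixture to synthetic flow-like events; Tailor's phenotype-aware binning and annotation steps are omitted (see the repository above for the real implementation):

```python
# Multivariate Gaussian mixture over toy 3-marker flow cytometry events.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Two synthetic "populations" in a 3-marker panel
events = np.vstack([rng.normal(0, 1, size=(10000, 3)),
                    rng.normal(4, 1, size=(5000, 3))])

gmm = GaussianMixture(n_components=2, covariance_type="full",
                      random_state=0).fit(events)
labels = gmm.predict(events)
print("cluster sizes:", np.bincount(labels))
print("means:\n", gmm.means_.round(2))
```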


Subjects
Algorithms , Software , Cluster Analysis , Flow Cytometry , Humans , Normal Distribution
8.
Sensors (Basel) ; 21(16)2021 Aug 10.
Article in English | MEDLINE | ID: mdl-34450837

ABSTRACT

The synergy between Artificial Intelligence and the Edge Computing paradigm promises to move decision-making processes to the periphery of sensor networks without involving central data servers. For this reason, we have recently witnessed rapid development of devices that integrate sensors and computing resources on a single board to process data directly at the collection site. Because of the contexts in which they are used, the main feature of these boards is reduced energy consumption, even though their raw computing power is not comparable to that of modern high-end CPUs. Among the most popular Artificial Intelligence techniques, clustering algorithms are practical tools for discovering correlations or affinities in large datasets, but a parallel implementation is essential because of their high computational cost. In the present work, we therefore investigate how to implement clustering algorithms on parallel, low-energy devices for edge computing environments. In particular, we present experiments on two devices with different features: the quad-core UDOO X86 Advanced+ board and the GPU-based NVIDIA Jetson Nano board, evaluating them in terms of performance and energy consumption. The experiments show that they achieve a more favorable trade-off between these two requirements than other high-end computing devices.


Subjects
Algorithms , Artificial Intelligence , Cluster Analysis
9.
Entropy (Basel) ; 23(8)2021 Jul 28.
Article in English | MEDLINE | ID: mdl-34441111

ABSTRACT

This paper is devoted to the foundational problems of dendrogramic holographic theory (DH theory). We used the ontic-epistemic (implicate-explicate order) methodology. The epistemic counterpart is based on the representation of data by dendrograms constructed with hierarchic clustering algorithms. The ontic universe is described as a p-adic tree; it is zero-dimensional, totally disconnected, disordered, and bounded (in p-adic ultrametric spaces). Classical-quantum interrelations lose their sharpness; generally, simple dendrograms are "more quantum" than complex ones. We used the CHSH inequality as a measure of quantum-likeness. We demonstrate that it can be violated by classical experimental data represented by dendrograms. The seed of this violation is neither nonlocality nor a rejection of realism, but the nonergodicity of dendrogramic time series. Generally, the violation of ergodicity is one of the basic features of DH theory. The dendrogramic representation leads to the local realistic model that violates the CHSH inequality. We also considered DH theory for Minkowski geometry and monitored the dependence of CHSH violation and nonergodicity on geometry, as well as a Lorentz transformation of data.

10.
Entropy (Basel) ; 23(4)2021 Apr 11.
Article in English | MEDLINE | ID: mdl-33920374

ABSTRACT

Traditional information retrieval systems return a ranked list of results to a user's query. This list is often long, and the user cannot explore all the results retrieved. It is also ineffective for a highly ambiguous language such as Arabic, whose modern writing style excludes the diacritical marking without which Arabic words become ambiguous. For a search query, the user has to skim each document to infer whether a word carries the intended meaning, which is time-consuming. Clustering the retrieved documents is expected to collate them into clear and meaningful groups. In this paper, we use an enhanced k-means clustering algorithm, which yields faster clustering than regular k-means. The algorithm uses distances calculated in previous iterations to minimize the number of distance calculations. We propose a system to cluster Arabic search results using the enhanced k-means algorithm, labeling each cluster with its most frequent word. This system helps Arabic web users identify each cluster's topic and go directly to the required cluster. Experimentally, the enhanced k-means algorithm reduced execution time by 60% for the stemmed dataset and 47% for the non-stemmed dataset compared with regular k-means, while slightly improving purity.
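
The paper's enhanced k-means caches distances from previous iterations; scikit-learn's "elkan" variant exploits the triangle inequality with bounds carried between iterations in a similar spirit. The sketch below (assuming scikit-learn ≥ 1.1 for the algorithm names) contrasts it with plain Lloyd iterations; it is an analogue of, not an implementation of, the paper's algorithm:

```python
# Compare Lloyd k-means with the triangle-inequality "elkan" variant.
import time
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(4).normal(size=(20000, 50))  # stand-in for document vectors

for algo in ("lloyd", "elkan"):
    t0 = time.perf_counter()
    KMeans(n_clusters=20, algorithm=algo, n_init=1, random_state=0).fit(X)
    print(algo, f"{time.perf_counter() - t0:.2f}s")
```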

11.
Sensors (Basel) ; 20(21)2020 Nov 06.
Article in English | MEDLINE | ID: mdl-33172017

ABSTRACT

The Internet of Things (IoT) is driving a new socioeconomic revolution in which data and immediacy are the main ingredients. IoT generates large datasets daily, but much of this is currently "dark data", i.e., data generated but never analyzed. Efficient analysis of this data is essential for creating the intelligent applications of the next generation of IoT that benefit society. Artificial Intelligence (AI) techniques are well suited to identifying hidden patterns and correlations in this data deluge. In particular, clustering algorithms are of the utmost importance for exploratory data analysis, identifying sets (a.k.a. clusters) of similar objects. Clustering algorithms are computationally heavy workloads and are usually executed on high-performance computing (HPC) clusters, especially for large datasets. Execution on HPC infrastructures is energy-hungry and raises additional issues, such as high-latency communications and privacy. Edge computing, a recently proposed paradigm enabling lightweight computation at the edge of the network, addresses these issues. In this paper, we provide an in-depth analysis of emerging edge computing architectures that include low-power Graphics Processing Units (GPUs) to speed up these workloads. Our analysis includes performance and power consumption figures for NVIDIA's latest AGX Xavier, comparing the energy-performance ratio of these low-cost platforms with a high-performance cloud-based counterpart. Three clustering algorithms (k-means, Fuzzy Minimals (FM), and Fuzzy C-Means (FCM)) are designed for optimal execution on edge and cloud platforms, showing a speed-up factor of up to 11x for the GPU code compared with sequential counterparts on the edge platforms, and energy savings of up to 150% between the edge computing and HPC platforms.
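
A minimal NumPy sketch of the Fuzzy C-Means updates that such GPU ports parallelize (the membership and center equations); the fuzzifier m, iteration count and data are illustrative choices:

```python
# Fuzzy C-Means: alternate center and membership updates.
import numpy as np

def fcm(X, c=3, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    u = rng.dirichlet(np.ones(c), size=len(X))          # memberships: rows sum to 1
    for _ in range(iters):
        um = u ** m
        centers = (um.T @ X) / um.sum(axis=0)[:, None]  # membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))                   # u_ik proportional to d_ik^(-2/(m-1))
        u = inv / inv.sum(axis=1, keepdims=True)
    return centers, u

X = np.vstack([np.random.default_rng(5).normal(0, 1, (200, 2)),
               np.random.default_rng(6).normal(5, 1, (200, 2))])
centers, u = fcm(X, c=2)
print(centers.round(2))                                 # near (0,0) and (5,5)
```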

12.
Behav Res Methods ; 52(2): 503-520, 2020 Apr.
Article in English | MEDLINE | ID: mdl-31037607

ABSTRACT

In practical applications of knowledge space theory, knowledge states can be conceived as partially ordered clusters of individuals. Existing extensions of the theory to polytomous data lack methods for building "polytomous" structures. To this aim, an adaptation of the k-median clustering algorithm is proposed. It is an extension of k-modes to ordinal data in which the Hamming distance is replaced by the Manhattan distance, and the central tendency measure is the median, rather than the mode. The algorithm is tested in a series of simulation studies and in an application to empirical data. Results show that there are theoretical and practical reasons for preferring the k-median to the k-modes algorithm, whenever the responses to the items are measured on an ordinal scale. This is because the Manhattan distance is sensitive to the order on the levels, while the Hamming distance is not. Overall, k-median seems to be a promising data-driven procedure for building polytomous structures.
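
A sketch of the proposed variant under the definitions above: Manhattan distance with component-wise medians as centroids, on toy ordinal response patterns (tie-breaking and restarts omitted):

```python
# k-median clustering for ordinal data: Manhattan distance, median centroids.
import numpy as np

def k_median(X, k=2, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        d = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)  # Manhattan
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = np.median(X[labels == j], axis=0)
    return labels, centers

# Rows: respondents; columns: items answered on an ordinal 0-3 scale (toy).
X = np.array([[0, 0, 1], [0, 1, 1], [3, 2, 3], [2, 3, 3], [3, 3, 2]])
labels, centers = k_median(X)
print(labels, centers)
```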


Subjects
Algorithms , Cluster Analysis , Humans , Knowledge
13.
BMC Bioinformatics ; 20(Suppl 15): 503, 2019 Dec 24.
Article in English | MEDLINE | ID: mdl-31874625

ABSTRACT

BACKGROUND: Cluster analysis is a core task in modern data-centric computation. Algorithmic choice is driven by factors such as data size and heterogeneity, the similarity measures employed, and the type of clusters sought. Familiarity and mere preference often play a significant role as well. Comparisons between clustering algorithms tend to focus on cluster quality. Such comparisons are complicated by the fact that algorithms often have multiple settings that can affect the clusters produced. Such a setting may represent, for example, a preset variable, a parameter of interest, or various sorts of initial assignments. A question of interest then is this: to what degree do the clusters produced vary as setting values change? RESULTS: This work introduces a new metric, termed simply "robustness", designed to answer that question. Robustness is an easily-interpretable measure of the propensity of a clustering algorithm to maintain output coherence over a range of settings. The robustness of eleven popular clustering algorithms is evaluated over some two dozen publicly available mRNA expression microarray datasets. Given their straightforwardness and predictability, hierarchical methods generally exhibited the highest robustness on most datasets. Of the more complex strategies, the paraclique algorithm yielded consistently higher robustness than other algorithms tested, approaching and even surpassing hierarchical methods on several datasets. Other techniques exhibited mixed robustness, with no clear distinction between them. CONCLUSIONS: Robustness provides a simple and intuitive measure of the stability and predictability of a clustering algorithm. It can be a useful tool to aid both in algorithm selection and in deciding how much effort to devote to parameter tuning.
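
The paper defines its own robustness metric; as a hedged proxy, the sketch below measures output coherence as the mean pairwise adjusted Rand index of the label sets one method produces while a setting (here the linkage) is varied:

```python
# Robustness proxy: agreement of one method's outputs across settings.
from itertools import combinations
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

X = np.random.default_rng(6).normal(size=(300, 20))   # stand-in expression data

label_sets = [AgglomerativeClustering(n_clusters=5, linkage=link).fit_predict(X)
              for link in ("ward", "complete", "average")]

scores = [adjusted_rand_score(a, b) for a, b in combinations(label_sets, 2)]
print("robustness proxy (mean pairwise ARI):", np.mean(scores).round(3))
```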


Subjects
Algorithms , Biometry , Cluster Analysis , Gene Expression Profiling
14.
BMC Genomics ; 20(1): 637, 2019 Aug 07.
Article in English | MEDLINE | ID: mdl-31390979

ABSTRACT

BACKGROUND: The detection of protein complexes is of great significance for researching the mechanisms underlying complex diseases and for developing new drugs. Various computational algorithms have thus been proposed for protein complex detection. However, most of these methods rely only on topological information and are sensitive to the reliability of interactions, so their performance is affected by false-positive interactions in protein-protein interaction networks (PPINs). Moreover, these methods consider only density and modularity and ignore protein complexes with varying densities and modularities. RESULTS: To address these challenges, we propose a Seed-Extended algorithm based on Density and Modularity with Topological structure and GO annotations, named SE-DMTG, to improve the accuracy of protein complex detection in PPINs. First, we use common neighbors and GO annotations to construct a weighted PPIN. Second, we define a new seed selection strategy. Third, we design a new fitness function to detect protein complexes with varying densities and modularities. We compare the performance of SE-DMTG with that of thirteen state-of-the-art algorithms on several real datasets. CONCLUSION: The experimental results show that SE-DMTG not only outperforms classical algorithms on yeast PPINs in terms of F-measure and Jaccard index but also achieves strong functional enrichment. Furthermore, applying SE-DMTG to PPINs of several other species demonstrates outstanding accuracy and matching ratio in detecting protein complexes compared with other algorithms.
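
A generic seed-and-extend loop on a weighted network, sketched with networkx; the weighting, seed choice and density-threshold fitness below are simple stand-ins for SE-DMTG's actual definitions, which combine density, modularity and GO information:

```python
# Greedy seed-and-extend over a weighted protein interaction graph.
import networkx as nx

def weighted_density(G, nodes):
    sub = G.subgraph(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    return 2 * sub.size(weight="weight") / (n * (n - 1))

def seed_extend(G, seed, min_density=0.5):
    cluster = {seed}
    improved = True
    while improved:
        improved = False
        frontier = set(n for c in cluster for n in G.neighbors(c)) - cluster
        for nbr in frontier:
            if weighted_density(G, cluster | {nbr}) >= min_density:
                cluster.add(nbr)
                improved = True
    return cluster

G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 0.9), ("b", "c", 0.8),
                           ("a", "c", 0.7), ("c", "d", 0.1)])
print(seed_extend(G, "a"))   # dense core {a, b, c}; weakly attached d excluded
```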


Subjects
Algorithms , Computational Biology/methods , Molecular Sequence Annotation , Protein Interaction Mapping , Animals , Cluster Analysis , False Positive Reactions , Humans , Mice , Supervised Machine Learning
15.
Sensors (Basel) ; 18(4)2018 Apr 17.
Article in English | MEDLINE | ID: mdl-29673230

ABSTRACT

This paper presents a method for fusing the ego-motion of a robot or land vehicle, estimated from an upward-facing camera, with Global Navigation Satellite System (GNSS) signals for navigation in urban environments. A sky-pointing camera is mounted on the roof of a car and synchronized with a GNSS receiver. The advantages of this configuration are two-fold: first, the upward-facing camera is used to classify the acquired images into sky and non-sky regions (segmentation); a satellite falling into a non-sky area (e.g., buildings, trees) is rejected and not considered in the final position solution. Second, the sky-pointing camera (with a field of view of about 90 degrees) is helpful for ego-motion estimation in urban areas because it does not see most moving objects (e.g., pedestrians, cars) and can thus estimate the ego-motion with fewer outliers than a typical forward-facing camera. The GNSS and visual information are tightly coupled in a Kalman filter to form the final position solution. Experimental results demonstrate that the system provides satisfactory navigation solutions in a deep urban canyon, even with fewer than four GNSS satellites, improving accuracy over the GNSS-only and loosely-coupled GNSS/vision solutions by 20 percent and 82 percent (in the worst case), respectively.
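
A sketch of the satellite-rejection idea: project each satellite's azimuth/elevation into the upward image under an assumed equidistant fisheye model and discard it if it lands on a non-sky pixel. The mask, camera model and satellite geometry are toy assumptions, not the paper's calibration:

```python
# Reject GNSS satellites whose line of sight falls on non-sky pixels.
import numpy as np

def sat_to_pixel(az_deg, el_deg, cx, cy, radius):
    """Project azimuth/elevation to image coordinates (equidistant fisheye)."""
    r = radius * (90.0 - el_deg) / 90.0       # zenith at center, horizon at edge
    az = np.radians(az_deg)
    return int(cx + r * np.sin(az)), int(cy - r * np.cos(az))

h = w = 400
sky_mask = np.zeros((h, w), dtype=bool)
sky_mask[:, :200] = True                      # toy: left half of the view is open sky

satellites = {"G05": (225.0, 60.0), "G12": (120.0, 25.0)}   # azimuth, elevation (deg)
for prn, (az, el) in satellites.items():
    x, y = sat_to_pixel(az, el, w / 2, h / 2, min(h, w) / 2)
    usable = sky_mask[y, x]
    print(prn, "use" if usable else "reject (blocked by building/tree)")
```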

16.
Hum Brain Mapp ; 38(6): 2808-2818, 2017 Jun.
Article in English | MEDLINE | ID: mdl-28294456

ABSTRACT

Based on cytoarchitecture, the posterior cingulate cortex (PCC) is thought to comprise two distinct functional subregions: the dorsal and ventral PCC (dPCC and vPCC). However, functional subregions do not completely match anatomical boundaries in the human brain. To understand the relationship between the functional organization of regions and anatomical features, it is necessary to apply parcellation algorithms based on functional properties. We therefore defined functionally informed subregions of the human PCC by parcellating regions with similar patterns of functional connectivity in the resting brain. We used various patterns of functional connectivity, namely local, whole-brain and diffuse functional connections of the PCC, and various clustering methods, namely hierarchical, spectral, and k-means clustering, to investigate the subregions of the PCC. Overall, the approximate anatomical boundaries and predicted functional regions overlapped substantially. Using hierarchical clustering, the PCC could be clearly separated into two anatomical subregions, the dPCC and vPCC, and further divided into four subregions segregated by local functional connectivity patterns. We show that the PCC can be separated into two (dPCC and vPCC) or four subregions based on local functional connections and hierarchical clustering, and that the subregions of the PCC display differential global functional connectivity, particularly along the dorsal-ventral axis. These results suggest that differences in functional connectivity between the dPCC and vPCC may be due to differences in local connectivity between these functionally hierarchical subregions. Hum Brain Mapp 38:2808-2818, 2017. © 2017 Wiley Periodicals, Inc.


Subjects
Brain Mapping , Gyrus Cinguli/anatomy & histology , Gyrus Cinguli/diagnostic imaging , Image Processing, Computer-Assisted , Magnetic Resonance Imaging , Neural Pathways/diagnostic imaging , Algorithms , Cluster Analysis , Female , Functional Laterality/physiology , Gyrus Cinguli/physiology , Humans , Male , Neural Pathways/anatomy & histology , Neural Pathways/physiology , Young Adult
17.
Cytometry A ; 91(3): 232-249, 2017 Mar.
Article in English | MEDLINE | ID: mdl-28160404

ABSTRACT

We developed a fully automated procedure for analyzing data from LED pulses and multilevel bead sets to evaluate backgrounds and photoelectron scales of cytometer fluorescence channels. The method improves on previous formulations by fitting a full quadratic model with appropriate weighting and by providing standard errors and peak residuals as well as the fitted parameters themselves. Here we describe the details of the methods and procedures involved and present a set of illustrations and test cases that demonstrate the consistency and reliability of the results. The automated analysis and fitting procedure is generally quite successful in providing good estimates of the Spe (statistical photoelectron) scales and backgrounds for all the fluorescence channels on instruments with good linearity. The precision of the results obtained from LED data is almost always better than that from multilevel bead data, but the bead procedure is easy to carry out and provides results good enough for most purposes. Including standard errors on the fitted parameters is important for understanding the uncertainty in the values of interest. The weighted residuals give information about how well the data fits the model, and particularly high residuals indicate bad data points. Known photoelectron scales and measurement channel backgrounds make it possible to estimate the precision of measurements at different signal levels and the effects of compensated spectral overlap on measurement quality. Combining this information with measurements of standard samples carrying dyes of biological interest, we can make accurate comparisons of dye sensitivity among different instruments. Our method is freely available through the R/Bioconductor package flowQB. © 2017 International Society for Advancement of Cytometry.
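
A hedged sketch of the model class involved: a weighted quadratic fit of signal variance against mean signal, from which a background term and a photoelectron-scale slope fall out. All numbers are synthetic; the actual fitting, standard errors and residual diagnostics live in the R/Bioconductor flowQB package:

```python
# Weighted quadratic fit of variance vs. mean signal.
import numpy as np

rng = np.random.default_rng(7)
mean_signal = np.linspace(100, 10000, 12)
# variance = background + Spe-slope * mean + (CV * mean)^2, plus noise (toy model)
true_var = 2000 + 20.0 * mean_signal + (0.01 * mean_signal) ** 2
variance = true_var * rng.normal(1.0, 0.03, size=mean_signal.size)

weights = 1.0 / variance                  # down-weight the noisier points
c2, c1, c0 = np.polyfit(mean_signal, variance, 2, w=weights)
print(f"background ~ {c0:.0f}, Spe slope ~ {c1:.2f}, CV ~ {np.sqrt(max(c2, 0)):.3f}")
```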


Subjects
Flow Cytometry/methods , Models, Theoretical , Optical Imaging/methods , Calibration , Flow Cytometry/statistics & numerical data , Least-Squares Analysis
18.
Mol Ecol ; 25(9): 1944-57, 2016 May.
Article in English | MEDLINE | ID: mdl-26915049

ABSTRACT

Accurate estimates of biodiversity are required for research in a broad array of biological subdisciplines including ecology, evolution, systematics, conservation and biodiversity science. The use of statistical models and genetic data, particularly DNA barcoding, has been suggested as an important tool for remedying the large gaps in our current understanding of biodiversity. However, the reliability of biodiversity estimates obtained using these approaches depends on how well the statistical models that are used describe the evolutionary process underlying the genetic data. In this study, we utilize data from the Barcode of Life Database and posterior predictive simulations to assess the performance of DNA barcoding under commonly used substitution models. We demonstrate that the success of DNA barcoding varies widely across DNA substitution models and that model choice has a substantial impact on the number of operational taxonomic units identified (changing results by ~4-31%). Additionally, we demonstrate that the widely followed practice of a priori assuming the Kimura 2-parameter model for DNA barcoding is statistically unjustified and should be avoided. Using both data-based and inference-based test statistics, we detect variation in model performance across taxonomic groups, clustering algorithms, genetic divergence thresholds and substitution models. Taken together, these results illustrate the importance of considering both model selection and model adequacy in studies quantifying biodiversity.
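
For reference, the Kimura 2-parameter distance the paper cautions against assuming a priori, with a simple threshold rule for grouping sequences into operational taxonomic units; the sequences and the 2% cutoff are illustrative:

```python
# Kimura 2-parameter (K2P) distance and a toy OTU threshold rule.
import math

def k2p(seq1, seq2):
    """K2P distance between two aligned sequences of equal length."""
    transitions = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
    pairs = list(zip(seq1, seq2))
    P = sum((a, b) in transitions for a, b in pairs) / len(pairs)   # transitions
    Q = sum(a != b and (a, b) not in transitions
            for a, b in pairs) / len(pairs)                         # transversions
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))

base = "ACGT" * 15                      # 60-bp toy barcode
seq_b = base[:-1] + "A"                 # one transversion (T -> A)
seq_c = "G" + base[1:4] + "G" + base[5:8] + "G" + base[9:]   # three transitions

threshold = 0.02                        # common 2% OTU cutoff (illustrative)
for name, s in [("seq_b", seq_b), ("seq_c", seq_c)]:
    d = k2p(base, s)
    print(name, round(d, 4), "same OTU" if d < threshold else "new OTU")
```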


Subjects
Computer Simulation , DNA Barcoding, Taxonomic/methods , Algorithms , Bayes Theorem , Biodiversity , Cluster Analysis , Models, Genetic , Models, Statistical
19.
Physiol Mol Biol Plants ; 22(1): 163-74, 2016 Jan.
Article in English | MEDLINE | ID: mdl-27186030

ABSTRACT

P-type ATPases are an extended family of integral membrane proteins that, based on their transport specificities, comprise five subfamilies in Arabidopsis, inter alia P4 ATPases (phospholipid-transporting ATPases), P3A ATPases (plasma membrane H(+) pumps), P2A and P2B ATPases (Ca(2+) pumps) and P1B ATPases (heavy metal pumps). Although many computational methods have been developed to predict the substrate specificity of unknown proteins, further work is needed to improve the efficiency and performance of the predictors. In this study, various attribute weighting and supervised clustering algorithms were employed to identify the main amino acid composition attributes that influence the substrate specificity of ATPase pumps, to classify protein pumps, and to predict the substrate specificity of uncharacterized ATPase pumps. The results indicate that both non-reduced coefficients pertaining to absorption and Cys extinction at 280 nm; the frequencies of hydrogen, Ala, Val, carbon and hydrophilic residues; the counts of Val, Asn, Ser, Arg, Phe, Tyr and hydrophilic residues; the dipeptide frequencies of Phe-Phe, Ala-Ile, Phe-Leu and Val-Ala; and sequence length are the most important amino acid attributes across the attribute weighting models. A learning algorithm built on a Naive Bayes predictive machine is proposed to predict the substrate specificities of Q9LVV1 and O22180 (P-type ATPase-like proteins) with 100% prediction confidence. For the first time, our analysis demonstrates the promising application of bioinformatics algorithms in classifying ATPase pumps. Moreover, we suggest predictive systems that can assist in predicting the substrate specificity of any new ATPase pump with the maximum possible prediction confidence.
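
A small sketch of the modeling pattern described: amino-acid composition (plus length) as features for a Naive Bayes classifier. The sequences, class labels and reduced feature set are toy stand-ins for the study's attribute-weighted features:

```python
# Amino-acid composition features feeding a Gaussian Naive Bayes classifier.
from collections import Counter
from sklearn.naive_bayes import GaussianNB

AMINO = "ACDEFGHIKLMNPQRSTVWY"

def composition(seq):
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO] + [len(seq) / 1000.0]

train_seqs = ["MKVVAFSRN", "MKLVVASNR", "MDEQPLLGG", "MDDQPLLGA"]   # toy fragments
train_y    = ["Ca2+ pump", "Ca2+ pump", "H+ pump", "H+ pump"]      # toy labels

clf = GaussianNB().fit([composition(s) for s in train_seqs], train_y)
print(clf.predict([composition("MKVVASSRN")]))   # compositionally Ca2+ pump-like
```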

20.
J Clin Epidemiol ; 172: 111435, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38901709

ABSTRACT

OBJECTIVES: To examine the impact of two key choices when conducting a network analysis (clustering methods and measure of association) on the number and type of multimorbidity clusters. STUDY DESIGN AND SETTING: Using cross-sectional self-reported data on 24 diseases from 30,097 community-living adults aged 45-85 from the Canadian Longitudinal Study on Aging, we conducted network analyses using 5 clustering methods and 11 association measures commonly used in multimorbidity studies. We compared the similarity among clusters using the adjusted Rand index (ARI); an ARI of 0 is equivalent to the diseases being randomly assigned to clusters, and 1 indicates perfect agreement. We compared the network analysis results to disease clusters independently identified by two clinicians. RESULTS: Results differed greatly across combinations of association measures and cluster algorithms. The number of clusters identified ranged from 1 to 24, with a low similarity of conditions within clusters. Compared to clinician-derived clusters, ARIs ranged from -0.02 to 0.24, indicating little similarity. CONCLUSION: These analyses demonstrate the need for a systematic evaluation of the performance of network analysis methods on binary clustered data like diseases. Moreover, in individual older adults, diseases may not cluster predictably, highlighting the need for a personalized approach to their care.
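
The paper's core comparison can be sketched as pairwise adjusted Rand indices between the cluster assignments produced by different method/measure combinations; the assignment vectors and combination names below are hypothetical, not CLSA results:

```python
# Pairwise ARI between disease-cluster assignments from different choices.
from itertools import combinations
from sklearn.metrics import adjusted_rand_score

# Cluster assignment of 10 diseases under three analysis choices (toy values)
assignments = {
    "method_A_measure_1": [0, 0, 0, 1, 1, 1, 2, 2, 2, 2],
    "method_B_measure_2": [0, 0, 1, 1, 1, 2, 2, 2, 0, 0],
    "clinician_derived":  [0, 0, 0, 1, 1, 1, 2, 2, 2, 3],
}
for (na, a), (nb, b) in combinations(assignments.items(), 2):
    print(f"ARI({na}, {nb}) = {adjusted_rand_score(a, b):.2f}")
```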


Subjects
Multimorbidity , Humans , Aged , Canada/epidemiology , Longitudinal Studies , Cluster Analysis , Female , Aged, 80 and over , Male , Cross-Sectional Studies , Middle Aged , Aging , Algorithms