1.
Sci Rep ; 13(1): 22710, 2023 12 19.
Article in English | MEDLINE | ID: mdl-38123604

ABSTRACT

Psoriatic arthritis (PsA) is a chronic inflammatory systemic disease whose activity is often assessed using the Disease Activity Score 28 (DAS28-CRP). The present study was designed to investigate the significance of individual components within the score for PsA activity. A cohort of 80 PsA patients (44 women and 36 men, aged 56.3 ± 12 years) with a range of disease activity from remission to moderate was analyzed using unsupervised and supervised methods applied to the DAS28-CRP components. Machine learning-based permutation importance identified tenderness in the metacarpophalangeal joint of the right index finger as the most informative item of the DAS28-CRP for PsA activity staging. This symptom alone allowed a machine-learned (random forest) classifier to identify PsA remission with 67% balanced accuracy in new cases. Projection of the DAS28-CRP data onto an emergent self-organizing map of artificial neurons identified outliers, which, after augmentation of the group sizes by generative artificial intelligence (AI) based on emergent self-organizing maps, could be defined as subgroups particularly characterized by either tenderness or swelling of specific joints. AI-assisted re-evaluation of the DAS28-CRP for PsA narrowed the score items to a single most relevant symptom, and generative AI proved useful for identifying and characterizing small subgroups of patients whose symptom patterns differ from the majority. These findings represent an important step toward precision medicine that can address outliers.
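A minimal sketch of the permutation-importance step described above, assuming scikit-learn and purely synthetic stand-in data (the study's actual DAS28-CRP items and preprocessing are not reproduced here):

```python
# Hedged sketch: rank score items by permutation importance under a random
# forest, as the abstract describes. All data and item columns are toy
# placeholders, not the study's variables.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 80                                    # cohort size reported above
X = rng.random((n, 57))                   # toy stand-ins for DAS28-CRP items
y = rng.integers(0, 2, n)                 # 1 = remission, 0 = active (toy)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)

# How much does shuffling a single item degrade balanced accuracy?
imp = permutation_importance(clf, X_te, y_te, n_repeats=100,
                             scoring="balanced_accuracy", random_state=0)
print("most informative items:", np.argsort(imp.importances_mean)[::-1][:5])
```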


Subject(s)
Arthritis, Psoriatic; Male; Humans; Female; Arthritis, Psoriatic/diagnosis; Arthritis, Psoriatic/drug therapy; Artificial Intelligence; Algorithms; Metacarpophalangeal Joint; Machine Learning
2.
Sci Rep ; 13(1): 17923, 2023 10 20.
Article in English | MEDLINE | ID: mdl-37864001

ABSTRACT

Random walks describe stochastic processes characterized by a sequence of unpredictable changes in a random variable with no correlation to past changes. This report describes the random walk component of a clinical sensory test of olfactory performance. The precise definition of this stochastic process allows the establishment of precise diagnostic cut-offs for the identification of olfactory loss. Within the Sniffin' Sticks olfactory test battery, odor discrimination (D) and odor identification (I) are assessed by four- and three-alternative forced-choice designs, respectively. The odor threshold (T) test, in contrast, embeds a three-alternative forced-choice design within a staircase paradigm with seven turning points. We explored this paradigm through computer simulations and provided a formal description. The odor threshold assessment consists of two sequential components, the first of which sets the starting point for the second. Both parts can be characterized as biased random walks with markedly different probabilities of moving to higher (11%) or lower (89%) values. The initial odor concentration step for the first phase of the test and the length of the subsequent random walk in the second phase significantly affect the probability of randomly achieving high test scores. Changing the odor concentration at which the determination of the starting point for the second part of the test begins raised the current cut-off for anosmia, T + D + I < 16, from the 87th quantile of random test scores to the 97th quantile. Analogous findings are likely applicable to other sensory tests that use a staircase paradigm, which can likewise be characterized as a random walk.
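A simplified Monte Carlo sketch of this idea follows. It treats the staircase as a biased random walk under pure guessing: in a three-alternative forced-choice design, two consecutive lucky guesses ((1/3)^2, roughly 11%) move the walk up, otherwise it moves down (roughly 89%), matching the step probabilities given above. The scoring is deliberately schematic, not the exact Sniffin' Sticks procedure:

```python
# Simplified random-walk model of a 3-AFC staircase under pure guessing.
import numpy as np

rng = np.random.default_rng(1)

def random_threshold_score(n_turns=7, lo=1.0, hi=16.0):
    # Walk over dilution steps; score scale 1..16 as in the threshold test.
    level, direction, turns = lo, -1, []
    while len(turns) < n_turns:
        step_up = rng.random() < (1 / 3) ** 2   # two lucky 3-AFC guesses
        new_dir = +1 if step_up else -1
        if new_dir != direction:                # record a turning point
            turns.append(level)
            direction = new_dir
        level = min(hi, max(lo, level + new_dir))
    return float(np.mean(turns[-4:]))           # mean of the last four turns

scores = [random_threshold_score() for _ in range(10_000)]
print("97th percentile of random scores:", np.percentile(scores, 97))
```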


Subject(s)
Olfaction Disorders; Smell; Humans; Sensory Thresholds; Odorants; Computer Simulation; Electric Power Supplies; Olfaction Disorders/diagnosis
3.
Eur J Pain ; 27(7): 787-793, 2023 08.
Article in English | MEDLINE | ID: mdl-37222242

Subject(s)
Pain; Humans
4.
Sci Rep ; 13(1): 5470, 2023 Apr 04.
Article in English | MEDLINE | ID: mdl-37016033

ABSTRACT

Selecting the k best features is a common task in machine learning. Typically, a few features have high importance, but many have low importance (right-skewed distribution). This report proposes a numerically precise method to address this skewed feature importance distribution in order to reduce a feature set to the informative minimum of items. Computed ABC analysis (cABC) is an item categorization method that aims to identify the most important items by partitioning a set of non-negative numerical items into subsets "A", "B" and "C", such that subset "A" contains the "few important" items, based on specific properties of ABC curves defined by their relationship to Lorenz curves. In its recursive form, the cABC analysis can be applied again to subset "A". A generic image dataset and three biomedical datasets (lipidomics and two genomics datasets) with a large number of variables were used to perform the experiments. The experimental results show that the recursive cABC analysis limits the dimensions of the data projection to a minimum at which the relevant information is still preserved and directs the feature selection in machine learning to the most important class-relevant information, including filtering nonsense variables out of feature sets. Feature sets were reduced to 10% or less of the original variables and still provided accurate classification in data not used for feature selection. cABC analysis, in its recursive variant, provides a computationally precise means of reducing information to a minimum. The minimum results from computing the number k of most relevant items rather than from deciding to select the k best items from a list. In addition, there are precise criteria for stopping the reduction process. The reduction to the most important features can improve the human understanding of the properties of the data set. The cABC method is implemented in the Python package "cABCanalysis" available at https://pypi.org/project/cABCanalysis/.
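A conceptual sketch of the recursion is shown below. The real cABC analysis derives the A|B boundary from the geometry of ABC curves; here a simple cumulative-contribution cut stands in for that step, so this illustrates the recursive idea only and is not the package's algorithm:

```python
# Conceptual sketch only: recursively keep the "A" subset of importances.
# A Pareto-style cumulative cut replaces the ABC-curve boundary computation;
# the PyPI package "cABCanalysis" implements the exact method.
import numpy as np

def abc_subset_a(values, contribution=0.8):
    # Indices of the largest items covering `contribution` of the total.
    order = np.argsort(values)[::-1]
    cum = np.cumsum(values[order]) / values.sum()
    k = int(np.searchsorted(cum, contribution)) + 1
    return order[:k]

def recursive_abc(values, max_depth=3):
    idx = np.arange(len(values))
    for _ in range(max_depth):
        keep = abc_subset_a(values[idx])
        if len(keep) == len(idx):   # no further reduction possible
            break
        idx = idx[keep]
    return idx

importances = np.random.default_rng(2).exponential(size=1000)  # right-skewed
print("retained features:", recursive_abc(importances))
```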

5.
Curr Oncol ; 30(2): 1903-1915, 2023 02 04.
Article in English | MEDLINE | ID: mdl-36826109

ABSTRACT

BACKGROUND: The International Prognostic Index (IPI) is applied to predict the outcome of chronic lymphocytic leukemia (CLL) using five prognostic factors, including genetic analysis. We investigated whether multiparameter flow cytometry (MPFC) data of CLL samples could predict the outcome by methods of explainable artificial intelligence (XAI). In addition, the XAI was expected to explain its results on the basis of distinctive cell populations in MPFC dot plots. METHODS: We analyzed MPFC data from the peripheral blood of 157 patients with CLL. The ALPODS XAI algorithm was used to identify cell populations that were predictive of inferior outcomes (death, failure of first-line treatment). The diagnostic ability of each XAI population was evaluated with receiver operating characteristic (ROC) curves. RESULTS: ALPODS defined 17 populations with a higher ability than the CLL-IPI to classify clinical outcomes (ROC area under the curve (AUC) 0.95 vs. 0.78). The best single classifier was an XAI population consisting of CD4+ T cells (AUC 0.78; 95% CI 0.70-0.86; p < 0.0001). Patients with low CD4+ T cells had an inferior outcome. The addition of the CD4+ T-cell population enhanced the predictive ability of the CLL-IPI (AUC 0.83; 95% CI 0.77-0.90; p < 0.0001). CONCLUSIONS: The ALPODS XAI algorithm detected highly predictive cell populations in CLL that may be able to refine conventional prognostic scores such as the IPI.
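A hedged sketch of the ROC/AUC comparison reported above, with synthetic stand-ins for the CLL-IPI score and the CD4+ T-cell fraction (variable names and effect directions are illustrative assumptions, not the study's data):

```python
# Hedged sketch: compare single predictors and their combination by ROC AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n = 157                                               # cohort size above
outcome = rng.integers(0, 2, n)                       # toy inferior-outcome label
ipi = outcome * 1.0 + rng.normal(0, 1.2, n)           # toy CLL-IPI surrogate
cd4 = -outcome * 1.0 + rng.normal(0, 1.2, n)          # low CD4+ ~ worse outcome

print("IPI alone  AUC:", roc_auc_score(outcome, ipi))
print("CD4+ alone AUC:", roc_auc_score(outcome, -cd4))
# Combining both predictors, e.g., via logistic regression:
combo = LogisticRegression().fit(np.c_[ipi, cd4], outcome)
proba = combo.predict_proba(np.c_[ipi, cd4])[:, 1]
print("IPI + CD4+ AUC:", roc_auc_score(outcome, proba))
```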


Subject(s)
Leukemia, Lymphocytic, Chronic, B-Cell; Humans; Prognosis; Leukemia, Lymphocytic, Chronic, B-Cell/drug therapy; Artificial Intelligence; Algorithms
6.
Cytometry A ; 103(4): 304-312, 2023 04.
Article in English | MEDLINE | ID: mdl-36030398

ABSTRACT

Minimal residual disease (MRD) detection is a strong predictor of survival and relapse in acute myeloid leukemia (AML). MRD can be determined either by molecular assessment strategies or via multiparameter flow cytometry. The degree of bone marrow (BM) dilution with peripheral blood (PB) increases with aspiration volume, causing a corresponding underestimation of the residual AML blast count. In order to prevent false-negative MRD results, we developed Cinderella, a simple automated method for one-tube simultaneous measurement of hemodilution in BM samples and MRD level. The explainable artificial intelligence (XAI) Cinderella was trained and validated with the digital raw data of a flow cytometric "8-color" AML-MRD antibody panel in 126 BM and 23 PB samples from 35 patients. Cinderella predicted PB dilution in close agreement with the results of the Holdrinet formula (Pearson's correlation coefficient r = 0.94, R² = 0.89, p < 0.001). Unlike conventional neural networks, Cinderella calculated the distributions of 12 different cell populations that were assigned to true hematopoietic counterparts in a human-in-the-loop (HIL) approach. Besides characteristic BM cells such as myelocytes and myeloid progenitor cells, the XAI identified discriminating populations that were not specific for BM or PB (e.g., T-cell/NK-cell subpopulations and CD45-negative cells) and considered their frequency differences. Thus, Cinderella represents a HIL-XAI algorithm capable of calculating the degree of hemodilution in BM samples with an AML MRD immunophenotype panel. It is explicable, transparent, and paves a simple way to prevent false-negative MRD reports.


Subject(s)
Bone Marrow; Leukemia, Myeloid, Acute; Humans; Neoplasm, Residual/diagnosis; Artificial Intelligence; Hemodilution
7.
Int J Mol Sci ; 23(22)2022 Nov 15.
Article in English | MEDLINE | ID: mdl-36430580

ABSTRACT

Bayesian inference is ubiquitous in science and widely used in biomedical research such as cell sorting or "omics" approaches, as well as in machine learning (ML), artificial neural networks and "big data" applications. However, the calculation is not robust in regions of low evidence. In cases where one group has a lower mean but a higher variance than another group, new cases with larger values are implausibly assigned to the group with typically smaller values. An approach for a robust extension of Bayesian inference is proposed that proceeds in two main steps, starting from the Bayesian posterior probabilities. First, cases with low evidence are labeled as being of "uncertain" class membership. The boundary for low probabilities of class assignment (threshold ε) is calculated using a computed ABC analysis as a data-based technique for item categorization. This leaves a number of cases with uncertain classification (p < ε). Second, cases with uncertain class membership are relabeled according to their distance to neighboring classified cases, i.e., via Voronoi cells. The approach is demonstrated on biomedical data typically analyzed with Bayesian statistics, such as flow cytometric data sets or biomarkers used in medical diagnostics, where it increased the class assignment accuracy by 1-10% depending on the data set. The proposed extension of the Bayesian inference of class membership can be used to obtain robust and plausible class assignments even for data at the extremes of the distribution and/or for which evidence is weak.
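A minimal sketch of the two steps, assuming scipy/numpy and synthetic one-dimensional data; the threshold ε is fixed by hand here, whereas the paper derives it with computed ABC analysis:

```python
# Hedged sketch: (1) flag cases whose maximum posterior probability is below
# ε as "uncertain"; (2) relabel them via the nearest confidently classified
# case, i.e., by the Voronoi cell the case falls into (a 1-NN lookup).
import numpy as np
from scipy.stats import norm
from scipy.spatial import cKDTree

rng = np.random.default_rng(4)
# Group A: lower mean, higher variance; group B: higher mean, lower variance.
a, b = rng.normal(0, 3, 300), rng.normal(4, 1, 300)
x = np.r_[a, b][:, None]

# Posteriors from the (known) class-conditional densities, equal priors.
pa, pb = norm.pdf(x[:, 0], 0, 3), norm.pdf(x[:, 0], 4, 1)
post = np.c_[pa, pb] / (pa + pb)[:, None]
labels = post.argmax(axis=1)

eps = 0.8                                  # stand-in for the cABC-derived ε
uncertain = post.max(axis=1) < eps
tree = cKDTree(x[~uncertain])              # Voronoi relabeling = 1-NN lookup
_, nn = tree.query(x[uncertain])
labels[uncertain] = labels[~uncertain][nn]
print("relabeled cases:", int(uncertain.sum()))
```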


Subject(s)
Big Data; Biomedical Research; Bayes Theorem; Probability; Uncertainty
8.
Bioengineering (Basel) ; 9(11)2022 Nov 03.
Article in English | MEDLINE | ID: mdl-36354555

ABSTRACT

"Big omics data" provoke the challenge of extracting meaningful information with clinical benefit. Here, we propose a two-step approach, an initial unsupervised inspection of the structure of the high dimensional data followed by supervised analysis of gene expression levels, to reconstruct the surface patterns on different subtypes of acute myeloid leukemia (AML). First, Bayesian methodology was used, focusing on surface molecules encoded by cluster of differentiation (CD) genes to assess whether AML is a homogeneous group or segregates into clusters. Gene expressions of 390 patient samples measured using microarray technology and 150 samples measured via RNA-Seq were compared. Beyond acute promyelocytic leukemia (APL), a well-known AML subentity, the remaining AML samples were separated into two distinct subgroups. Next, we investigated which CD molecules would best distinguish each AML subgroup against APL, and validated discriminative molecules of both datasets by searching the scientific literature. Surprisingly, a comparison of both omics analyses revealed that CD339 was the only overlapping gene differentially regulated in APL and other AML subtypes. In summary, our two-step approach for gene expression analysis revealed two previously unknown subgroup distinctions in AML based on surface molecule expression, which may guide the differentiation of subentities in a given clinical-diagnostic context.

9.
Pain Rep ; 7(6): e1044, 2022.
Article in English | MEDLINE | ID: mdl-36348668

ABSTRACT

The collection of increasing amounts of data in health care has become relevant for pain therapy and research. This poses problems for analyses with classical approaches, which is why artificial intelligence (AI) and machine learning (ML) methods are being incorporated into pain research. The current literature on AI and ML in the context of pain research was automatically searched and manually curated. The machine learning methods commonly used and the pain settings covered were evaluated. A further focus was on the origin of the publications and on technical details, such as the sample sizes of the studies analyzed with ML. Machine learning was identified in 475 publications from 18 countries, with 79% of the studies published since 2019. The most frequently addressed pain conditions included low back pain, musculoskeletal disorders, osteoarthritis, neuropathic pain and inflammatory pain. The most frequently used ML algorithms were random forests and support vector machines; however, deep learning was used when medical images were involved in the diagnosis of painful conditions. Cohort sizes ranged from 11 to 2,164,872, with a mode at n = 100; however, deep learning required larger data sets, often only available from medical images. Artificial intelligence and ML, in particular, are increasingly being applied to pain-related data. This report presents application examples and highlights advantages and limitations, such as the ability to process complex data, sometimes, but not always, at the cost of big-data requirements or black-box decisions.

10.
Data Brief ; 43: 108382, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35799850

ABSTRACT

Three different flow cytometry datasets consisting of diagnostic samples of either peripheral blood (pB) or bone marrow (BM) from patients without any sign of bone marrow disease at two different health care centers are provided. In flow cytometry, cells rapidly pass through a laser beam one by one, and two light-scatter and eight surface parameters are measured for more than 100,000 cells per patient sample. The technology swiftly characterizes cells of the immune system at the single-cell level based on antigens presented on the cell surface that are targeted by a set of fluorochrome-conjugated antibodies. The first dataset consists of N=14 sample files measured in Marburg and the second dataset of N=44 data files measured in Dresden, of which half are BM samples and half are pB samples. The third dataset contains N=25 healthy bone marrow samples and N=25 leukemia bone marrow samples measured in Marburg. The data have been log-scaled to values between zero and six and used to identify cell populations that are simultaneously meaningful to the clinician and relevant to the distinction of pB vs BM, and BM vs leukemia. Explainable artificial intelligence methods should distinguish these samples and provide meaningful explanations for the classification without requiring more than a few hours of computation. The data described in this article are available in Mendeley Data [1].

11.
BMC Bioinformatics ; 23(1): 233, 2022 Jun 16.
Article in English | MEDLINE | ID: mdl-35710346

ABSTRACT

BACKGROUND: Data transformations are commonly used in bioinformatics data processing in the context of data projection and clustering. The most commonly used Euclidean metric is not scale-invariant and is therefore occasionally inappropriate for complex, e.g., multimodally distributed variables, which may negatively affect the results of cluster analysis. Specifically, the squaring function in the definition of the Euclidean distance as the square root of the sum of squared differences between data points has the consequence that the value 1 implicitly defines a limit for distances within clusters versus distances between clusters. METHODS: The Euclidean distances within a standard normal distribution N(0,1) follow a N(0, sqrt(2)) distribution. The EDO transformation of a variable X is proposed as X_EDO = X / (sqrt(2) * s), following modeling of the standard deviation s by a mixture of Gaussians and selecting the dominant modes via item categorization. The method was compared in artificial and biomedical datasets with clustering of untransformed data, z-transformed data and the recently proposed pooled variable scaling. RESULTS: A simulation study and applications to known real data examples showed that the proposed EDO scaling method is generally useful. The clustering results in terms of cluster accuracy, adjusted Rand index and Dunn's index outperformed the classical alternatives. Finally, the EDO transformation was applied to cluster a high-dimensional genomic dataset consisting of gene expression data for multiple samples of breast cancer tissues; the proposed approach gave better results than the classical methods and was compared with pooled variable scaling. CONCLUSIONS: For multivariate procedures of data analysis, it is proposed to use the EDO transformation as a better alternative to the established z-standardization, especially for nontrivially distributed data. The "EDOtrans" R package is available at https://cran.r-project.org/package=EDOtrans.
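A hedged sketch of the transform as reconstructed above, assuming scikit-learn; the dominant mode is chosen here simply by the largest mixture weight, whereas the paper selects dominant modes via item categorization:

```python
# Hedged sketch of the EDO transform: model the variable with a Gaussian
# mixture, take the standard deviation s of the dominant mode, and rescale
# as X_edo = X / (sqrt(2) * s). Mode selection is simplified here.
import numpy as np
from sklearn.mixture import GaussianMixture

def edo_transform(x, n_modes=3):
    gm = GaussianMixture(n_components=n_modes, random_state=0)
    gm.fit(x.reshape(-1, 1))
    dominant = int(np.argmax(gm.weights_))            # simplification
    s = float(np.sqrt(gm.covariances_[dominant].ravel()[0]))
    return x / (np.sqrt(2) * s)

rng = np.random.default_rng(6)
x = np.r_[rng.normal(0, 1, 700), rng.normal(8, 3, 300)]   # bimodal variable
print(edo_transform(x)[:5])
```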


Subject(s)
Algorithms; Computational Biology; Cluster Analysis; Genomics; Normal Distribution
12.
PLoS One ; 16(8): e0255838, 2021.
Article in English | MEDLINE | ID: mdl-34352006

ABSTRACT

MOTIVATION: The size of today's biomedical data sets pushes computer equipment to its limits, even for seemingly standard analysis tasks such as data projection or clustering. Reducing large biomedical data by downsampling is therefore a common early step in data processing, often performed as random uniform class-proportional downsampling. In this report, we hypothesized that this can be optimized to obtain samples that better reflect the entire data set than those obtained using the current standard method. RESULTS: By repeating the random sampling and comparing the distribution of the drawn sample with the distribution of the original data, it was possible to establish a method for obtaining subsets of data that reflect the entire data set better than taking only the first randomly drawn subsample, as is the current standard. Experiments on artificial and real biomedical data sets showed that the reconstruction of the remainder of the original data set from the downsampled data improved significantly. This was observed with both principal component analysis and autoencoding neural networks. The fidelity depended on both the number of cases drawn from the original data and the number of repeated draws. CONCLUSIONS: Optimal distribution-preserving class-proportional downsampling yields data subsets that reflect the structure of the entire data set better than those obtained with the standard method. By using distributional similarity as the only selection criterion, the proposed method does not in any way affect the results of a later planned analysis.
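A minimal sketch of the idea, assuming scikit-learn and scipy: draw several class-proportional subsamples and keep the one whose per-feature distributions best match the full data (smallest mean Kolmogorov-Smirnov statistic). The selection criterion is an assumed stand-in for the paper's distributional comparison:

```python
# Hedged sketch of distribution-preserving class-proportional downsampling.
import numpy as np
from scipy.stats import ks_2samp
from sklearn.model_selection import train_test_split

def best_subsample(X, y, frac=0.1, n_draws=50, seed=0):
    best, best_ks = None, np.inf
    for i in range(n_draws):
        Xs, _, ys, _ = train_test_split(
            X, y, train_size=frac, stratify=y, random_state=seed + i)
        # Mean KS statistic over features: how close is the draw to the data?
        ks = np.mean([ks_2samp(X[:, j], Xs[:, j]).statistic
                      for j in range(X.shape[1])])
        if ks < best_ks:
            best, best_ks = (Xs, ys), ks
    return best

rng = np.random.default_rng(7)
X = rng.normal(size=(10_000, 5))
y = rng.integers(0, 3, 10_000)
Xs, ys = best_subsample(X, y)
print(Xs.shape)
```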


Subject(s)
Signal Processing, Computer-Assisted; Neural Networks, Computer
13.
Sci Rep ; 11(1): 10595, 2021 05 19.
Article in English | MEDLINE | ID: mdl-34012047

ABSTRACT

A diminished sense of smell impairs the quality of life, but olfactorily disabled people are hardly considered in measures of disability inclusion. We aimed to stratify perceptual characteristics and odors according to the extent to which they are perceived differently with a reduced sense of smell, as a possible basis for creating olfactory experiences that are enjoyed in a similar way by subjects with normal or impaired olfactory function. In 146 subjects with normal or reduced olfactory function, perceptual characteristics (edibility, intensity, irritation, temperature, familiarity, hedonics, painfulness) were tested for four sets of 10 different odors each. Data were analyzed with (i) a projection based on principal component analysis and (ii) the training of a machine-learning algorithm in a 1000-fold cross-validated setting to distinguish between olfactory diagnoses based on odor property ratings. Both analytical approaches identified perceived intensity and familiarity with the odor as discriminating characteristics between olfactory diagnoses, followed by edibility, while evoked pain sensation and perceived temperature were not discriminating. Two disjoint sets of odors were identified, i.e., d = 4 "discriminating odors" with respect to olfactory diagnosis, including cis-3-hexenol, methyl salicylate, 1-butanol and cineole, and d = 7 "non-discriminating odors", including benzyl acetate, heptanal, 4-ethyl-octanoic acid, methional, isobutyric acid, 4-decanolide and p-cresol. Different weightings of the perceptual properties of odors with normal or reduced sense of smell indicate possibilities to create sensory experiences such as food, meals or scents that, by emphasizing trigeminal perceptions, can be enjoyed by both normosmic and hyposmic individuals.


Subject(s)
Data Science; Odorants/analysis; Olfactory Perception/physiology; Adolescent; Adult; Aged; Aged, 80 and over; Cheminformatics; Female; Humans; Male; Middle Aged; Principal Component Analysis; Young Adult
14.
MethodsX ; 7: 101093, 2020.
Article in English | MEDLINE | ID: mdl-33134096

ABSTRACT

Projections are conventional methods of dimensionality reduction for information visualization, used to transform high-dimensional data into a low-dimensional space. If the projection method restricts the output space to two dimensions, the result is a scatter plot. The goal of this scatter plot is to visualize the relative relationships between high-dimensional data points that build up distance- and density-based structures. However, the Johnson-Lindenstrauss lemma states that the two-dimensional similarities in the scatter plot cannot faithfully represent high-dimensional structures. Here, a simplified emergent self-organizing map uses the projected points of such a scatter plot in combination with the dataset in order to compute the generalized U-matrix. The generalized U-matrix defines the visualization of a topographic map depicting the misrepresentations of projected points with regard to a given dimensionality reduction method and the dataset.
• The topographic map provides accurate information about the distance- and density-based structures of high-dimensional data if an appropriate dimensionality reduction method is selected.
• The topographic map can uncover the absence of distance-based structures.
• The topographic map reveals the number of clusters in a dataset as the number of valleys.
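To make the U-matrix idea concrete, the toy sketch below computes a plain U-matrix on a SOM grid, where each neuron's U-height is the mean distance of its weight vector to its grid neighbors and high "walls" separate clusters. This is not the generalized U-matrix of this article, which is computed with a simplified ESOM:

```python
# Simplified illustration of the U-matrix concept on a trained SOM grid.
import numpy as np

def u_matrix(weights):
    rows, cols, _ = weights.shape
    u = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            # Weight vectors of the 4-connected grid neighbors.
            nbrs = [weights[r + dr, c + dc]
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                    if 0 <= r + dr < rows and 0 <= c + dc < cols]
            u[r, c] = np.mean([np.linalg.norm(weights[r, c] - w)
                               for w in nbrs])
    return u

rng = np.random.default_rng(8)
weights = rng.normal(size=(20, 30, 4))   # stand-in for trained SOM weights
print(u_matrix(weights).shape)
```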

15.
PLoS One ; 15(10): e0238835, 2020.
Article in English | MEDLINE | ID: mdl-33052923

ABSTRACT

One aim of data mining is the identification of interesting structures in data. For better analytical results, the basic properties of an empirical distribution, such as skewness and possible clipping, i.e., hard limits in value ranges, need to be assessed. Of particular interest is the question of whether the data originate from one process or contain subsets related to different states of the data-producing process. Data visualization tools should deliver a clear picture of the univariate probability density function (PDF) for each feature. Visualization tools for PDFs typically use kernel density estimates and include the classical histogram as well as modern tools such as ridgeline plots, bean plots and violin plots. If the density estimation parameters remain at their default settings, conventional methods pose several problems when visualizing the PDF of uniform, multimodal and skewed distributions and of distributions with clipped data. For that reason, a new visualization tool called the mirrored density plot (MD plot), which is specifically designed to discover interesting structures in continuous features, is proposed. The MD plot does not require adjusting any parameters of density estimation, which may make its use compelling particularly for non-experts. The visualization tools in question are evaluated against statistical tests with regard to typical challenges of explorative distribution analysis. The results of the evaluation are presented using bimodal Gaussian and skewed distributions as well as several features with already published PDFs. In an exploratory data analysis of 12 features describing quarterly financial statements, where statistical testing poses great difficulty, only the MD plot identified the structure of their PDFs. In sum, the MD plot outperforms the above-mentioned methods.
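A hedged sketch of the mirrored-density idea follows; it uses a plain Gaussian KDE where the actual MD plot uses Pareto density estimation, so it only illustrates the plot geometry, one mirrored "blade" per feature:

```python
# Hedged sketch: plot the estimated density and its mirror image around a
# vertical baseline for each feature (MD-plot-like geometry, KDE stand-in).
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

rng = np.random.default_rng(9)
features = {"bimodal": np.r_[rng.normal(-2, 1, 500), rng.normal(3, 1, 500)],
            "skewed": rng.lognormal(0, 0.6, 1000)}

fig, ax = plt.subplots()
for i, (name, x) in enumerate(features.items()):
    grid = np.linspace(x.min(), x.max(), 400)
    dens = gaussian_kde(x)(grid)
    dens = dens / dens.max() * 0.4                # normalize blade width
    ax.fill_betweenx(grid, i - dens, i + dens, alpha=0.6)
ax.set_xticks(range(len(features)), list(features))
plt.show()
```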


Subject(s)
Data Visualization; Algorithms; Data Interpretation, Statistical; Data Mining; Humans; Monte Carlo Method; Normal Distribution; Probability; Software; Stochastic Processes
16.
PLoS One ; 15(9): e0239623, 2020.
Article in English | MEDLINE | ID: mdl-32970758

ABSTRACT

MOTIVATION: Calculating the magnitude of treatment effects or of differences between two groups is a common task in quantitative science. Standard effect size measures based on differences, such as the commonly used Cohen's d, fail to capture treatment-related effects on the data if the effects are not reflected by the central tendency. The present work aims at (i) developing a non-parametric alternative to Cohen's d, which (ii) circumvents some of its numerical limitations and (iii) captures obvious changes in the data that do not affect the group means and are therefore not detected by Cohen's d. RESULTS: We propose "Impact" as a novel non-parametric measure of effect size, obtained as the sum of two separate components: (i) a difference-based effect size measure implemented as the change in the central tendency of the group-specific data, normalized to the pooled variability, and (ii) a data distribution shape-based effect size measure implemented as the difference in the probability density of the group-specific data. Results obtained on artificial and empirical data showed that "Impact" is superior to Cohen's d, by virtue of its additional second component, in detecting clearly visible effects not reflected in central tendencies. The proposed effect size measure is invariant to the scaling of the data, reflects changes in the central tendency in cases where differences in the shape of the probability distributions between subgroups are negligible, but captures changes in probability distributions as effects, and is numerically stable even if the variances of the data set or its subgroups vanish. CONCLUSIONS: The proposed effect size measure shares with machine learning algorithms the ability to detect such effects. It is therefore particularly well suited for data science and artificial intelligence-based knowledge discovery from big and heterogeneous data.
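The two-component construction can be illustrated schematically; the sketch below combines a median difference normalized to the pooled variability with a total-variation-style density difference. The exact definitions of "Impact" are given in the paper; this is an assumed simplification:

```python
# Schematic two-component effect size in the spirit of "Impact" (assumed
# simplification, not the paper's exact formula).
import numpy as np
from scipy.stats import gaussian_kde

def impact_sketch(a, b):
    pooled = np.sqrt((np.var(a, ddof=1) + np.var(b, ddof=1)) / 2)
    ct = (np.median(b) - np.median(a)) / pooled       # central-tendency part
    grid = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), 512)
    da, db = gaussian_kde(a)(grid), gaussian_kde(b)(grid)
    # Shape part: half the integrated absolute density difference.
    shape = 0.5 * np.sum(np.abs(da - db)) * (grid[1] - grid[0])
    return ct + shape

rng = np.random.default_rng(10)
a, b = rng.normal(0, 1, 1000), rng.normal(0, 3, 1000)  # equal means
print("Impact-like value:", impact_sketch(a, b))       # shape term reacts
```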


Subject(s)
Artificial Intelligence; Behavioral Research/methods; Biostatistics/methods; Data Interpretation, Statistical; Humans
17.
Data Brief ; 30: 105501, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32373681

ABSTRACT

The Fundamental Clustering Problems Suite (FCPS) offers a variety of clustering challenges that any algorithm should be able to handle given real-world data. The FCPS consists of datasets with known a priori classifications that are to be reproduced by the algorithm. The datasets are intentionally created to be visualized in two or three dimensions under the hypothesis that objects can be grouped unambiguously by the human eye. Each dataset represents a certain problem that can be solved by known clustering algorithms with varying success. In the R package "Fundamental Clustering Problems Suite" on CRAN, user-defined sample sizes can be drawn for the FCPS. Additionally, the distances of two high-dimensional datasets called Leukemia and Tetragonula are provided here. This collection is useful for investigating the shortcomings of clustering algorithms and the limitations of dimensionality reduction methods in the case of three-dimensional or higher datasets. This article is a simultaneous co-submission with Swarm Intelligence for Self-Organized Clustering [1].

18.
Eur J Anaesthesiol ; 37(3): 235-246, 2020 Mar.
Article in English | MEDLINE | ID: mdl-32028289

ABSTRACT

BACKGROUND: Persistent pain extending beyond 6 months after breast cancer surgery, when adjuvant therapies have ended, is a recognised phenomenon. The evolution of postsurgery pain is therefore of interest for future patient management, in terms of possible prognoses for distinct groups of patients to enable better patient information. OBJECTIVE(S): The analysis aimed to identify subgroups of patients who share similar time courses of postoperative persistent pain. DESIGN: Prospective cohort study. SETTING: Helsinki University Hospital, Finland, between 2006 and 2010. PATIENTS: A total of 763 women treated for breast cancer at the Helsinki University Hospital. INTERVENTIONS: Employing a data science approach in a nonredundant reanalysis of previously published data, pain ratings acquired at 6, 12, 24 and 36 months after breast cancer surgery were analysed for a group structure of the temporal courses of pain. Unsupervised automated evolutionary (genetic) algorithms were used for patient cluster detection in the pain ratings and for Gaussian mixture modelling of the slopes of the linear relationship between pain ratings and acquisition times. MAIN OUTCOME MEASURES: Clusters or groups of patients sharing patterns in the time courses of pain between 6 and 36 months after breast cancer surgery. RESULTS: Three groups of patients with distinct time courses of pain were identified as the best solutions for both the clustering of the pain ratings and the multimodal modelling of the slopes of their temporal trends. In two clusters/groups, pain decreased or remained stable. The two approaches identified similar subgroups, comprising 80/763 and 86/763 of the patients, respectively, in whom rather high pain levels tended to increase further over time. CONCLUSION: In the majority of patients, pain after breast cancer surgery decreased rapidly and disappeared, or its intensity decreased over 3 years. However, in about a tenth of patients, moderate-to-severe pain tended to increase during the 3-year follow-up.


Subject(s)
Breast Neoplasms; Breast Neoplasms/surgery; Data Science; Female; Humans; Mastectomy; Pain, Postoperative/diagnosis; Pain, Postoperative/epidemiology; Pain, Postoperative/etiology; Prospective Studies
19.
Sci Rep ; 10(1): 648, 2020 01 20.
Article in English | MEDLINE | ID: mdl-31959878

ABSTRACT

Finding subgroups in biomedical data is a key task in biomedical research and precision medicine. Already one-dimensional data, such as many different readouts from cell experiments, preclinical or human laboratory experiments or clinical signs, often reveal a more complex distribution than a single mode. Gaussian mixtures play an important role in modeling multimodal distributions of one-dimensional data. However, although the fitting of Gaussian mixture models (GMM) is often aimed at obtaining the separate modes composing the mixture, current technical implementations, often using the Expectation Maximization (EM) algorithm, are not optimized for this task. This occasionally results in poorly separated modes that are unsuitable for determining a distinguishable group structure in the data. Here, we introduce "Distribution Optimization", an evolutionary algorithm for GMM fitting that uses an adjustable error function based on chi-square statistics and the probability density. The algorithm can be directly targeted at the separation of the modes of the mixture by employing an additional criterion for the degree to which single modes overlap. The obtained GMM fits were comparable with those obtained with classical EM-based fits, except for data sets where the EM algorithm produced unsatisfactory results with overlapping Gaussian modes. There, the proposed algorithm successfully separated the modes, providing a basis for meaningful group separation while fitting the data satisfactorily. Through its optimization toward mode separation, the evolutionary algorithm proved to be a particularly suitable basis for group separation in multimodally distributed data, outperforming alternative EM-based methods.
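A hedged sketch of evolutionary GMM fitting against a chi-square criterion follows; scipy's differential_evolution stands in for the paper's genetic algorithm, and the mode-overlap criterion is omitted for brevity:

```python
# Hedged sketch: fit a two-component GMM by minimizing a chi-square-like
# discrepancy between histogram counts and the mixture density, using an
# evolutionary optimizer instead of EM.
import numpy as np
from scipy.optimize import differential_evolution
from scipy.stats import norm

rng = np.random.default_rng(11)
x = np.r_[rng.normal(-2, 1, 600), rng.normal(2.5, 0.8, 400)]
counts, edges = np.histogram(x, bins=40)
centers = 0.5 * (edges[:-1] + edges[1:])
width = edges[1] - edges[0]

def chi2(params):
    m1, s1, m2, s2, w = params
    pdf = w * norm.pdf(centers, m1, s1) + (1 - w) * norm.pdf(centers, m2, s2)
    expected = pdf * len(x) * width          # expected bin counts under GMM
    return np.sum((counts - expected) ** 2 / np.maximum(expected, 1e-9))

bounds = [(-6, 6), (0.1, 4), (-6, 6), (0.1, 4), (0.05, 0.95)]
res = differential_evolution(chi2, bounds, seed=0)
print("fitted parameters:", np.round(res.x, 2))
```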

20.
Int J Mol Sci ; 21(1)2019 Dec 20.
Article in English | MEDLINE | ID: mdl-31861946

ABSTRACT

Advances in flow cytometry enable the acquisition of large and high-dimensional data sets per patient. Novel computational techniques allow the visualization of structures in these data and, finally, the identification of relevant subgroups. Correct data visualizations and projections from the high-dimensional space to the visualization plane require the correct representation of the structures in the data. This work shows that frequently used techniques are unreliable in this respect. One of the most important methods for data projection in this area is the t-distributed stochastic neighbor embedding (t-SNE). We analyzed its performance on artificial and real biomedical data sets. t-SNE introduced a cluster structure for homogeneously distributed data that did not contain any subgroup structure. In other data sets, t-SNE occasionally suggested the wrong number of subgroups or projected data points from different subgroups as if they belonged to the same subgroup. As an alternative approach, emergent self-organizing maps (ESOM) were used in combination with U-matrix methods. This approach allowed the correct identification of homogeneous data, while in data sets containing distance- or density-based subgroup structures, the number of subgroups and the data point assignments were correctly displayed. The results highlight possible pitfalls in the use of a currently widely applied algorithmic technique for the detection of subgroups in high-dimensional cytometric data and suggest a robust alternative.
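The first pitfall can be reproduced in a few lines; the demo below, assuming scikit-learn, embeds a single homogeneous Gaussian cloud with t-SNE and leaves visual inspection of the (typically fragmented-looking) embedding to the reader:

```python
# Hedged demo: t-SNE applied to homogeneous data with no subgroup structure
# can still render visually separated "clusters" in the embedding.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(12)
x = rng.normal(size=(1000, 20))          # homogeneous: one cluster by design
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(x)
print(emb.shape)  # plot emb to inspect for spurious cluster structure
```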


Subject(s)
Computational Biology/methods; Flow Cytometry/methods; Machine Learning; Algorithms; Antigens, CD/analysis; Datasets as Topic; Humans; Stochastic Processes