Búsqueda | BVS CLAP/SMR-OPS/OMS

1.

The influence of habitat alteration is widespread, but the impact of climate cannot continue to be discounted.

Dickie, Melanie; Serrouya, Robert; Becker, Marcus; DeMars, Craig; Noonan, Michael J; Steenweg, Robin; Boutin, Stan; Ford, Adam T.

Glob Chang Biol ; 30(9): e17497, 2024 Sep.

Artículo en Inglés | MEDLINE | ID: mdl-39268672

RESUMEN

In Dickie et al. (2024), we contrasted the effects of climate and habitat alteration on white-tailed deer density, recognizing the role of both these factors. Barnas et al.'s (2024) critique raised concerns about data transformations, model overfitting, and inference methods, but our analysis demonstrates that these criticisms are either unfounded or align with our original conclusions. We reaffirm that while both climate and habitat alteration contribute to deer densities, management decisions cannot ignore the strong role of climate, which is only predicted to increase in coming decades.

Asunto(s)

Cambio Climático , Ciervos , Ecosistema , Animales , Ciervos/fisiología , Densidad de Población , Conservación de los Recursos Naturales

2.

Redefining Health Care Data Interoperability: Empirical Exploration of Large Language Models in Information Exchange.

Yoon, Dukyong; Han, Changho; Kim, Dong Won; Kim, Songsoo; Bae, SungA; Ryu, Jee An; Choi, Yujin.

J Med Internet Res ; 26: e56614, 2024 May 31.

Artículo en Inglés | MEDLINE | ID: mdl-38819879

RESUMEN

BACKGROUND: Efficient data exchange and health care interoperability are impeded by medical records often being in nonstandardized or unstructured natural language format. Advanced language models, such as large language models (LLMs), may help overcome current challenges in information exchange. OBJECTIVE: This study aims to evaluate the capability of LLMs in transforming and transferring health care data to support interoperability. METHODS: Using data from the Medical Information Mart for Intensive Care III and UK Biobank, the study conducted 3 experiments. Experiment 1 assessed the accuracy of transforming structured laboratory results into unstructured format. Experiment 2 explored the conversion of diagnostic codes between the coding frameworks of the ICD-9-CM (International Classification of Diseases, Ninth Revision, Clinical Modification), and Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT) using a traditional mapping table and a text-based approach facilitated by the LLM ChatGPT. Experiment 3 focused on extracting targeted information from unstructured records that included comprehensive clinical information (discharge notes). RESULTS: The text-based approach showed a high conversion accuracy in transforming laboratory results (experiment 1) and an enhanced consistency in diagnostic code conversion, particularly for frequently used diagnostic names, compared with the traditional mapping approach (experiment 2). In experiment 3, the LLM showed a positive predictive value of 87.2% in extracting generic drug names. CONCLUSIONS: This study highlighted the potential role of LLMs in significantly improving health care data interoperability, demonstrated by their high accuracy and efficiency in data transformation and exchange. The LLMs hold vast potential for enhancing medical data exchange without complex standardization for medical terms and data structure.

Asunto(s)

Intercambio de Información en Salud , Humanos , Intercambio de Información en Salud/normas , Interoperabilidad de la Información en Salud , Registros Electrónicos de Salud , Procesamiento de Lenguaje Natural , Systematized Nomenclature of Medicine

3.

Quantitative Amplicon Sequencing Is Necessary to Identify Differential Taxa and Correlated Taxa Where Population Sizes Differ.

Epp Schmidt, Dietrich; Maul, Jude E; Yarwood, Stephanie A.

Microb Ecol ; 86(4): 2790-2801, 2023 Nov.

Artículo en Inglés | MEDLINE | ID: mdl-37563275

RESUMEN

High-throughput, multiplexed-amplicon sequencing has become a core tool for understanding environmental microbiomes. As researchers have widely adopted sequencing, many open-source analysis pipelines have been developed to compare microbiomes using compositional analysis frameworks. However, there is increasing evidence that compositional analyses do not provide the information necessary to accurately interpret many community assembly processes. This is especially true when there are large gradients that drive distinct community assembly processes. Recently, sequencing has been combined with Q-PCR (among other sources of total quantitation) to generate "Quantitative Sequencing" (QSeq) data. QSeq more accurately estimates the true abundance of taxa, is a more reliable basis for inferring correlation, and, ultimately, can be more reliably related to environmental data to infer community assembly processes. In this paper, we use a combination of published data sets, synthesis, and empirical modeling to offer guidance for which contexts QSeq is advantageous. As little as 5% variation in total abundance among experimental groups resulted in more accurate inference by QSeq than compositional methods. Compositional methods for differential abundance and correlation unreliably detected patterns in abundance and covariance when there was greater than 20% variation in total abundance among experimental groups. Whether QSeq performs better for beta diversity analysis depends on the question being asked, and the analytic strategy (e.g., what distance metric is being used); for many questions and methods, QSeq and compositional analysis are equivalent for beta diversity analysis. QSeq is especially useful for taxon-specific analysis; QSeq transformation and analysis should be the default for answering taxon-specific questions of amplicon sequence data. Publicly available bioinformatics pipelines should incorporate support for QSeq transformation and analysis.

Asunto(s)

Bacterias , Microbiota , Bacterias/genética , Densidad de Población , Microbiota/genética , Análisis de Secuencia de ADN , Secuenciación de Nucleótidos de Alto Rendimiento/métodos

4.

Effects of Using Different Indirect Techniques on the Calculation of Reference Intervals: Observational Study.

Yang, Dan; Su, Zihan; Mu, Runqing; Diao, Yingying; Zhang, Xin; Liu, Yusi; Wang, Shuo; Wang, Xu; Zhao, Lei; Wang, Hongyi; Zhao, Min.

J Med Internet Res ; 25: e45651, 2023 07 17.

Artículo en Inglés | MEDLINE | ID: mdl-37459170

RESUMEN

BACKGROUND: Reference intervals (RIs) play an important role in clinical decision-making. However, due to the time, labor, and financial costs involved in establishing RIs using direct means, the use of indirect methods, based on big data previously obtained from clinical laboratories, is getting increasing attention. Different indirect techniques combined with different data transformation methods and outlier removal might cause differences in the calculation of RIs. However, there are few systematic evaluations of this. OBJECTIVE: This study used data derived from direct methods as reference standards and evaluated the accuracy of combinations of different data transformation, outlier removal, and indirect techniques in establishing complete blood count (CBC) RIs for large-scale data. METHODS: The CBC data of populations aged ≥18 years undergoing physical examination from January 2010 to December 2011 were retrieved from the First Affiliated Hospital of China Medical University in northern China. After exclusion of repeated individuals, we performed parametric, nonparametric, Hoffmann, Bhattacharya, and truncation points and Kolmogorov-Smirnov distance (kosmic) indirect methods, combined with log or BoxCox transformation, and Reed-Dixon, Tukey, and iterative mean (3SD) outlier removal methods in order to derive the RIs of 8 CBC parameters and compared the results with those directly and previously established. Furthermore, bias ratios (BRs) were calculated to assess which combination of indirect technique, data transformation pattern, and outlier removal method is preferrable. RESULTS: Raw data showed that the degrees of skewness of the white blood cell (WBC) count, platelet (PLT) count, mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), and mean corpuscular volume (MCV) were much more obvious than those of other CBC parameters. After log or BoxCox transformation combined with Tukey or iterative mean (3SD) processing, the distribution types of these data were close to Gaussian distribution. Tukey-based outlier removal yielded the maximum number of outliers. The lower-limit bias of WBC (male), PLT (male), hemoglobin (HGB; male), MCH (male/female), and MCV (female) was greater than that of the corresponding upper limit for more than half of 30 indirect methods. Computational indirect choices of CBC parameters for males and females were inconsistent. The RIs of MCHC established by the direct method for females were narrow. For this, the kosmic method was markedly superior, which contrasted with the RI calculation of CBC parameters with high |BR| qualification rates for males. Among the top 10 methodologies for the WBC count, PLT count, HGB, MCV, and MCHC with a high-BR qualification rate among males, the Bhattacharya, Hoffmann, and parametric methods were superior to the other 2 indirect methods. CONCLUSIONS: Compared to results derived by the direct method, outlier removal methods and indirect techniques markedly influence the final RIs, whereas data transformation has negligible effects, except for obviously skewed data. Specifically, the outlier removal efficiency of Tukey and iterative mean (3SD) methods is almost equivalent. Furthermore, the choice of indirect techniques depends more on the characteristics of the studied analyte itself. This study provides scientific evidence for clinical laboratories to use their previous data sets to establish RIs.

Asunto(s)

Macrodatos , Recuento de Células Sanguíneas , Adolescente , Adulto , Femenino , Humanos , Masculino , China , Recuento de Leucocitos , Valores de Referencia , Toma de Decisiones Clínicas

5.

A Data Transformation Methodology to Create Findable, Accessible, Interoperable, and Reusable Health Data: Software Design, Development, and Evaluation Study.

Sinaci, A Anil; Gencturk, Mert; Teoman, Huseyin Alper; Laleci Erturkmen, Gokce Banu; Alvarez-Romero, Celia; Martinez-Garcia, Alicia; Poblador-Plou, Beatriz; Carmona-Pírez, Jonás; Löbe, Matthias; Parra-Calderon, Carlos Luis.

J Med Internet Res ; 25: e42822, 2023 03 08.

Artículo en Inglés | MEDLINE | ID: mdl-36884270

RESUMEN

BACKGROUND: Sharing health data is challenging because of several technical, ethical, and regulatory issues. The Findable, Accessible, Interoperable, and Reusable (FAIR) guiding principles have been conceptualized to enable data interoperability. Many studies provide implementation guidelines, assessment metrics, and software to achieve FAIR-compliant data, especially for health data sets. Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) is a health data content modeling and exchange standard. OBJECTIVE: Our goal was to devise a new methodology to extract, transform, and load existing health data sets into HL7 FHIR repositories in line with FAIR principles, develop a Data Curation Tool to implement the methodology, and evaluate it on health data sets from 2 different but complementary institutions. We aimed to increase the level of compliance with FAIR principles of existing health data sets through standardization and facilitate health data sharing by eliminating the associated technical barriers. METHODS: Our approach automatically processes the capabilities of a given FHIR end point and directs the user while configuring mappings according to the rules enforced by FHIR profile definitions. Code system mappings can be configured for terminology translations through automatic use of FHIR resources. The validity of the created FHIR resources can be automatically checked, and the software does not allow invalid resources to be persisted. At each stage of our data transformation methodology, we used particular FHIR-based techniques so that the resulting data set could be evaluated as FAIR. We performed a data-centric evaluation of our methodology on health data sets from 2 different institutions. RESULTS: Through an intuitive graphical user interface, users are prompted to configure the mappings into FHIR resource types with respect to the restrictions of selected profiles. Once the mappings are developed, our approach can syntactically and semantically transform existing health data sets into HL7 FHIR without loss of data utility according to our privacy-concerned criteria. In addition to the mapped resource types, behind the scenes, we create additional FHIR resources to satisfy several FAIR criteria. According to the data maturity indicators and evaluation methods of the FAIR Data Maturity Model, we achieved the maximum level (level 5) for being Findable, Accessible, and Interoperable and level 3 for being Reusable. CONCLUSIONS: We developed and extensively evaluated our data transformation approach to unlock the value of existing health data residing in disparate data silos to make them available for sharing according to the FAIR principles. We showed that our method can successfully transform existing health data sets into HL7 FHIR without loss of data utility, and the result is FAIR in terms of the FAIR Data Maturity Model. We support institutional migration to HL7 FHIR, which not only leads to FAIR data sharing but also eases the integration with different research networks.

Asunto(s)

Registros Electrónicos de Salud , Programas Informáticos , Humanos , Diseño de Software , Estándar HL7 , Difusión de la Información

6.

Empirical Comparisons of 12 Meta-analysis Methods for Synthesizing Proportions of Binary Outcomes.

Lin, Lifeng; Xu, Chang; Chu, Haitao.

J Gen Intern Med ; 37(2): 308-317, 2022 02.

Artículo en Inglés | MEDLINE | ID: mdl-34505983

RESUMEN

BACKGROUND: Meta-analysis is increasingly used to synthesize proportions (e.g., disease prevalence). It can be implemented with widely used two-step methods or one-step methods, such as generalized linear mixed models (GLMMs). Existing simulation studies have shown that GLMMs outperform the two-step methods in some settings. It is, however, unclear whether these simulation settings are common in the real world. We aim to compare the real-world performance of various meta-analysis methods for synthesizing proportions. METHODS: We extracted datasets of proportions from the Cochrane Library and applied 12 two-step and one-step methods to each dataset. We used Spearman's ρ and the Bland-Altman plot to assess their results' correlation and agreement. The GLMM with the logit link was chosen as the reference method. We calculated the absolute difference and fold change (ratio of estimates) of the overall proportion estimates produced by each method vs. the reference method. RESULTS: We obtained a total of 43,644 datasets. The various methods generally had high correlations (ρ > 0.9) and agreements. GLMMs had computational issues more frequently than the two-step methods. However, the two-step methods generally produced large absolute differences from the GLMM with the logit link for small total sample sizes (< 50) and crude event rates within 10-20% and 90-95%, and large fold changes for small total event counts (< 10) and low crude event rates (< 20%). CONCLUSIONS: Although different methods produced similar overall proportion estimates in most datasets, one-step methods should be considered in the presence of small total event counts or sample sizes and very low or high event rates.

Asunto(s)

Proyectos de Investigación , Simulación por Computador , Humanos , Modelos Lineales , Tamaño de la Muestra

7.

A Study on the Prediction of Cancer Using Whole-Genome Data and Deep Learning.

Lee, Young-Ji; Park, Jun-Hyung; Lee, Seung-Ho.

Int J Mol Sci ; 23(18)2022 Sep 08.

Artículo en Inglés | MEDLINE | ID: mdl-36142316

RESUMEN

The number of patients diagnosed with cancer continues to increasingly rise, and has nearly doubled in 20 years. Therefore, predicting cancer occurrence has a significant impact on reducing medical costs, and preventing cancer early can increase survival rates. In the data preprocessing step, since individual genome data are used as input data, they are classified as individual genome data. Subsequently, data embedding is performed in character units, so that it can be used in deep learning. In the deep learning network schema, using preprocessed data, a character-based deep learning network learns the correlation between individual feature data and predicts cancer occurrence. To evaluate the objective reliability of the method proposed in this study, various networks published in other studies were compared and evaluated using the TCGA dataset. As a result of comparing various networks published in other studies using the same data, excellent results were obtained in terms of accuracy, sensitivity, and specificity. Thus, the superiority of the effectiveness of deep learning networks in predicting cancer occurrence using individual whole-genome data was demonstrated. From the results of the confusion matrix, the validity of the model for predicting the cancer using an individual's whole-genome data and the deep learning proposed in this study was proven. In addition, the AUC, which is the area under the ROC curve, which judges the efficiency of diagnosis as a performance evaluation index of the model, was found to be 90% or more, good classification results were derived. The objectives of this study were to use individual genome data for 12 cancers as input data to analyze the whole genome pattern, and to not separately use reference genome sequence data of normal individuals. In addition, several mutation types, including SNV, DEL, and INS, were applied.

Asunto(s)

Aprendizaje Profundo , Neoplasias , Humanos , Neoplasias/genética , Curva ROC , Reproducibilidad de los Resultados

8.

Affine Transformation of Negative Values for NMR Metabolomics Using the mrbin R Package.

Klein, Matthias S.

J Proteome Res ; 20(2): 1397-1404, 2021 02 05.

Artículo en Inglés | MEDLINE | ID: mdl-33417772

RESUMEN

Data from untargeted metabolomics studies employing nuclear magnetic resonance (NMR) spectroscopy oftentimes contain negative values. These negative values hamper data processing and analysis algorithms and prevent the use of such data in multiomics integration settings. New methods to deal with such negative values are thus an urgent need in the metabolomics community. This study presents affine transformation of negative values (ATNV), a novel algorithm for replacement of negative values in NMR data sets. ATNV was implemented in the R package mrbin, which features interactive menus for user-friendly application and is available for free for various operating systems within the free R statistical programming language. The novel algorithms were tested on a set of human urinary NMR spectra and were able to successfully identify relevant metabolites.

Asunto(s)

Metabolómica , Programas Informáticos , Algoritmos , Humanos , Imagen por Resonancia Magnética , Espectroscopía de Resonancia Magnética

9.

Development of a FHIR RDF data transformation and validation framework and its evaluation.

Prud'hommeaux, Eric; Collins, Josh; Booth, David; Peterson, Kevin J; Solbrig, Harold R; Jiang, Guoqian.

J Biomed Inform ; 117: 103755, 2021 05.

Artículo en Inglés | MEDLINE | ID: mdl-33781919

RESUMEN

Resource Description Framework (RDF) is one of the three standardized data formats in the HL7 Fast Healthcare Interoperability Resources (FHIR) specification and is being used by healthcare and research organizations to join FHIR and non-FHIR data. However, RDF previously had not been integrated into popular FHIR tooling packages, hindering the adoption of FHIR RDF in the semantic web and other communities. The objective of the study is to develop and evaluate a Java based FHIR RDF data transformation toolkit to facilitate the use and validation of FHIR RDF data. We extended the popular HAPI FHIR tooling to add RDF support, thus enabling FHIR data in XML or JSON to be transformed to or from RDF. We also developed an RDF Shape Expression (ShEx)-based validation framework to verify conformance of FHIR RDF data to the ShEx schemas provided in the FHIR specification for FHIR versions R4 and R5. The effectiveness of ShEx validation was demonstrated by testing it against 2693 FHIR R4 examples and 2197 FHIR R5 examples that are included in the FHIR specification. A total of 5 types of errors including missing properties, unknown element, missing resource Type, invalid attribute value, and unknown resource name in the R5 examples were revealed, demonstrating the value of the ShEx in the quality assurance of the evolving R5 development. This FHIR RDF data transformation and validation framework, based on HAPI and ShEx, is robust and ready for community use in adopting FHIR RDF, improving FHIR data quality, and evolving the FHIR specification.

Asunto(s)

Atención a la Salud , Registros Electrónicos de Salud

10.

Fault Diagnosis of Rotating Machinery Based on Improved Self-Supervised Learning Method and Very Few Labeled Samples.

Wei, Meirong; Liu, Yan; Zhang, Tao; Wang, Ze; Zhu, Jiaming.

Sensors (Basel) ; 22(1)2021 Dec 28.

Artículo en Inglés | MEDLINE | ID: mdl-35009734

RESUMEN

Convolution neural network (CNN)-based fault diagnosis methods have been widely adopted to obtain representative features and used to classify fault modes due to their prominent feature extraction capability. However, a large number of labeled samples are required to support the algorithm of CNNs, and, in the case of a limited amount of labeled samples, this may lead to overfitting. In this article, a novel ResNet-based method is developed to achieve fault diagnoses for machines with very few samples. To be specific, data transformation combinations (DTCs) are designed based on mutual information. It is worth noting that the selected DTC, which can complete the training process of the 1-D ResNet quickly without increasing the amount of training data, can be randomly used for any batch training data. Meanwhile, a self-supervised learning method called 1-D SimCLR is adopted to obtain an effective feature encoder, which can be optimized with very few unlabeled samples. Then, a fault diagnosis model named DTC-SimCLR is constructed by combining the selected data transformation combination, the obtained feature encoder and a fully-connected layer-based classifier. In DTC-SimCLR, the parameters of the feature encoder are fixed, and the classifier is trained with very few labeled samples. Two machine fault datasets from a cutting tooth and a bearing are conducted to evaluate the performance of DTC-SimCLR. Testing results show that DTC-SimCLR has superior performance and diagnostic accuracy with very few samples.

Asunto(s)

Algoritmos , Redes Neurales de la Computación , Aprendizaje Automático Supervisado

11.

Right-Censored Time Series Modeling by Modified Semi-Parametric A-Spline Estimator.

Aydin, Dursun; Ahmed, Syed Ejaz; Yilmaz, Ersin.

Entropy (Basel) ; 23(12)2021 Nov 27.

Artículo en Inglés | MEDLINE | ID: mdl-34945891

RESUMEN

This paper focuses on the adaptive spline (A-spline) fitting of the semiparametric regression model to time series data with right-censored observations. Typically, there are two main problems that need to be solved in such a case: dealing with censored data and obtaining a proper A-spline estimator for the components of the semiparametric model. The first problem is traditionally solved by the synthetic data approach based on the Kaplan-Meier estimator. In practice, although the synthetic data technique is one of the most widely used solutions for right-censored observations, the transformed data's structure is distorted, especially for heavily censored datasets, due to the nature of the approach. In this paper, we introduced a modified semiparametric estimator based on the A-spline approach to overcome data irregularity with minimum information loss and to resolve the second problem described above. In addition, the semiparametric B-spline estimator was used as a benchmark method to gauge the success of the A-spline estimator. To this end, a detailed Monte Carlo simulation study and a real data sample were carried out to evaluate the performance of the proposed estimator and to make a practical comparison.

12.

Combining Inter-Subject Modeling with a Subject-Based Data Transformation to Improve Affect Recognition from EEG Signals.

Arevalillo-Herráez, Miguel; Cobos, Maximo; Roger, Sandra; García-Pineda, Miguel.

Sensors (Basel) ; 19(13)2019 Jul 08.

Artículo en Inglés | MEDLINE | ID: mdl-31288378

RESUMEN

Existing correlations between features extracted from Electroencephalography (EEG) signals and emotional aspects have motivated the development of a diversity of EEG-based affect detection methods. Both intra-subject and inter-subject approaches have been used in this context. Intra-subject approaches generally suffer from the small sample problem, and require the collection of exhaustive data for each new user before the detection system is usable. On the contrary, inter-subject models do not account for the personality and physiological influence of how the individual is feeling and expressing emotions. In this paper, we analyze both modeling approaches, using three public repositories. The results show that the subject's influence on the EEG signals is substantially higher than that of the emotion and hence it is necessary to account for the subject's influence on the EEG signals. To do this, we propose a data transformation that seamlessly integrates individual traits into an inter-subject approach, improving classification results.

Asunto(s)

Electroencefalografía/métodos , Emociones/fisiología , Modelos Biológicos , Procesamiento de Señales Asistido por Computador , Nivel de Alerta/fisiología , Análisis de Datos , Bases de Datos Factuales , Humanos , Máquina de Vectores de Soporte

13.

Optimization of the taxonomic resolution of an indicator taxon for cost-effective ecological monitoring: Perspectives from a heterogeneous tropical coastline.

Vijapure, Tejal; Sukumaran, Soniya.

J Environ Manage ; 247: 474-483, 2019 Oct 01.

Artículo en Inglés | MEDLINE | ID: mdl-31254762

RESUMEN

An important requirement towards formulating appropriate management and conservation measures for biological diversity is to devise efficient and cost-effective monitoring protocols that yield coherent data. Environmental monitoring investigations have been typically based on species level responses of biodiversity to environmental disturbances. Considering that this exercise is cost-intensive and the species identification keys are unavailable for some geographical areas, efforts are now afoot to test the efficacy of supra-specific taxa in resolving distribution patterns of biota, analogous to that of species. This study was aimed at testing the efficacy of Taxonomic Sufficiency (TS), a data reduction technique, in deciphering spatio-temporal variations of macrobenthos in the tropical coastal waters of northwest India. The macrobenthic indicator taxon, Polychaeta, was analyzed at five transects that included two marine protected areas, during the three major seasons. The consistency of spatio-temporal trends of polychaete assemblages, derived from four taxonomic levels and subjected to five types of data transformation was scrutinized. Univariate indices indicated that coarser taxonomic levels except order, maintained the indicative responses spatio-temporally, similar to the species level. Spatial variability was appropriately indicated by all data matrices. Temporal variation was evident only with family data subjected to fourth root or log data transformations. The TS approach succeeded in this tropical ecoregion owing to the consistent and sizable proportion of monotypic polychaete taxon and uniformity in responses of the constituents of higher polychaete taxon. CCA results revealed that a similar set of environmental variables influenced the polychaete distribution at all the taxonomic levels; however, spatial variations detected at species level diminished with reduced taxonomic breadth. Results indicated that meaningful robust data for deriving coastal management initiatives can be achieved cost-effectively by the adoption of TS approach for the ecologically and economically important 2360â¯km long northwest Indian coastline.

Asunto(s)

Biodiversidad , Poliquetos , Animales , Ecología , Monitoreo del Ambiente , India

14.

Algorithms designed for compressed-gene-data transformation among gene banks with different references.

Luo, Qiuming; Guo, Chao; Zhang, Yi Jun; Cai, Ye; Liu, Gang.

BMC Bioinformatics ; 19(1): 230, 2018 06 18.

Artículo en Inglés | MEDLINE | ID: mdl-29914357

RESUMEN

BACKGROUND: With the reduction of gene sequencing cost and demand for emerging technologies such as precision medical treatment and deep learning in genome, it is an era of gene data outbreaks today. How to store, transmit and analyze these data has become a hotspot in the current research. Now the compression algorithm based on reference is widely used due to its high compression ratio. There exists a big problem that the data from different gene banks can't merge directly and share information efficiently, because these data are usually compressed with different references. The traditional workflow is decompression-and-recompression, which is too simple and time-consuming. We should improve it and speed it up. RESULTS: In this paper, we focus on this problem and propose a set of transformation algorithms to cope with it. We will 1) analyze some different compression algorithms to find the similarities and the differences among all of them, 2) come up with a naïve method named TDM for data transformation between difference gene banks and finally 3) optimize former method TDM and propose the method named TPI and the method named TGI. A number of experiment result proved that the three algorithms we proposed are an order of magnitude faster than traditional decompression-and-recompression workflow. CONCLUSIONS: Firstly, the three algorithms we proposed all have good performance in terms of time. Secondly, they have their own different advantages faced with different dataset or situations. TDM and TPI are more suitable for small-scale gene data transformation, while TGI is more suitable for large-scale gene data transformation.

Asunto(s)

Algoritmos , Compresión de Datos/métodos , Genoma Humano , Genómica/métodos , Secuenciación de Nucleótidos de Alto Rendimiento/métodos , Análisis de Secuencia de ADN/métodos , Bases de Datos Factuales , Humanos , Estándares de Referencia

15.

Assessing Information Transmission in Data Transformations with the Channel Multivariate Entropy Triangle.

Valverde-Albacete, Francisco J; Peláez-Moreno, Carmen.

Entropy (Basel) ; 20(7)2018 Jun 27.

Artículo en Inglés | MEDLINE | ID: mdl-33265588

RESUMEN

Data transformation, e.g., feature transformation and selection, is an integral part of any machine learning procedure. In this paper, we introduce an information-theoretic model and tools to assess the quality of data transformations in machine learning tasks. In an unsupervised fashion, we analyze the transformation of a discrete, multivariate source of information X¯ into a discrete, multivariate sink of information Y¯ related by a distribution PX¯Y¯. The first contribution is a decomposition of the maximal potential entropy of (X¯,Y¯), which we call a balance equation, into its (a) non-transferable, (b) transferable, but not transferred, and (c) transferred parts. Such balance equations can be represented in (de Finetti) entropy diagrams, our second set of contributions. The most important of these, the aggregate channel multivariate entropy triangle, is a visual exploratory tool to assess the effectiveness of multivariate data transformations in transferring information from input to output variables. We also show how these decomposition and balance equations also apply to the entropies of X¯ and Y¯, respectively, and generate entropy triangles for them. As an example, we present the application of these tools to the assessment of information transfer efficiency for Principal Component Analysis and Independent Component Analysis as unsupervised feature transformation and selection procedures in supervised classification tasks.

16.

Modulation of additive and interactive effects by trial history revisited.

Masson, Michael E J; Rabe, Maximilian M; Kliegl, Reinhold.

Mem Cognit ; 45(3): 480-492, 2017 04.

Artículo en Inglés | MEDLINE | ID: mdl-27787683

RESUMEN

Masson and Kliegl (Journal of Experimental Psychology: Learning, Memory, and Cognition, 39, 898-914, 2013) reported evidence that the nature of the target stimulus on the previous trial of a lexical decision task modulates the effects of independent variables on the current trial, including additive versus interactive effects of word frequency and stimulus quality. In contrast, recent reanalyses of previously published data from experiments that, unlike the Masson and Kliegl experiments, did not include semantic priming as a factor, found no evidence for modulation of additive effects of frequency and stimulus quality by trial history (Balota, Aschenbrenner, & Yap, Journal of Experimental Psychology: Learning, Memory, and Cognition, 39, 1563-1571, 2013; O'Malley & Besner, Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 1400-1411, 2013). We report two experiments that included semantic priming as a factor and that attempted to replicate the modulatory effects found by Masson and Kliegl. In neither experiment was additivity of frequency and stimulus quality modulated by trial history, converging with the findings reported by Balota et al. and O'Malley and Besner. Other modulatory influences of trial history, however, were replicated in the new experiments and reflect potential trial-by-trial alterations in decision processes.

Asunto(s)

Psicolingüística , Memoria Implícita/fisiología , Semántica , Adulto , Humanos , Adulto Joven

17.

A novel class of non-Gaussian system performance assessment and controller parameter tuning methods.

Meng, Yi; Zhou, Jinglin; Lei, Furong; Li, Dazi; Liu, Ruichen.

ISA Trans ; 2024 Sep 02.

Artículo en Inglés | MEDLINE | ID: mdl-39271407

RESUMEN

Traditional variance-based control performance assessment (CPA) and controller parameter tuning (CPT) methods tend to ignore non-Gaussian external disturbances. To address this limitation, this study proposes a novel class of CPA and CPT methods for non-Gaussian single-input single-output systems, denoted as data Gaussianization (inverse) transformation methods. The idea of quantile transformation is used to transform the non-Gaussian data with the goal of maximizing mutual information into virtual Gaussian data. In addition, optimal system data for the virtual loop are mapped back to the actual non-Gaussian system using quantile inverse transformation. Furthermore, a CARMA model-based recursive extended least square algorithm and a CARMA model-based least absolute deviation iterative algorithm are used to identify virtual Gaussian and non-Gaussian system process models, respectively, while implementing the CPT. Finally, a unified framework is proposed for the CPA and CPT of a non-Gaussian control system. The simulation results demonstrate that the proposed strategy can provide a consistent benchmark judgment criterion (threshold) for different non-Gaussian noises, and the tuned controller parameters have good performance.

18.

Pretreating and normalizing metabolomics data for statistical analysis.

Sun, Jun; Xia, Yinglin.

Genes Dis ; 11(3): 100979, 2024 May.

Artículo en Inglés | MEDLINE | ID: mdl-38299197

RESUMEN

Metabolomics as a research field and a set of techniques is to study the entire small molecules in biological samples. Metabolomics is emerging as a powerful tool generally for precision medicine. Particularly, integration of microbiome and metabolome has revealed the mechanism and functionality of microbiome in human health and disease. However, metabolomics data are very complicated. Preprocessing/pretreating and normalizing procedures on metabolomics data are usually required before statistical analysis. In this review article, we comprehensively review various methods that are used to preprocess and pretreat metabolomics data, including MS-based data and NMR -based data preprocessing, dealing with zero and/or missing values and detecting outliers, data normalization, data centering and scaling, data transformation. We discuss the advantages and limitations of each method. The choice for a suitable preprocessing method is determined by the biological hypothesis, the characteristics of the data set, and the selected statistical data analysis method. We then provide the perspective of their applications in the microbiome and metabolome research.

19.

A denoising model based on multi-agent reinforcement learning with data transformation for digital tomosynthesis.

Nam, Kibok; Lee, Dahye; Lee, Seungwan.

Phys Med Biol ; 68(12)2023 06 12.

Artículo en Inglés | MEDLINE | ID: mdl-37192630

RESUMEN

Objective. Denoising models based on the supervised learning have been proposed for medical imaging. However, its clinical availability in digital tomosynthesis (DT) imaging is limited due to the necessity of a large amount of training data for providing acceptable image quality and the difficulty in minimizing a loss. Reinforcement learning (RL) can provide the optimal pollicy, which maximizes a reward, with a small amount of training data for implementing a task. In this study, we presented a denoising model based on the multi-agent RL for DT imaging in order to improve the performance of the machine learning-based denoising model.Approach. The proposed multi-agent RL network consisted of shared sub-network, value sub-network with a reward map convolution (RMC) technique and policy sub-network with a convolutional gated recurrent unit (convGRU). Each sub-network was designed for implementing feature extraction, reward calculation and action execution, respectively. The agents of the proposed network were assigned to each image pixel. The wavelet and Anscombe transformations were applied to DT images for delivering precise noise features during network training. The network training was implemented with the DT images obtained from the three-dimensional digital chest phantoms, which were constructed by using clinical CT images. The performance of the proposed denoising model was evaluated in terms of signal-to-noise ratio (SNR), structural similarity (SSIM) and peak signal-to-noise ratio (PSNR).Main results. Comparing the supervised learning, the proposed denoising model improved the SNRs of the output DT images by 20.64% while maintaining the similar SSIMs and PSNRs. In addition, the SNRs of the output DT images with the wavelet and Anscombe transformations were 25.88 and 42.95% higher than that for the supervised learning, respectively.Significance. The denoising model based on the multi-agent RL can provide high-quality DT images, and the proposed method enables the performance improvement of machine learning-based denoising models.

Asunto(s)

Imagen por Resonancia Magnética , Tomografía Computarizada por Rayos X , Radiografía , Imagen por Resonancia Magnética/métodos , Fantasmas de Imagen , Relación Señal-Ruido , Procesamiento de Imagen Asistido por Computador/métodos , Algoritmos

20.

Homogenization of multi-institutional chest x-ray images in various data transformation schemes.

Kim, Hyeongseok; Lee, Seoyoung; Shim, Woo Jung; Choi, Min-Seong; Cho, Seungryong.

J Med Imaging (Bellingham) ; 10(6): 061103, 2023 Nov.

Artículo en Inglés | MEDLINE | ID: mdl-37125408

RESUMEN

Purpose: Although there are several options for improving the generalizability of learned models, a data instance-based approach is desirable when stable data acquisition conditions cannot be guaranteed. Despite the wide use of data transformation methods to reduce data discrepancies between different data domains, detailed analysis for explaining the performance of data transformation methods is lacking. Approach: This study compares several data transformation methods in the tuberculosis detection task with multi-institutional chest x-ray (CXR) data. Five different data transformations, including normalization, standardization with and without lung masking, and multi-frequency-based (MFB) standardization with and without lung masking were implemented. A tuberculosis detection network was trained using a reference dataset, and the data from six other sites were used for the network performance comparison. To analyze data harmonization performance, we extracted radiomic features and calculated the Mahalanobis distance. We visualized the features with a dimensionality reduction technique. Through similar methods, deep features of the trained networks were also analyzed to examine the models' responses to the data from various sites. Results: From various numerical assessments, the MFB standardization with lung masking provided the highest network performance for the non-reference datasets. From the radiomic and deep feature analyses, the features of the multi-site CXRs after MFB with lung masking were found to be well homogenized to the reference data, whereas the others showed limited performance. Conclusions: Conventional normalization and standardization showed suboptimal performance in minimizing feature differences among various sites. Our study emphasizes the strengths of MFB standardization with lung masking in terms of network performance and feature homogenization.

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA