Results 1 - 20 of 360
1.
BMC Med Res Methodol ; 24(1): 221, 2024 Sep 27.
Article in English | MEDLINE | ID: mdl-39333904

ABSTRACT

Diabetes is thought to be the most common illness in underdeveloped nations. Early detection and competent medical care are crucial steps in reducing the effects of diabetes. Examining the signs associated with diabetes is one of the most effective ways to identify the condition. The problem of missing data is not very well investigated in existing works. In addition, existing studies on diabetes detection lack accuracy and robustness. The datasets available for automated diabetes detection frequently contain missing information, which might negatively impact machine learning model performance. This work suggests an automated diabetes prediction method that achieves high accuracy and effectively manages missing variables in order to address this problem. The proposed strategy employs a stacked ensemble voting classifier model with three machine learning models and a KNN imputer to handle missing values. Using the KNN imputer, the suggested model performs exceptionally well, with accuracy, precision, recall, F1 score, and MCC of 98.59%, 99.26%, 99.75%, 99.45%, and 99.24%, respectively. In two scenarios, one with missing values eliminated and the other with the KNN imputer, the study thoroughly compared the suggested model with seven other machine learning techniques. The outcomes demonstrate the superiority of the suggested model over current state-of-the-art methods and confirm its efficacy. This work demonstrates the capability of the KNN imputer and examines the problem of missing values for diabetes detection. Medical professionals can utilize the results to improve care for diabetes patients and discover problems early.


Subject(s)
Algorithms , Data Mining , Diabetes Mellitus , Machine Learning , Humans , Data Mining/methods , Data Mining/statistics & numerical data , Diabetes Mellitus/diagnosis , Female , Male , Middle Aged , Adult
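
The abstract for this record combines nearest-neighbour imputation with a voting ensemble. The following is a minimal sketch of those two ideas only, not the authors' implementation: the distance (Euclidean over the features both rows have observed), the value of `k`, the toy rows, and the function names `knn_impute` and `majority_vote` are all assumptions made for illustration, and the three models being voted on are left abstract because the abstract does not name them.

```python
import math
from collections import Counter

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Distance is Euclidean over the features both rows have observed, a
    simplification of the NaN-aware distances used by library imputers.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows that actually observed feature j.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:  # leave the gap if no row observed this feature
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed

def majority_vote(predictions):
    """Hard voting: return the most common label among the model predictions."""
    return Counter(predictions).most_common(1)[0][0]

# Toy data (hypothetical): the second row is missing its second feature.
rows = [
    [1.0, 10.0],
    [2.0, None],
    [3.0, 30.0],
    [10.0, 100.0],
]
filled = knn_impute(rows, k=2)

# Three hypothetical model outputs for one patient, combined by hard voting.
label = majority_vote([1, 0, 1])
```

In this toy case the two nearest rows to the incomplete one contribute 10.0 and 30.0, so the gap is filled with their mean. A production pipeline would more likely rely on an established library imputer and classifier ensemble than on hand-written code like this.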
2.
Comput Math Methods Med ; 2022: 9339905, 2022.
Article in English | MEDLINE | ID: mdl-35103072

ABSTRACT

Due to the increasing prosperity of human life science and technology, many huge research results have been obtained, and the scientific research of molecular biology is developing rapidly. Therefore, the output of biological genome data has increased exponentially, which constitutes a huge amount of data analysis. The seemingly chaotic and massive amount of data information actually contains a large amount of data and information of great key scientific significance and value. Therefore, this kind of genomic data information not only contains the information content that describes the characteristics of human life but also contains the information content that can express the essence of the biological organism. It includes macroeconomic information that can reflect the basic structure and capabilities of living organisms and microinformation in related fields of molecular biology. This massive amount of genetic data is usually closely related to each other, can influence each other, and does not exist alone. In the article, the causes of uncertain data and the classification of uncertain data are introduced, and the basic concepts and related algorithms of data mining are explained. Focusing on the research and analysis of abnormal point detection and clustering algorithms in uncertain data mining technology, this paper solves the problem of how to obtain more diverse and accurate outlier detection and cluster analysis results in uncertain data. The results showed that whether it was related to obesity or not, the Lp(a) level of the sarcopenia group was significantly higher than that of the nonsarcopenia group. At the same time, the correlation analysis showed that ASM/height was negatively correlated with Lp(a). ASM/height is one of the criteria for diagnosing sarcopenia, and it is also the core of the analysis. Among the 1956 tumor patients collected in this study, 432 had sarcopenia, accounting for 22.08%, and the incidence of sarcopenia in patients with gastrointestinal tumors increased.


Subject(s)
Data Mining/methods , Exercise/physiology , Sarcopenia/etiology , Aged , Aged, 80 and over , Algorithms , Computational Biology , Data Mining/statistics & numerical data , Exercise/statistics & numerical data , Female , Hand Strength/physiology , Humans , Lipoprotein(a)/blood , Logistic Models , Male , Middle Aged , Models, Biological , Muscle, Skeletal/physiology , Neoplasms/complications , Neoplasms/physiopathology , Sarcopenia/diagnosis , Sarcopenia/physiopathology
3.
Comput Math Methods Med ; 2022: 5115089, 2022.
Article in English | MEDLINE | ID: mdl-35198037

ABSTRACT

Studies have shown that the physical, psychological, and social problems of liver cancer patients are more serious than those of other cancer patients and their quality of life is significantly reduced. This may be related to the poor treatment effect of patients with advanced liver cancer. Patients often have adverse symptoms such as cancer pain, pleural effusion, and ascites, etc., which have a great impact on patients' psychology and recovery from illness. With the change of the medical model, it has become history to rely solely on drugs to care for patients with advanced liver cancer and comprehensive nursing intervention has become very important. Continuous nursing intervention focuses on individualized and full-hearted care, effectively alleviating patients' anxiety and fear and improving patients' environmental adaptability and psychological defense mechanisms. However, in the field of liver cancer, there is no detailed comparison between the efficacy of continuous nursing and traditional conventional nursing. This article applies the hidden Markov model, starts with medical data mining, and describes the process achieved by the application of this article and the analysis of the results obtained by the two nursing methods, which reflect the difference in curative effect evaluation, and it proves that continuous nursing has more advantages in the curative effect of patients with liver tumors.


Subject(s)
Data Mining/methods , Liver Neoplasms/nursing , Models, Nursing , Algorithms , China , Computational Biology , Data Mining/statistics & numerical data , Humans , Markov Chains
4.
Comput Math Methods Med ; 2022: 9288452, 2022.
Article in English | MEDLINE | ID: mdl-35154361

ABSTRACT

One of the leading causes of death around the globe is heart disease. The heart is an organ that is responsible for the supply of blood to each part of the body. Coronary artery disease (CAD) and chronic heart failure (CHF) often lead to heart attack. Traditional medical procedures (angiography) for the diagnosis of heart disease come with higher costs as well as serious health concerns. Therefore, researchers have developed various automated diagnostic systems based on machine learning (ML) and data mining techniques. ML-based automated diagnostic systems provide affordable, efficient, and reliable solutions for heart disease detection. Various ML and data mining methods and data modalities have been utilized in the past. Many previous review papers have presented systematic reviews based on one type of data modality. This study, therefore, targets a systematic review of automated diagnosis for heart disease prediction based on different types of modalities, i.e., clinical feature-based data modality, images, and ECG. Moreover, this paper critically evaluates the previous methods and presents the limitations in these methods. Finally, the article provides some future research directions in the domain of automated heart disease detection based on machine learning and multiple data modalities.


Subject(s)
Diagnosis, Computer-Assisted/methods , Heart Failure/diagnosis , Machine Learning , Algorithms , Arrhythmias, Cardiac/diagnosis , Arrhythmias, Cardiac/diagnostic imaging , Computational Biology , Coronary Artery Disease/diagnosis , Coronary Artery Disease/diagnostic imaging , Data Mining/statistics & numerical data , Databases, Factual/statistics & numerical data , Diagnosis, Computer-Assisted/statistics & numerical data , Diagnosis, Computer-Assisted/trends , Electrocardiography/statistics & numerical data , Heart Failure/diagnostic imaging , Humans , Image Interpretation, Computer-Assisted/statistics & numerical data , Machine Learning/trends , Neural Networks, Computer
5.
Comput Math Methods Med ; 2022: 6503402, 2022.
Article in English | MEDLINE | ID: mdl-35178118

ABSTRACT

The selection of MOOC teaching resources is influenced by diversified resource positioning methods, which leads to low index efficiency of resource mining. Therefore, this paper proposes a multiresource mining method based on association rules to collect the learning behavior data of MOOC users and establish the MOOC teaching resource warehouse. Aiming at the attribute set of information association positioning, the association rules of teaching resources are designed. In addition, the association rules are combined with the shortest path scheduling scheme of teaching resources to establish the location and mining of diversified MOOC teaching-associated resources. Finally, the clustering method is used to process the results of teaching resource mining and complete the clustering of diversified teaching resources. Experimental results show that the index time required by the proposed mining method is 0.1 s, which is only 1/6 of other resource mining methods.


Subject(s)
Computer-Assisted Instruction/methods , Data Mining/methods , Education, Distance/methods , Algorithms , Association Learning , China , Computational Biology , Computer Simulation , Computer-Assisted Instruction/statistics & numerical data , Data Mining/statistics & numerical data , Education, Distance/statistics & numerical data , Humans , Internet , Language , Models, Educational , Software
6.
Am J Emerg Med ; 53: 285.e1-285.e5, 2022 03.
Article in English | MEDLINE | ID: mdl-34602329

ABSTRACT

STUDY OBJECTIVES: COVID-19 brought unique challenges; however, it remains unclear what effect the pandemic had on violence in healthcare. The objective of this study was to identify the impact of the pandemic on workplace violence at an academic emergency department (ED). METHODS: This mixed-methods study involved a prospective descriptive survey study and electronic medical record review. Within our hospital referral region (HRR), the first COVID-19 case was documented on 3/11/2020 and cases peaked in mid-November 2020. We compared the monthly HRR COVID-19 case rate per 100,000 people to the rate of violent incidents per 1000 ED visits. Multidisciplinary ED staff were surveyed both pre/early-pandemic (April 2020) and mid/late-pandemic (December 2020) regarding workplace violence experienced over the prior 6-months. The study was deemed exempt by the Mayo Clinic Institutional Review Board. RESULTS: There was a positive association between the monthly HRR COVID-19 case rate and rate of violent ED incidents (r = 0.24). Violent incidents increased overall during the pandemic (2.53 incidents per 1000 visits) compared to the 3 months prior (1.13 incidents per 1000 visits, p < .001), as well as compared to the previous year (1.24 incidents per 1000 patient visits, p < .001). Survey respondents indicated a higher incidence of assault during the pandemic, compared to before (p = .019). DISCUSSION: Incidents of workplace violence at our ED increased during the pandemic and there was a positive association of these incidents with the COVID-19 case rate. Our findings indicate health systems should prioritize employee safety during future pandemics.


Subject(s)
COVID-19/psychology , Emergency Service, Hospital/statistics & numerical data , Workplace Violence/statistics & numerical data , Academic Medical Centers/organization & administration , Academic Medical Centers/statistics & numerical data , Adult , COVID-19/prevention & control , COVID-19/transmission , Chi-Square Distribution , Crime Victims/rehabilitation , Data Mining/statistics & numerical data , Emergency Service, Hospital/organization & administration , Female , Health Personnel/psychology , Health Personnel/statistics & numerical data , Humans , Male , Middle Aged , Prospective Studies , Surveys and Questionnaires , Workplace Violence/trends
7.
Nucleic Acids Res ; 50(D1): D222-D230, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850920

ABSTRACT

MicroRNAs (miRNAs) are noncoding RNAs with 18-26 nucleotides; they pair with target mRNAs to regulate gene expression and produce significant changes in various physiological and pathological processes. In recent years, the interaction between miRNAs and their target genes has become one of the mainstream directions for drug development. As a large-scale biological database that mainly provides miRNA-target interactions (MTIs) verified by biological experiments, miRTarBase has undergone five revisions and enhancements. The database has accumulated >2 200 449 verified MTIs from 13 389 manually curated articles and CLIP-seq data. An optimized scoring system is adopted to enhance this update's critical recognition of MTI-related articles and corresponding disease information. In addition, single-nucleotide polymorphisms and disease-related variants related to the binding efficiency of miRNA and target were characterized in miRNAs and gene 3' untranslated regions. miRNA expression profiles across extracellular vesicles, blood and different tissues, including exosomal miRNAs and tissue-specific miRNAs, were integrated to explore miRNA functions and biomarkers. For the user interface, we have classified attributes, including RNA expression, specific interaction, protein expression and biological function, for various validation experiments related to the role of miRNA. We also used seed sequence information to evaluate the binding sites of miRNA. In summary, these enhancements render miRTarBase as one of the most research-amicable MTI databases that contain comprehensive and experimentally verified annotations. The newly updated version of miRTarBase is now available at https://miRTarBase.cuhk.edu.cn/.


Subject(s)
3' Untranslated Regions , Databases, Nucleic Acid , Gene Regulatory Networks , MicroRNAs/genetics , Neoplasms/genetics , RNA, Untranslated/genetics , Animals , Binding Sites , Biomarkers/metabolism , Data Mining/statistics & numerical data , Exosomes/chemistry , Exosomes/metabolism , Gene Expression Regulation , Humans , Internet , Mice , MicroRNAs/classification , MicroRNAs/metabolism , Molecular Sequence Annotation , Neoplasms/metabolism , Neoplasms/pathology , Polymorphism, Single Nucleotide , RNA, Untranslated/classification , RNA, Untranslated/metabolism , Tumor Cells, Cultured , User-Computer Interface
8.
Comput Math Methods Med ; 2021: 6323357, 2021.
Article in English | MEDLINE | ID: mdl-34887940

ABSTRACT

The current article aims to assess and compare the seasonal check-in behavior of individuals in Shanghai, China, using location-based social network (LBSN) data and a variety of spatiotemporal analytic techniques. The article demonstrates the uses of location-based social network data by analyzing the trends in check-ins throughout a three-year term for health purposes. We obtained the geolocation data from Sina Weibo, one of the biggest renowned Chinese microblogs (Weibo). The compiled data is converted to geographic information system (GIS) type and assessed using temporal statistical analysis and spatial statistical analysis using kernel density estimation (KDE) assessment. We applied various algorithms and trained several machine learning models, and ultimately settled on the sequential model because its accuracy was the highest among them. The location cataloguing is accomplished via the use of facts about the characteristics of physical places. The findings demonstrate that visitors' spatial operations are more intense than residents' spatial operations, notably in downtown. However, locals also visited outlying regions, and tourists' temporal behaviors vary significantly while citizens' movements exhibit a more stable behavior. These findings may be used in destination management, metro planning, and the creation of digital cities.


Subject(s)
Big Data , Data Mining/statistics & numerical data , Machine Learning/statistics & numerical data , Social Media/statistics & numerical data , Travel/statistics & numerical data , China , Cities , Computational Biology , Decision Trees , Geographic Information Systems , Humans , Seasons , Social Networking , Spatio-Temporal Analysis
9.
Comput Math Methods Med ; 2021: 2059432, 2021.
Article in English | MEDLINE | ID: mdl-34819987

ABSTRACT

Traditional audit data analysis algorithms have many shortcomings, such as the lack of means to mine the hidden audit clues behind the data, the difficulty of finding increasingly hidden cheating techniques caused by the electronic and networked environment, and the inability to solve the quality defects of the audited data. Correlation analysis algorithm in data mining technology is an effective means to obtain knowledge from massive data, which can complete, muffle, clean, and reduce defective data and then can analyze massive data and obtain audit trails under the guidance of expert experience or analysts. Therefore, on the basis of summarizing and analyzing previous research works, this paper expounds the research status and significance of audit data analysis and application; elaborates the development background, current status, and future challenges of correlation analysis algorithm; introduces the methods and principles of data model and its conversion and audit model construction; conducts audit data collection and cleaning; implements audit data preprocessing and its algorithm description; performs audit data analysis based on correlation analysis algorithm; analyzes the hidden node activation value and audit rule extraction in correlation analysis algorithm; proposes the application of audit data based on correlation analysis algorithm; discusses the relationship between audit data quality and audit risk; and finally compares different data mining algorithms in audit data analysis. The findings demonstrate that by analyzing association rules, the correlation analysis algorithm can determine the significance of a huge quantity of audit data and characterise the degree to which linked events would occur concurrently or sequentially in a probabilistic manner. 
The correlation analysis algorithm first passes the collected audit data through a preprocessing module to filter out useless data and then organizes the obtained data into a format that can be recognized by the data mining algorithm and executes the correlation analysis algorithm on the sorted data; finally, the obtained hidden data is divided into normal data and suspicious data by comparing it with the pattern in the rule base. The algorithm can conduct in-depth analysis and research on the company's accounting vouchers, account books, and a large number of financial accounting data and other data of various natures in the company's accounting vouchers; reveal their original characteristics and internal connections; and turn them into the more direct and useful information that auditors need. The study results of this paper provide a reference for further research on audit data analysis and application based on the correlation analysis algorithm.


Subject(s)
Algorithms , Big Data , Data Analysis , Financial Audit/methods , Computational Biology , Correlation of Data , Data Mining/methods , Data Mining/statistics & numerical data , Financial Audit/statistics & numerical data , Humans
10.
Comput Math Methods Med ; 2021: 7690902, 2021.
Article in English | MEDLINE | ID: mdl-34812270

ABSTRACT

The intelligent diagnosis of cervical cancer by using a class of data mining algorithms has important practical significance. In particular, the useful information included in a significant quantity of medical data may not only discreetly boost the development of medical technology but also detect cervical cancer in the future. This paper improves the data mining algorithm and combines image recognition technology and data mining technology to extract and analyze image features. Moreover, this paper makes full use of the information contained in the image to realize the segmentation of the cervical cancer cell image, select the feature vector according to the characteristics of the cervical cancer cell, and use the statistical classification method to design the classifier. The test results show that the automatic recognition effect of this system is good, and it has a good auxiliary diagnosis effect. Therefore, it can be verified in clinical practice in the follow-up.


Subject(s)
Algorithms , Data Mining/statistics & numerical data , Diagnosis, Computer-Assisted/statistics & numerical data , Uterine Cervical Neoplasms/diagnosis , Computational Biology , Female , Humans , Image Interpretation, Computer-Assisted/statistics & numerical data , Logistic Models , Uterine Cervical Neoplasms/diagnostic imaging
11.
Comput Math Methods Med ; 2021: 7937573, 2021.
Article in English | MEDLINE | ID: mdl-34795792

ABSTRACT

Semantic mining is always a challenge for big biomedical text data. Ontology has been widely proved and used to extract semantic information. However, the process of ontology-based semantic similarity calculation is so complex that it cannot measure the similarity for big text data. To solve this problem, we propose a parallelized semantic similarity measurement method based on Hadoop MapReduce for big text data. At first, we preprocess and extract the semantic features from documents. Then, we calculate the document semantic similarity based on ontology network structure under MapReduce framework. Finally, based on the generated semantic document similarity, document clusters are generated via clustering algorithms. To validate the effectiveness, we use two kinds of open datasets. The experimental results show that the traditional methods can hardly work for more than ten thousand biomedical documents. The proposed method keeps efficient and accurate for big dataset and is of high parallelism and scalability.


Subject(s)
Big Data , Cluster Analysis , Data Mining/methods , Semantics , Algorithms , Biological Ontologies/statistics & numerical data , Computational Biology , Data Mining/statistics & numerical data , Documentation/methods , Documentation/statistics & numerical data , Humans , MEDLINE/statistics & numerical data , Machine Learning
12.
Mol Syst Biol ; 17(10): e10387, 2021 10.
Article in English | MEDLINE | ID: mdl-34664389

ABSTRACT

We need to effectively combine the knowledge from surging literature with complex datasets to propose mechanistic models of SARS-CoV-2 infection, improving data interpretation and predicting key targets of intervention. Here, we describe a large-scale community effort to build an open access, interoperable and computable repository of COVID-19 molecular mechanisms. The COVID-19 Disease Map (C19DMap) is a graphical, interactive representation of disease-relevant molecular mechanisms linking many knowledge sources. Notably, it is a computational resource for graph-based analyses and disease modelling. To this end, we established a framework of tools, platforms and guidelines necessary for a multifaceted community of biocurators, domain experts, bioinformaticians and computational biologists. The diagrams of the C19DMap, curated from the literature, are integrated with relevant interaction and text mining databases. We demonstrate the application of network analysis and modelling approaches by concrete examples to highlight new testable hypotheses. This framework helps to find signatures of SARS-CoV-2 predisposition, treatment response or prioritisation of drug candidates. Such an approach may help deal with new waves of COVID-19 or similar pandemics in the long-term perspective.


Subject(s)
COVID-19/immunology , Computational Biology/methods , Databases, Factual , SARS-CoV-2/immunology , Software , Antiviral Agents/therapeutic use , COVID-19/genetics , COVID-19/virology , Computer Graphics , Cytokines/genetics , Cytokines/immunology , Data Mining/statistics & numerical data , Gene Expression Regulation , Host Microbial Interactions/genetics , Host Microbial Interactions/immunology , Humans , Immunity, Cellular/drug effects , Immunity, Humoral/drug effects , Immunity, Innate/drug effects , Lymphocytes/drug effects , Lymphocytes/immunology , Lymphocytes/virology , Metabolic Networks and Pathways/genetics , Metabolic Networks and Pathways/immunology , Myeloid Cells/drug effects , Myeloid Cells/immunology , Myeloid Cells/virology , Protein Interaction Mapping , SARS-CoV-2/drug effects , SARS-CoV-2/genetics , SARS-CoV-2/pathogenicity , Signal Transduction , Transcription Factors/genetics , Transcription Factors/immunology , Viral Proteins/genetics , Viral Proteins/immunology , COVID-19 Drug Treatment
13.
Comput Math Methods Med ; 2021: 3854518, 2021.
Article in English | MEDLINE | ID: mdl-34691237

ABSTRACT

There is currently no effective analytical method in colorectal image analysis, which leads to certain errors in colorectal image analysis. In order to improve the accuracy of colorectal imaging detection, this study used a genetic algorithm as the data mining algorithm and combined it with image processing technology to perform image analysis. At the same time, combined with the actual requirements of image detection, the gray theory model is used as the basic theory of image processing, and the image detection prediction model is constructed to predict the data. In addition, in order to study the effectiveness of the algorithm, the experiment is carried out to analyze the validity of the data of the study, and the predicted value is compared with the actual value. The research shows that the proposed algorithm has certain accuracy and can provide theoretical reference for subsequent related research.


Subject(s)
Algorithms , Colorectal Neoplasms/diagnostic imaging , Data Mining/methods , Image Interpretation, Computer-Assisted/methods , Adenocarcinoma/diagnostic imaging , Adenocarcinoma/secondary , Colorectal Neoplasms/pathology , Computational Biology , Data Mining/statistics & numerical data , Humans , Image Interpretation, Computer-Assisted/statistics & numerical data , Lymphatic Metastasis/diagnostic imaging , Rectal Neoplasms/diagnostic imaging , Rectal Neoplasms/pathology , Tomography, X-Ray Computed/statistics & numerical data
14.
Comput Math Methods Med ; 2021: 6842752, 2021.
Article in English | MEDLINE | ID: mdl-34646337

ABSTRACT

Clustering analysis is one of the most important technologies for single-cell data mining. It is widely used in the division of different gene sequences, the identification of functional genes, and the detection of new cell types. Although the traditional unsupervised clustering method does not require label data, the distribution of the original data, the setting of hyperparameters, and other factors all affect the effectiveness of the clustering algorithm. While in some cases the type of some cells is known, it is hoped to achieve high accuracy if the prior information about those cells is utilized sufficiently. In this study, we propose SCMAG (a semisupervised single-cell clustering method based on a matrix aggregation graph convolutional neural network) that takes into full consideration the prior information for single-cell data. To evaluate the performance of the proposed semisupervised clustering method, we test on different single-cell datasets and compare with the current semisupervised clustering algorithm in recognizing cell types on various real scRNA-seq data; the results show that it is a more accurate and significant model.


Subject(s)
Cluster Analysis , Neural Networks, Computer , Single-Cell Analysis/statistics & numerical data , Supervised Machine Learning , Algorithms , Computational Biology , Data Mining/statistics & numerical data , Databases, Nucleic Acid , Humans , RNA-Seq
15.
Med Ref Serv Q ; 40(3): 329-336, 2021.
Article in English | MEDLINE | ID: mdl-34495798

ABSTRACT

The explosive growth of digital information in recent years has amplified the information overload experienced by today's health-care professionals. In particular, the wide variety of unstructured text makes it difficult for researchers to find meaningful data without spending a considerable amount of time reading. Text mining can be used to facilitate better discoverability and analysis, and aid researchers in identifying critical trends and connections. This column will introduce key text-mining terms, recent use cases of biomedical text mining, and current applications for this technology in medical libraries.


Subject(s)
Biomedical Research/trends , COVID-19 , Data Collection/trends , Data Mining/trends , Research Report/trends , Biomedical Research/statistics & numerical data , Data Collection/statistics & numerical data , Data Mining/statistics & numerical data , Forecasting , Humans
16.
PLoS One ; 16(9): e0256603, 2021.
Article in English | MEDLINE | ID: mdl-34473761

ABSTRACT

From administrative registers of last names in Santiago, Chile, we create a surname affinity network that encodes socioeconomic data. This network is a multi-relational graph with nodes representing surnames and edges representing the prevalence of interactions between surnames by socioeconomic decile. We model the prediction of links as a knowledge base completion problem, and find that sharing neighbors is highly predictive of the formation of new links. Importantly, we distinguish between grounded neighbors and neighbors in the embedding space, and find that the latter is more predictive of tie formation. The paper discusses the implications of this finding in explaining the high levels of elite endogamy in Santiago.


Subject(s)
Data Mining/statistics & numerical data , Machine Learning , Names , Pedigree , Chile , Consanguinity , Female , Humans , Male , Social Class
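
The abstract for this record reports that sharing neighbors predicts new links. As a rough illustration of that general idea only, the sketch below scores each unlinked pair of nodes by its number of common neighbors in the observed graph. It is not the paper's knowledge base completion model and does not cover neighbors in an embedding space; the function name `common_neighbor_scores` and the toy edge list are hypothetical.

```python
from itertools import combinations

def common_neighbor_scores(edges):
    """Score every unlinked pair of nodes by how many neighbours they share."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    scores = {}
    for a, b in combinations(sorted(neighbours), 2):
        if b in neighbours[a]:
            continue  # already linked, nothing to predict
        scores[(a, b)] = len(neighbours[a] & neighbours[b])
    return scores

# Toy undirected graph (hypothetical); node labels stand in for surnames.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
scores = common_neighbor_scores(edges)

# The highest-scoring unlinked pair is the most likely new link under this heuristic.
best_pair = max(scores, key=scores.get)
```

Here A and D are not linked but share both B and C, so they receive the highest score among the unlinked pairs.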
17.
PLoS One ; 16(9): e0256940, 2021.
Article in English | MEDLINE | ID: mdl-34520453

ABSTRACT

Fake news is a complex problem that has led to a variety of approaches for identifying it. In our paper, we focus on identifying fake news using its content. The dataset used, containing fake and real news, was pre-processed using syntactic analysis. Dependency grammar methods were applied to the sentences of the dataset, and based on them the importance of each word within the sentence was determined. This information about the importance of words in sentences was utilized to create the input vectors for classification. The paper aims to find out whether it is possible to use dependency grammar to improve the classification of fake news. We compared these methods with the TfIdf method. The results show that it is possible to use the dependency grammar information with acceptable accuracy for the classification of fake news. An important finding is that dependency grammar can improve existing techniques. We have improved the traditional TfIdf technique in our experiment.


Subject(s)
Data Mining/statistics & numerical data , Deception , Linguistics/statistics & numerical data , Social Media/ethics , Datasets as Topic , Humans
18.
Cytogenet Genome Res ; 161(6-7): 382-394, 2021.
Article in English | MEDLINE | ID: mdl-34433169

ABSTRACT

Embryonal carcinoma (EC) and seminoma (SE) are both derived from germ cell neoplasia in situ but differ markedly in growth patterns and clinical prognosis. Epigenetic regulation may play an important role in the development of EC and SE. This study investigated the DNA methylation-based genetic alterations between EC and SE by analyzing datasets of mRNA expression and DNA methylation profiling. The datasets were downloaded from the Gene Expression Omnibus database. Differentially expressed genes (DEGs) between EC and SE were identified with the limma package in the R environment. Gene function enrichment analysis of the DEGs was performed with the DAVID tool, and the results suggested differences in pluripotency capability and genomic stability between EC and SE. The minfi package and the wANNOVAR tool were used to identify differentially methylated genes. A total of 37 genes showed both mRNA expression changes and concordant DNA methylation changes. The findings were verified with sequencing data from The Cancer Genome Atlas database, and Kaplan-Meier survival analysis was performed. Finally, 5 genes (PRDM1, LMO2, FAM53B, HCN4, and FAM124B) were found to show both low expression and high methylation in EC and to be significantly associated with relapse-free survival. These methylation-based genetic features distinguishing EC from SE might be helpful in studying the role of DNA methylation in cancer development.
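The Kaplan-Meier analysis mentioned in the abstract can be illustrated with a minimal sketch of the product-limit estimator. This is a generic textbook version, not the authors' pipeline. The function name and the toy follow-up times and relapse flags are hypothetical.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival)] at each time with at least one observed event.

    durations: follow-up time per subject.
    events: True if the event (e.g. relapse) was observed, False if censored.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        observed = sum(1 for d, e in zip(durations, events) if d == t and e)
        leaving = sum(1 for d in durations if d == t)
        if observed:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # both events and censored subjects leave the risk set
    return curve


# Hypothetical toy data: months of follow-up and whether relapse was observed.
toy_months = [2, 3, 3, 5, 8]
toy_relapsed = [True, True, False, True, False]
curve = kaplan_meier(toy_months, toy_relapsed)
```

Censored subjects lower the number at risk without lowering the survival estimate, which is the defining feature of the estimator.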


Subject(s)
Biomarkers, Tumor/genetics , DNA Methylation , Data Mining/methods , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Neoplasms, Germ Cell and Embryonal/genetics , Testicular Neoplasms/genetics , Data Mining/statistics & numerical data , Epigenesis, Genetic , Gene Ontology , Humans , Kaplan-Meier Estimate , Male , Signal Transduction/genetics
19.
Comput Math Methods Med ; 2021: 4602465, 2021.
Article in English | MEDLINE | ID: mdl-34335861

ABSTRACT

Dementia interferes with an individual's motor, behavioural, and intellectual functions, leaving them unable to perform instrumental activities of daily living. This study aimed to identify, through data mining, the best-performing algorithm and the most relevant characteristics for categorising individuals with HIV/AIDS at high risk of dementia. Principal component analysis (PCA) was applied and tested comparatively across the following machine learning algorithms: logistic regression, decision tree, neural network, KNN, and random forest. The database used for this study was built from data collected on 270 individuals infected with HIV/AIDS and followed up at the outpatient clinic of a reference hospital for infectious and parasitic diseases in the State of Ceará, Brazil, from January to April 2019. The performance of the algorithms was first analysed using the 104 characteristics available in the database; dimensionality reduction then improved the quality of the machine learning algorithms in the tests, even though about 30% of the variation was lost. When only 23 characteristics were considered, the precision of the algorithms was 86% for random forest, 56% for logistic regression, 68% for decision tree, 60% for KNN, and 59% for neural network. The random forest algorithm proved to be more effective than the others, obtaining 84% precision and 86% accuracy.
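The trade-off between dimensionality and retained variation described in the abstract can be sketched as a choice of the smallest number of principal components whose cumulative variance share reaches a target. The 0.70 default below is an assumption read from the "about 30% of the variation" figure. The abstract does not state how the 23 characteristics were selected. The function name and the toy eigenvalues are hypothetical.

```python
def components_for_variance(eigenvalues, target=0.70):
    """Smallest number of leading components whose variance share reaches target."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return k
    return len(ordered)


# Hypothetical eigenvalues of a covariance matrix (variance per component).
toy_eigenvalues = [5.0, 3.0, 1.5, 0.5]
k = components_for_variance(toy_eigenvalues, target=0.70)
```

In this toy case the first two components carry 80% of the total variance, so two components satisfy a 70% target.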


Subject(s)
AIDS Dementia Complex/diagnosis , Acquired Immunodeficiency Syndrome/complications , Algorithms , Dementia/etiology , AIDS Dementia Complex/epidemiology , AIDS Dementia Complex/etiology , Aged , Brazil/epidemiology , Computational Biology , Data Mining/methods , Data Mining/statistics & numerical data , Databases, Factual , Decision Trees , Female , Follow-Up Studies , Humans , Logistic Models , Machine Learning , Male , Middle Aged , Neural Networks, Computer , Risk Factors
20.
PLoS Comput Biol ; 17(8): e1008844, 2021 08.
Article in English | MEDLINE | ID: mdl-34370723

ABSTRACT

Many biological processes are mediated by protein-protein interactions (PPIs). Because protein domains are the building blocks of proteins, PPIs likely rely on domain-domain interactions (DDIs). Several attempts exist to infer DDIs from PPI networks, but the resulting datasets are heterogeneous and sometimes not accessible, while PPI interactome data keep growing. We describe a new computational approach called "PPIDM" (Protein-Protein Interactions Domain Miner) for inferring DDIs using multiple sources of PPIs. The approach is an extension of our previously described "CODAC" (Computational Discovery of Direct Associations using Common neighbors) method for inferring new edges in a tripartite graph. The PPIDM method has been applied to seven widely used PPI resources, using as "Gold-Standard" a set of DDIs extracted from 3D structural databases. Overall, PPIDM has produced a dataset of 84,552 non-redundant DDIs. Statistical significance (p-value) is calculated for each source of PPI and used to classify the PPIDM DDIs into Gold (9,175 DDIs), Silver (24,934 DDIs) and Bronze (50,443 DDIs) categories. Dataset comparison reveals that PPIDM has inferred from the 2017 releases of PPI sources about 46% of the DDIs present in the 2020 release of the 3did database, not counting the DDIs present in the Gold-Standard. The PPIDM dataset contains 10,229 DDIs that are consistent with more than 13,300 PPIs extracted from the IMEx database, and nearly 23,300 DDIs (27.5%) that are consistent with more than 214,000 human PPIs extracted from the STRING database. Examples of newly inferred DDIs covering more than 10 PPIs in the IMEx database are provided. Further exploitation of the PPIDM DDI reservoir includes the inventory of possible partners of a protein of interest and characterization of protein interactions at the domain level in combination with other methods. The result is publicly available at http://ppidm.loria.fr/.
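To make concrete the idea of a DDI being "consistent with" a set of PPIs, the sketch below counts, for each candidate domain pair, how many PPIs involve one protein carrying each domain. This is plain co-occurrence counting under assumed inputs, not the PPIDM or CODAC scoring itself, whose details are not given in the abstract. The function name, protein identifiers, and domain identifiers are hypothetical.

```python
from collections import Counter
from itertools import product


def count_supporting_ppis(ppis, protein_domains):
    """Count, per unordered domain pair, the PPIs that could be explained by it."""
    support = Counter()
    for p, q in ppis:
        # Collect each candidate domain pair once per PPI.
        pairs = {
            tuple(sorted((d1, d2)))
            for d1, d2 in product(protein_domains.get(p, ()), protein_domains.get(q, ()))
        }
        support.update(pairs)
    return support


# Hypothetical toy inputs: a protein-to-domains map and a list of PPIs.
toy_domains = {"P1": {"dA"}, "P2": {"dB"}, "P3": {"dA", "dC"}, "P4": {"dB"}}
toy_ppis = [("P1", "P2"), ("P3", "P4"), ("P1", "P4")]
support = count_supporting_ppis(toy_ppis, toy_domains)
```

A domain pair supported by many independent PPIs is a more plausible DDI candidate than one seen only once, although a real method would also need to test statistical significance.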


Subject(s)
Protein Interaction Domains and Motifs , Protein Interaction Mapping/statistics & numerical data , Protein Interaction Maps , Algorithms , Computational Biology , Data Mining/statistics & numerical data , Databases, Protein/statistics & numerical data , Humans , Software