1.
Brief Bioinform ; 22(2): 1592-1603, 2021 03 22.
Article in English | MEDLINE | ID: mdl-33569575

ABSTRACT

Biomedical scientific literature is growing at a very rapid pace, which makes it increasingly difficult for human experts to spot the most relevant results hidden in the papers. Automated information extraction tools based on text mining techniques are therefore needed to assist them in this task. In the last few years, techniques based on deep neural networks have contributed significantly to advancing the state of the art in this research area. Although the contribution of supervised methods to this progress is relatively well known, this is less so for other kinds of learning, namely unsupervised and self-supervised learning. Unsupervised learning does not incur the cost of creating labels, which makes it very useful in the exploratory stages of a biomedical study, where agile techniques are needed to rapidly explore many paths. In particular, clustering techniques applied to biomedical text mining make it possible to gather large sets of documents into more manageable groups. Deep learning techniques have made it possible to produce new clustering-friendly representations of the data. Self-supervised learning, on the other hand, is a kind of supervised learning in which the labels do not have to be created manually by humans but are derived automatically from relations found in the input texts. In combination with innovative network architectures (e.g. transformer-based architectures), self-supervised techniques have made it possible to design increasingly effective vector-based word representations (word embeddings). We show in this survey how word representations obtained in this way have proven to interact successfully with common supervised modules (e.g. classification networks), contributing greatly to their performance.
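
A minimal sketch of the unsupervised route described above: documents become vectors, and k-means gathers them into groups. To stay self-contained it assumes whitespace tokenisation and a smoothed TF-IDF weighting, which stand in for the deep, clustering-friendly representations the survey discusses; the toy abstracts and the helper names (`tfidf_matrix`, `kmeans`, `cluster_documents`) are hypothetical.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(docs):
    """Return an L2-normalised TF-IDF matrix (documents x vocabulary) and the vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenisation
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: j for j, tok in enumerate(vocab)}
    n_docs = len(docs)
    # document frequency: number of documents containing each token
    df = Counter(tok for toks in tokenized for tok in set(toks))
    X = np.zeros((n_docs, len(vocab)))
    for i, toks in enumerate(tokenized):
        for tok, tf in Counter(toks).items():
            # smoothed IDF: an illustrative assumption, not the survey's weighting
            X[i, index[tok]] = tf * (math.log((1 + n_docs) / (1 + df[tok])) + 1)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms), vocab


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        # squared distance of every point to its nearest already-chosen center
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def cluster_documents(docs, k):
    X, _ = tfidf_matrix(docs)
    return kmeans(X, k)


# Hypothetical toy "abstracts" on two topics
docs = [
    "asthma airway inflammation neutrophil",
    "asthma airway eosinophil inflammation",
    "asthma inflammation airway allergy",
    "gene expression genome variant",
    "gene variant genome association",
    "genome gene expression association",
]
labels = cluster_documents(docs, k=2)
print(labels)  # prints [0 0 0 1 1 1]
```

Swapping `tfidf_matrix` for any model that returns one vector per document leaves the clustering step unchanged.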


Subject(s)
Data Mining/methods , Deep Learning , Supervised Machine Learning , Unsupervised Machine Learning , Algorithms , Cluster Analysis , Computer Neural Networks
2.
Occup Environ Med ; 79(3): 155-161, 2022 03.
Article in English | MEDLINE | ID: mdl-34413158

ABSTRACT

AIM: The biological mechanisms of work-related asthma induced by irritants remain unclear. We investigated the associations between occupational exposure to irritants and respiratory endotypes previously identified among never asthmatics (NA) and current asthmatics (CA) by integrating clinical characteristics and biomarkers related to oxidative stress and inflammation. METHODS: We used cross-sectional data from 999 adults (mean age 45 years, 46% men) from the case-control and familial Epidemiological study on the Genetics and Environments of Asthma (EGEA). Five respiratory endotypes had been identified using a cluster-based approach: NA1 (n=463), asymptomatic; NA2 (n=169), with respiratory symptoms; CA1 (n=50), with active treated adult-onset asthma, poor lung function, high blood neutrophil counts and a high fluorescent oxidation products level; CA2 (n=203), with mild middle-age asthma, rhinitis and a low immunoglobulin E level; and CA3 (n=114), with inactive/mild untreated allergic childhood-onset asthma. Occupational exposure to irritants during the current or most recent job was assessed with the updated occupational asthma-specific job-exposure matrix (exposure levels: no/medium/high). Associations between irritants and each respiratory endotype (with NA1, asymptomatic, as the reference) were studied using logistic regression adjusted for age, sex and smoking status. RESULTS: The prevalence of high occupational exposure to irritants was 7% in NA1, 6% in NA2, 16% in CA1, 7% in CA2 and 10% in CA3. High exposure to irritants was associated with CA1 (adjusted OR (aOR) 2.7, 95% CI 1.0 to 7.3). Exposure to irritants was not significantly associated with the other endotypes (aOR range: 0.8 to 1.5). CONCLUSION: Occupational exposure to irritants was associated with a distinct respiratory endotype, suggesting oxidative stress and neutrophilic inflammation as potentially associated biological mechanisms.
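
The adjusted odds ratios reported above come from logistic regression with covariates, which this sketch does not reproduce. It only illustrates the log-scale construction behind an odds ratio and its 95% CI, using a crude (unadjusted) 2×2 table with hypothetical counts, not EGEA data; `odds_ratio_ci` is an illustrative helper.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval.

    Hypothetical 2x2 layout:
        a = exposed, endotype of interest     b = unexposed, endotype of interest
        c = exposed, reference endotype       d = unexposed, reference endotype
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive (consider a continuity correction)")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts, NOT the study data
or_, lower, upper = odds_ratio_ci(10, 20, 30, 120)
print(f"OR = {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")  # prints OR = 2.00 (95% CI 0.85 to 4.72)
```

For a logistic regression coefficient β with standard error SE, the same construction gives aOR = exp(β) with CI exp(β ± 1.96·SE). This is why the reported interval is symmetric on the log scale: sqrt(1.0 × 7.3) ≈ 2.7.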


Subject(s)
Occupational Asthma , Occupational Diseases , Occupational Exposure , Adult , Occupational Asthma/chemically induced , Occupational Asthma/epidemiology , Child , Cross-Sectional Studies , Female , Humans , Inflammation , Irritants/adverse effects , Male , Middle Aged , Occupational Diseases/epidemiology , Occupational Exposure/adverse effects
4.
J Biomed Inform ; 60: 252-9, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26911523

ABSTRACT

Text mining can assist in the analysis and interpretation of large-scale biomedical data, helping biologists to quickly and cheaply gain confirmation of hypothesized relationships between biological entities. We set this question in the context of genome-wide association studies (GWAS), an actively emerging field that has helped identify many genes associated with multifactorial diseases. These studies make it possible to identify groups of genes associated with the same phenotype, but provide no information about the relationships between these genes. Our objective is therefore to leverage unsupervised text mining techniques, using text-based cosine similarity comparisons and clustering applied to candidate and random gene vectors, in order to augment GWAS results. We propose a generic framework, which we used to characterize the relationships between 10 genes reported to be associated with asthma by a previous GWAS. The results of this experiment showed that the similarities between these 10 genes were significantly stronger than would be expected by chance (one-sided p-value < 0.01). Clustering the observed and randomly selected genes also made it possible to generate hypotheses about potential functional relationships between these genes, and thus contributed to the discovery of new candidate genes for asthma.
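
The one-sided comparison against chance described above can be sketched as a permutation test: the mean pairwise cosine similarity of the candidate gene vectors is compared with that of equally sized random sets. This is a sketch under assumptions: the vectors are synthetic stand-ins for text-derived gene profiles, the random sets come from a hypothetical background pool, and the add-one p-value correction is an illustrative choice, not necessarily the paper's.

```python
import numpy as np


def mean_pairwise_cosine(V):
    """Mean cosine similarity over all distinct pairs of rows of V."""
    U = V / np.linalg.norm(V, axis=1, keepdims=True)
    S = U @ U.T
    iu = np.triu_indices(len(U), k=1)  # upper triangle, diagonal excluded
    return S[iu].mean()


def permutation_p_value(candidates, background, n_perm=999, seed=0):
    """One-sided empirical p-value: are the candidate vectors more similar to each
    other than equally sized random sets drawn from the background pool?"""
    rng = np.random.default_rng(seed)
    observed = mean_pairwise_cosine(candidates)
    k = len(candidates)
    null = np.array([
        mean_pairwise_cosine(background[rng.choice(len(background), size=k, replace=False)])
        for _ in range(n_perm)
    ])
    # add-one correction keeps the estimate away from exactly zero
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p_value


# Synthetic stand-ins for text-derived gene vectors (hypothetical data)
rng = np.random.default_rng(42)
shared_topic = rng.normal(size=50)
candidates = shared_topic + 0.3 * rng.normal(size=(10, 50))
background = rng.normal(size=(500, 50))
observed, p_value = permutation_p_value(candidates, background)
print(f"mean cosine = {observed:.2f}, one-sided p = {p_value:.3f}")
```

With n_perm random sets, the smallest attainable p-value is 1/(n_perm + 1), so the number of permutations bounds how strong a result can be reported.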


Subject(s)
Computational Biology/methods , Data Mining/methods , Genome-Wide Association Study , Algorithms , Asthma/genetics , Cluster Analysis , Genetic Predisposition to Disease , Human Genome , Genomics , Humans , Phenotype , Single Nucleotide Polymorphism
5.
Article in English | MEDLINE | ID: mdl-35749324

ABSTRACT

Representation learning is a central problem in the analysis of attributed network (AN) data in a variety of fields. Given an attributed graph, the objectives are to obtain a representation of the nodes and a partition of the set of nodes. Usually, these two objectives are pursued separately via two tasks performed sequentially, and any benefit that might be obtained by performing them simultaneously is lost. In this brief, we propose power-attributed graph embedding and clustering (PAGEC for short), in which the two tasks, embedding and clustering, are considered together. To jointly encode data affinity between node links and attributes, we use a new powered proximity matrix. We formulate a new matrix decomposition model to obtain node representation and node clustering simultaneously. Theoretical analysis shows close connections between the new proximity matrix and random walk theory on graphs. Experimental results on attributed network datasets with different characteristics demonstrate that the PAGEC algorithm performs better, in terms of clustering and embedding, than state-of-the-art algorithms, including deep learning methods designed for similar tasks.
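
The abstract does not spell out PAGEC's proximity matrix or its decomposition model, so the sketch below is only a hypothetical illustration of the random-walk reading of a "powered" proximity: links and attribute similarities are mixed into one row-stochastic matrix P, and entry (i, j) of P^t is the probability that a t-step walk from node i ends at node j. The mixing weight `alpha`, the power, the toy graph and all helper names are assumptions. The sketch then embeds and clusters sequentially, which is the decoupled pipeline PAGEC is designed to replace, not the joint model itself.

```python
import numpy as np


def powered_proximity(A, X, alpha=0.5, power=3):
    """Hypothetical powered proximity mixing links and attributes.

    A: (n, n) symmetric 0/1 adjacency matrix.
    X: (n, m) nonnegative node-attribute matrix.
    """
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    S = Xn @ Xn.T                                     # attribute cosine similarity
    W = alpha * A + (1 - alpha) * S + np.eye(len(A))  # self-loops keep every row non-empty
    P = W / W.sum(axis=1, keepdims=True)              # row-stochastic random-walk matrix
    return np.linalg.matrix_power(P, power)           # entry (i, j): power-step walk probability


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def embed_then_cluster(M, d, k):
    """Sequential baseline: truncated-SVD embedding of M, then k-means on it."""
    U, s, _ = np.linalg.svd(M)
    E = U[:, :d] * s[:d]  # d-dimensional node representation
    return E, kmeans(E, k)


# Toy attributed graph: two 5-node cliques joined by one edge (hypothetical data)
A = np.zeros((10, 10))
A[:5, :5] = 1.0
A[5:, 5:] = 1.0
np.fill_diagonal(A, 0.0)
A[4, 5] = A[5, 4] = 1.0
X = np.array([[1.0, 0.2]] * 5 + [[0.2, 1.0]] * 5)

M = powered_proximity(A, X)
embedding, labels = embed_then_cluster(M, d=2, k=2)
print(labels)
```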

6.
BMJ Open Respir Res ; 7(1)2020 12.
Article in English | MEDLINE | ID: mdl-33268339

ABSTRACT

BACKGROUND: Identifying relevant asthma endotypes may be the first step towards improving asthma management. We aimed to identify respiratory endotypes in adults using cluster analysis and to compare their clinical characteristics at follow-up. METHODS: The analysis was performed separately among current asthmatics (CA, n=402) and never asthmatics (NA, n=666) from the first follow-up of the French EGEA study (EGEA2). The cluster analysis jointly considered 4 demographic, 22 clinical/functional (respiratory symptoms, asthma treatments, lung function) and 4 blood biological (allergy-related, inflammation-related and oxidative stress-related biomarkers) characteristics at EGEA2. Clinical characteristics at follow-up (EGEA3) were compared according to the endotype identified at EGEA2. RESULTS: We identified five respiratory endotypes, three among CA and two among NA: CA1 (n=53), with active treated adult-onset asthma, poor lung function, chronic cough and phlegm, dyspnoea, high body mass index, and high blood neutrophil count and fluorescent oxidation products level; CA2 (n=219), with mild asthma and rhinitis; CA3 (n=130), with inactive/mild untreated allergic childhood-onset asthma, a high frequency of current smokers, a low frequency of attacks of breathlessness at rest, and a high IgE level; NA1 (n=489), asymptomatic; and NA2 (n=177), with respiratory symptoms and high blood neutrophil and eosinophil counts. CA1 had poor asthma control and a high leptin level, CA2 had hyper-responsiveness and high interleukin (IL)-1Ra, IL-5, IL-7, IL-8, IL-10, IL-13 and TNF-α levels, and NA2 had high leptin and C reactive protein levels. Ten years later, asthmatics in CA1 had worse clinical characteristics, whereas those in CA3 had better respiratory outcomes than those in CA2; participants in NA2 had more respiratory symptoms and a higher rate of incident asthma than those in NA1.
CONCLUSION: These results highlight the value of jointly considering clinical and biological characteristics in cluster analyses to identify endotypes among adults with or without asthma.


Subject(s)
Asthma , Rhinitis , Adult , Asthma/diagnosis , Asthma/epidemiology , Case-Control Studies , Child , Cluster Analysis , Humans , Leukocyte Count
7.
IEEE Trans Neural Netw Learn Syst ; 29(12): 6396-6401, 2018 12.
Article in English | MEDLINE | ID: mdl-29993844

ABSTRACT

Spectral clustering is often carried out by combining spectral data embedding and k-means clustering. However, its two aims, dimensionality reduction and clustering, are usually not pursued jointly. In this brief, we propose a novel approach to finding an optimal spectral embedding for identifying a partition of the set of objects; it iteratively alternates spectral embedding and clustering. In doing so, we show that our model can learn a low-dimensional representation better suited to clustering. Compared with classical spectral clustering methods, the proposed algorithm is not computationally costly and outperforms not only these methods but also other nonnegative matrix factorization variants.
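
For reference, the classical two-step pipeline that the brief improves on can be sketched as follows: a Gaussian affinity matrix W, the normalised matrix D^{-1/2} W D^{-1/2}, its k leading eigenvectors as the embedding (rows normalised, as in the Ng-Jordan-Weiss variant), then k-means. The brief's alternating embedding/clustering scheme is not reproduced here; the bandwidth `sigma`, the toy data and the helper names are assumptions.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(X, k, sigma=1.0):
    """Classical spectral clustering: leading eigenvectors of the normalised affinity, then k-means."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))         # Gaussian affinity
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    M = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(M)                        # eigenvalues in ascending order
    U = vecs[:, -k:]                                   # k leading eigenvectors: the spectral embedding
    U = U / np.linalg.norm(U, axis=1, keepdims=True)   # row normalisation (Ng-Jordan-Weiss)
    return kmeans(U, k)


# Two well-separated Gaussian blobs (hypothetical data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
               rng.normal(6.0, 0.5, size=(20, 2))])
labels = spectral_clustering(X, k=2)
print(labels)
```

Here the embedding is computed once and then frozen; an alternating scheme like the one the brief proposes would instead feed the current partition back into the choice of embedding.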

8.
IEEE Trans Neural Netw Learn Syst ; 26(9): 2194-9, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25376044

ABSTRACT

We propose a new theoretical framework for data visualization. This framework is based on an iterative procedure that seeks an appropriate approximation of the data matrix A using two stochastic similarity matrices, one on the set of rows and one on the set of columns. This process converges to a steady state in which the approximated data matrix is composed of g similar rows and l similar columns. Reordering A according to the first left and right singular vectors yields an optimal data reorganization revealing homogeneous block clusters. Furthermore, we show that our approach is related to a Markov chain model, to double k-means with g × l block clusters, and to spectral coclustering. Numerical experiments on simulated and real data sets demonstrate the value of our approach.
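
One plausible reading of the iterative procedure, sketched under assumptions: at each step, row and column cosine similarities of the current approximation Â are turned into row-stochastic matrices R and C, Â is updated as R Â Cᵀ, and A is finally reordered by the first left and right singular vectors of the smoothed matrix. The exact update, the use of cosine similarity, the number of iterations and the helper names are hypothetical choices, not necessarily the authors'. The sketch assumes a nonnegative matrix with no all-zero rows or columns.

```python
import numpy as np


def stochastic_similarity(M):
    """Row-stochastic cosine-similarity matrix between the rows of a nonnegative matrix M."""
    U = M / np.maximum(np.linalg.norm(M, axis=1, keepdims=True), 1e-12)
    S = U @ U.T
    return S / S.sum(axis=1, keepdims=True)


def iterate_and_reorder(A, n_iter=5):
    """Smooth A with row and column similarity matrices, then reorder A
    by the first left and right singular vectors of the smoothed matrix."""
    A_hat = A.astype(float)
    for _ in range(n_iter):
        R = stochastic_similarity(A_hat)    # (n_rows, n_rows), recomputed each step (assumption)
        C = stochastic_similarity(A_hat.T)  # (n_cols, n_cols)
        A_hat = R @ A_hat @ C.T             # average similar rows, then similar columns
    U, _, Vt = np.linalg.svd(A_hat)
    row_order = np.argsort(U[:, 0])
    col_order = np.argsort(Vt[0])
    return A[np.ix_(row_order, col_order)], row_order, col_order


# Toy 2-block matrix with shuffled rows and columns (hypothetical data)
rng = np.random.default_rng(1)
A = np.zeros((6, 5))
A[:4, :3] = rng.uniform(1.0, 2.0, size=(4, 3))
A[4:, 3:] = rng.uniform(1.0, 2.0, size=(2, 2))
row_labels = np.array([0, 0, 0, 0, 1, 1])
col_labels = np.array([0, 0, 0, 1, 1])
row_perm = rng.permutation(6)
col_perm = rng.permutation(5)
A_shuffled = A[np.ix_(row_perm, col_perm)]

reordered, row_order, col_order = iterate_and_reorder(A_shuffled)
print(row_labels[row_perm][row_order])  # block label of each row, in display order
print(col_labels[col_perm][col_order])  # block label of each column, in display order
```

On this toy matrix the rows and columns of each block should end up contiguous after reordering, which is the block-revealing effect the abstract describes.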
