Search | BVS Enfermagem Research Portal

GAN-based data augmentation for transcriptomics: survey and comparative assessment.

Lacan, Alice; Sebag, Michèle; Hanczar, Blaise.

Bioinformatics ; 39(39 Suppl 1): i111-i120, 2023 06 30.

Article in English | MEDLINE | ID: mdl-37387181

ABSTRACT

MOTIVATION: Transcriptomics data are becoming more accessible due to high-throughput and less costly sequencing methods. However, data scarcity prevents exploiting deep learning models' full predictive power for phenotype prediction. Artificially enhancing the training sets, namely data augmentation, is suggested as a regularization strategy. Data augmentation corresponds to label-invariant transformations of the training set (e.g. geometric transformations on images and syntax parsing on text data). Such transformations are, unfortunately, unknown in the transcriptomic field. Therefore, deep generative models such as generative adversarial networks (GANs) have been proposed to generate additional samples. In this article, we analyze GAN-based data augmentation strategies with respect to performance indicators and the classification of cancer phenotypes. RESULTS: This work highlights a significant boost in binary and multiclass classification performance due to augmentation strategies. Without augmentation, training a classifier on only 50 RNA-seq samples yields accuracies of 94% and 70% for binary and tissue classification, respectively. In comparison, we achieved 98% and 94% accuracy when adding 1000 augmented samples. Richer architectures and more expensive training of the GAN yield better augmentation performance and generated data quality overall. Further analysis of the generated data shows that several performance indicators are needed to assess its quality correctly. AVAILABILITY AND IMPLEMENTATION: All data used for this research are publicly available and come from The Cancer Genome Atlas. Reproducible code is available on the GitLab repository: https://forge.ibisc.univ-evry.fr/alacan/GANs-for-transcriptomics.

Subjects

Gene Expression Profiling, Transcriptome, RNA-Seq, Data Accuracy, Phenotype
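
The augmentation protocol summarized in the abstract above (a small real training set enlarged with generated samples) might be sketched as follows. This is a minimal illustration, not the authors' code: `augment_training_set` and `make_toy_generator` are hypothetical names, the toy data are random numbers rather than RNA-seq profiles, and the Gaussian sampler merely stands in for a trained conditional GAN generator.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = List[float]


def augment_training_set(
    real_x: Sequence[Sample],
    real_y: Sequence[int],
    generator: Callable[[int], Sample],
    n_augmented: int,
    rng: random.Random,
) -> Tuple[List[Sample], List[int]]:
    """Return the real samples followed by n_augmented generated ones.

    generator(label) is assumed to return one synthetic profile for the
    requested class label. Labels for synthetic samples are drawn from the
    real labels, so the real class balance is preserved on average.
    """
    if not real_x or len(real_x) != len(real_y):
        raise ValueError("real_x and real_y must be non-empty and aligned")
    aug_x = [list(x) for x in real_x]
    aug_y = list(real_y)
    for _ in range(n_augmented):
        label = rng.choice(real_y)
        aug_x.append(generator(label))
        aug_y.append(label)
    return aug_x, aug_y


def make_toy_generator(n_genes: int, rng: random.Random) -> Callable[[int], Sample]:
    """Stand-in for a trained generator: Gaussian noise centred on the label."""
    def generate(label: int) -> Sample:
        return [rng.gauss(float(label), 1.0) for _ in range(n_genes)]
    return generate


rng = random.Random(0)
n_genes = 20                      # assumed toy dimensionality
gen = make_toy_generator(n_genes, rng)
real_y = [i % 2 for i in range(50)]          # 50 "real" samples, binary labels
real_x = [gen(y) for y in real_y]            # toy data, not real expression values
train_x, train_y = augment_training_set(real_x, real_y, gen, 1000, rng)
print(len(train_x), len(train_y))
```

A classifier would then be trained on `train_x` and `train_y` and evaluated on held-out real samples only, so that the reported accuracy reflects real data.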

Cartolabe: A Web-Based Scalable Visualization of Large Document Collections.

Caillou, Philippe; Renault, Jonas; Fekete, Jean-Daniel; Letournel, Anne-Catherine; Sebag, Michele.

IEEE Comput Graph Appl ; 41(2): 76-88, 2021.

Article in English | MEDLINE | ID: mdl-33095705

ABSTRACT

We describe Cartolabe, a web-based multiscale system for visualizing and exploring large textual corpora based on topics, introducing a novel mechanism for the progressive visualization of filtering queries. Initially designed to represent and navigate through scientific publications in different disciplines, Cartolabe has evolved to become a generic framework and accommodate various corpora, ranging from Wikipedia (4.5M entries) to the French National Debate (4.3M entries). Cartolabe consists of two modules: the first relies on natural language processing methods, converting a corpus and its entities (documents, authors, and concepts) into high-dimensional vectors, computing their projection on the two-dimensional plane, and extracting meaningful labels for regions of the plane. The second module is a web-based visualization, displaying tiles computed from the multidimensional projection of the corpus using the UMAP projection method. This visualization module aims at enabling users with no expertise in visualization and data analysis to get an overview of their corpus, and to interact with it: exploring, querying, filtering, panning, and zooming on regions of semantic interest. Three use cases are discussed to illustrate Cartolabe's versatility and ability to bring large-scale textual corpus visualization and exploration to a wide audience.
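
To make the idea of a tile-based, zoomable display more concrete, the sketch below bins already-projected 2D points into a grid with 2^zoom tiles per axis. The abstract does not describe Cartolabe's actual tiling scheme, so the grid layout, the bounding box, and the names `tile_index` and `bin_points` are assumptions made purely for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Bounds = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def tile_index(point: Point, zoom: int, bounds: Bounds) -> Tuple[int, int]:
    """Return the (column, row) of the tile containing point at a zoom level.

    Assumed scheme: the bounding box is split into 2**zoom tiles per axis.
    """
    x_min, y_min, x_max, y_max = bounds
    if zoom < 0 or x_max <= x_min or y_max <= y_min:
        raise ValueError("invalid zoom level or bounds")
    n = 2 ** zoom
    x, y = point
    col = int((x - x_min) / (x_max - x_min) * n)
    row = int((y - y_min) / (y_max - y_min) * n)
    # Clamp so that points on or outside the box edges land in a border tile.
    return min(max(col, 0), n - 1), min(max(row, 0), n - 1)


def bin_points(
    points: Sequence[Point], zoom: int, bounds: Bounds
) -> Dict[Tuple[int, int], List[int]]:
    """Group point indices by the tile they fall into."""
    tiles: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for i, p in enumerate(points):
        tiles[tile_index(p, zoom, bounds)].append(i)
    return dict(tiles)


# Toy coordinates standing in for a 2D projection of document vectors.
points = [(0.1, 0.1), (0.9, 0.2), (0.45, 0.55), (1.0, 1.0)]
bounds = (0.0, 0.0, 1.0, 1.0)
coarse = bin_points(points, 1, bounds)  # 2 x 2 grid
fine = bin_points(points, 2, bounds)    # 4 x 4 grid
print(coarse)
print(fine)
```

With a scheme like this, a viewer only needs to fetch the tiles that intersect the current viewport at the current zoom level, which is one common way to keep panning and zooming responsive on large corpora.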

Interdisciplinary Research in Artificial Intelligence: Challenges and Opportunities.

Kusters, Remy; Misevic, Dusan; Berry, Hugues; Cully, Antoine; Le Cunff, Yann; Dandoy, Loic; Díaz-Rodríguez, Natalia; Ficher, Marion; Grizou, Jonathan; Othmani, Alice; Palpanas, Themis; Komorowski, Matthieu; Loiseau, Patrick; Moulin Frier, Clément; Nanini, Santino; Quercia, Daniele; Sebag, Michele; Soulié Fogelman, Françoise; Taleb, Sofiane; Tupikina, Liubov; Sahu, Vaibhav; Vie, Jill-Jênn; Wehbi, Fatima.

Front Big Data ; 3: 577974, 2020.

Article in English | MEDLINE | ID: mdl-33693418

ABSTRACT

The use of artificial intelligence (AI) in a variety of research fields is speeding up multiple digital revolutions, from shifting paradigms in healthcare, precision medicine and wearable sensing, to public services and education offered to the masses around the world, to future cities made optimally efficient by autonomous driving. When a revolution happens, the consequences are not obvious straight away, and to date, there is no uniformly adapted framework to guide AI research to ensure a sustainable societal transition. To answer this need, here we analyze three key challenges to interdisciplinary AI research, and deliver three broad conclusions: 1) future development of AI should not only impact other scientific domains but should also take inspiration and benefit from other fields of science, 2) AI research must be accompanied by decision explainability, dataset bias transparency as well as development of evaluation methodologies and creation of regulatory agencies to ensure responsibility, and 3) AI education should receive more attention, efforts and innovation from the educational and scientific communities. Our analysis is of interest not only to AI practitioners but also to other researchers and the general public as it offers ways to guide the emerging collaborations and interactions toward the most fruitful outcomes.

Scaling analysis of affinity propagation.

Furtlehner, Cyril; Sebag, Michèle; Zhang, Xiangliang.

Phys Rev E Stat Nonlin Soft Matter Phys ; 81(6 Pt 2): 066102, 2010 Jun.

Article in English | MEDLINE | ID: mdl-20866473

ABSTRACT

We analyze and exploit some scaling properties of the affinity propagation (AP) clustering algorithm proposed by Frey and Dueck [Science 315, 972 (2007)]. Following a divide-and-conquer strategy, we set up an exact renormalization-based approach to address the question of clustering consistency, in particular, how many clusters are present in a given data set. We first observe that the divide-and-conquer strategy, used on a large data set, hierarchically reduces the complexity from $O(N^2)$ to $O(N^{(h+2)/(h+1)})$, for a data set of size $N$ and a depth $h$ of the hierarchical strategy. For a data set embedded in a $d$-dimensional space, we show that this is obtained without notably damaging the precision except in dimension $d=2$. In fact, for $d$ larger than 2 the relative loss in precision scales as $N^{(2-d)/(h+1)d}$. Finally, under some conditions we observe that there is a value $s^*$ of the penalty coefficient, a free parameter used to fix the number of clusters, which separates a fragmentation phase (for $s < s^*$) from a coalescent one (for $s > s^*$) of the underlying hidden cluster structure. At this precise point a self-similarity property holds, which can be exploited by the hierarchical strategy to actually locate its position, as a result of an exact decimation procedure. From this observation, a strategy based on AP can be defined to find out how many clusters are present in a given data set.
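
As a quick numerical reading of the complexity exponent (h+2)/(h+1) quoted in the abstract, the sketch below tabulates it for a few depths and estimates the resulting reduction relative to the quadratic cost. The function names are hypothetical, constant factors are ignored, and treating h = 0 as the flat (non-hierarchical) case is an interpretation of the formula rather than something the abstract states.

```python
from fractions import Fraction


def hierarchical_exponent(depth: int) -> Fraction:
    """Exponent e such that the cost scales as N**e at hierarchy depth h."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return Fraction(depth + 2, depth + 1)


def speedup_over_flat(n: int, depth: int) -> float:
    """Ratio N**2 / N**e = N**(2 - e), ignoring constant factors."""
    return float(n) ** float(2 - hierarchical_exponent(depth))


for depth in range(4):
    print(depth, hierarchical_exponent(depth))
# prints:
# 0 2
# 1 3/2
# 2 4/3
# 3 5/4
```

The exponent decreases toward 1 as the depth grows, so deeper hierarchies approach linear cost in N; under this reading, a single level (h = 1) on a data set of a million points would already reduce the cost by a factor of about a thousand.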
