
1.

Genome-wide association study analysis of disease severity in Acne reveals novel biological insights.

Du, Zhaohui; Iyyanki, Tejaswi; Lessard, Samuel; Chao, Michael; Asbrand, Christian; Nassar, Dany; Klinger, Katherine; de Rinaldis, Emanuele; Khader, Shameer; Chatelain, Clément.

medRxiv ; 2023 Nov 14.

Article En | MEDLINE | ID: mdl-38014089

Acne vulgaris is a common skin disease that affects >85% of teenagers and young adults, of whom >8% develop severe lesions that leave permanent scars. Heritability studies of acne in twin cohorts have estimated the heritability of acne at 80%. Previous genome-wide association studies (GWAS) have identified 50 genetic loci associated with an increased risk of developing acne compared with healthy individuals. However, only a few studies have investigated genetic associations with disease severity. GWAS of disease progression may provide a more effective approach to uncovering potential disease-modifying therapeutic targets. Here, we performed a multi-ethnic GWAS of disease severity in acne patients, using individuals with normal acne as controls. Our cohort consists of 2,956 participants: 290 severe acne cases and 930 normal acne controls from FinnGen, and 522 cases and 1,214 controls from BioVU. We also performed Mendelian randomization (MR), colocalization analyses and a transcriptome-wide association study (TWAS) to identify putative causal genes. Lastly, we performed gene-set enrichment analysis using MAGMA to implicate biological pathways that drive disease severity in acne. We identified two new loci associated with acne severity at genome-wide significance, six novel associated genes by MR, colocalization and TWAS analyses (CDC7, SLC7A1, ADAM23, TTLL10, CDK20 and DNAJA4), and five novel pathways by MAGMA analyses. Our study suggests that the etiologies of acne susceptibility and severity have limited overlap: only 26% of known acne risk loci show a nominal association with acne severity, and none of the novel severity-associated genes has been reported as associated with acne risk in previous GWAS.
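The abstract names Mendelian randomization without detail. As a generic illustration only, the textbook fixed-effect inverse-variance-weighted (IVW) estimator combines per-variant Wald ratios (outcome effect divided by exposure effect) into one causal estimate. The function name `ivw_estimate` and the summary statistics below are hypothetical, and this is a sketch, not necessarily how the authors ran their MR analysis.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted (IVW) Mendelian randomization estimate.

    Each variant j contributes the Wald ratio beta_outcome[j] / beta_exposure[j],
    weighted by beta_exposure[j]**2 / se_outcome[j]**2 (first-order weights).
    Returns (causal_estimate, standard_error).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("input lists must have the same length")
    num = 0.0
    den = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        w = 1.0 / se ** 2
        num += bx * by * w
        den += bx ** 2 * w
    estimate = num / den
    standard_error = math.sqrt(1.0 / den)
    return estimate, standard_error

# Hypothetical summary statistics for three independent instruments.
bx = [0.10, 0.20, 0.05]
by = [0.05, 0.10, 0.025]   # exactly 0.5 * bx, so the estimate should be 0.5
se = [0.01, 0.02, 0.01]
est, err = ivw_estimate(bx, by, se)
print(f"IVW estimate = {est:.3f} (SE = {err:.3f})")
```

Because every hypothetical instrument has the same ratio of 0.5, the weighted average returns that ratio; with real data the ratios disagree and the weights favour precisely estimated variants.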

2.

DAN: A Segmentation-Free Document Attention Network for Handwritten Document Recognition.

Coquenet, Denis; Chatelain, Clement; Paquet, Thierry.

IEEE Trans Pattern Anal Mach Intell ; 45(7): 8227-8243, 2023 Jul.

Article En | MEDLINE | ID: mdl-37018638

Unconstrained handwritten text recognition is a challenging computer vision task. It is traditionally handled by a two-step approach: line segmentation followed by text line recognition. For the first time, we propose an end-to-end segmentation-free architecture for the task of handwritten document recognition: the Document Attention Network. In addition to text recognition, the model is trained to label text parts using begin and end tags in an XML-like fashion. The model consists of an FCN encoder for feature extraction and a stack of transformer decoder layers for a recurrent token-by-token prediction process. It takes whole text documents as input and sequentially outputs characters as well as logical layout tokens. Unlike existing segmentation-based approaches, the model is trained without any segmentation labels. We achieve competitive results on the READ 2016 dataset at page level and double-page level, with a character error rate (CER) of 3.43% and 3.70%, respectively. We also provide results for the RIMES 2009 dataset at page level, reaching a CER of 4.54%. We provide all source code and pre-trained model weights at https://github.com/FactoDeepLearning/DAN.
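The character error rate quoted above is commonly computed as the character-level Levenshtein distance between predicted and reference transcriptions, divided by the number of reference characters (summed over the dataset). The sketch below assumes that definition; the exact normalization in the DAN evaluation code (for example, how layout tokens or whitespace are handled) may differ, and the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]

def cer(predictions, references):
    """Corpus-level CER: total edit distance over total number of reference characters."""
    total_edits = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars

print(cer(["hallo world"], ["hello world"]))
```

Summing edits and characters before dividing weights long documents more heavily than averaging per-page CERs would, which is one of the normalization choices an evaluation script has to fix.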

3.

End-to-End Handwritten Paragraph Text Recognition Using a Vertical Attention Network.

Coquenet, Denis; Chatelain, Clement; Paquet, Thierry.

IEEE Trans Pattern Anal Mach Intell ; 45(1): 508-524, 2023 Jan.

Article En | MEDLINE | ID: mdl-35077353

Unconstrained handwritten text recognition remains challenging for computer vision systems. Paragraph text recognition is traditionally achieved by two models: one for line segmentation and one for text line recognition. We propose a unified end-to-end model using hybrid attention to tackle this task. The model is designed to process a paragraph image iteratively, line by line, and can be split into three modules. An encoder generates feature maps from the whole paragraph image. Then, an attention module recurrently generates a vertical weighted mask that focuses on the features of the current text line. In this way, it performs a kind of implicit line segmentation. For each text line's features, a decoder module recognizes the associated character sequence, leading to the recognition of a whole paragraph. We achieve state-of-the-art character error rates at paragraph level on three popular datasets: 1.91% for RIMES, 4.45% for IAM and 3.59% for READ 2016. Our code and trained model weights are available at https://github.com/FactoDeepLearning/VerticalAttentionOCR.
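The vertical attention step can be pictured as a soft selection of rows: one score per row of the encoder's feature map is normalized with a softmax over the height axis, and the map is collapsed into a single line representation by a weighted sum over rows. The NumPy sketch below shows only this pooling step under simplifying assumptions (fixed, given scores; no recurrence, no learned parameters). It is not the authors' implementation, and the function name and shapes are illustrative; in the described model the weights are regenerated at each step, once per text line.

```python
import numpy as np

def vertical_attention_pool(features: np.ndarray, row_scores: np.ndarray):
    """Collapse a (C, H, W) feature map into (C, W) line features.

    row_scores has shape (H,); a softmax over the height axis turns it into
    weights that sum to 1, and rows are averaged with those weights.
    """
    channels, height, width = features.shape
    if row_scores.shape != (height,):
        raise ValueError("expected one score per feature-map row")
    shifted = row_scores - row_scores.max()            # numerical stability
    weights = np.exp(shifted) / np.exp(shifted).sum()  # softmax over H
    # Weighted sum over the height axis: (C, H, W) x (H,) -> (C, W)
    line_features = np.einsum("chw,h->cw", features, weights)
    return line_features, weights

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 6, 10))                 # C=4 channels, H=6 rows, W=10 columns
scores = np.array([0.0, 0.0, 8.0, 0.0, 0.0, 0.0])   # attention peaked on row 2
line, w = vertical_attention_pool(feats, scores)
print(line.shape, w.round(3))
```

With a sharply peaked score the pooled features are almost exactly row 2 of the map, which is the "implicit line segmentation" intuition: a soft mask rather than a hard crop.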

4.

A systematic analysis of gene-gene interaction in multiple sclerosis.

Slim, Lotfi; Chatelain, Clément; Foucauld, Hélène de; Azencott, Chloé-Agathe.

BMC Med Genomics ; 15(1): 100, 2022 04 30.

Article En | MEDLINE | ID: mdl-35501860

BACKGROUND: Genome-wide association studies (GWAS) have so far explained only part of the heritability of complex diseases. One of their limitations is that they assume independent contributions of individual variants to the phenotype. Many tools have therefore been developed to investigate interactions between distant loci, or epistasis. Among them, the recently proposed EpiGWAS models the interactions between a target variant and the rest of the genome. However, applying this approach to studying interactions along all genes of a disease map is not straightforward. Here, we propose a pipeline to that effect, which we illustrate by investigating a multiple sclerosis GWAS dataset from the Wellcome Trust Case Control Consortium 2 through 19 disease maps from the MetaCore pathway database. RESULTS: For each disease map, we build an epistatic network by connecting the genes that are deemed to interact. These networks tend to be connected, are complementary to the disease maps and contain hubs. In addition, we report 4 epistatic gene pairs involving missense variants, and 25 gene pairs with a deleterious epistatic effect mediated by eQTLs. Among these, we highlight the interactions of GLI-1 and SUFU, and of IP10 and NF-κB, as both match known biological interactions. The latter pair is particularly promising for therapeutic development, as both genes have known inhibitors. CONCLUSIONS: Our study showcases the ability of EpiGWAS to uncover biologically interpretable epistatic interactions that are potentially actionable for the development of combination therapy.
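Building an epistatic network from detected gene pairs and checking the two properties mentioned (connectedness and the presence of hubs) amounts to simple graph bookkeeping. The sketch below uses placeholder gene names and a hypothetical list of interacting pairs; it is not output from the study, and the helper names are illustrative.

```python
from collections import defaultdict, deque

def build_network(pairs):
    """Undirected adjacency sets from (gene_a, gene_b) interaction pairs."""
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

def is_connected(adjacency):
    """Breadth-first search from an arbitrary node; connected if every node is reached."""
    if not adjacency:
        return True
    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(adjacency)

def hubs(adjacency, top=3):
    """Genes with the most interaction partners, highest degree first (ties by name)."""
    return sorted(adjacency, key=lambda g: (-len(adjacency[g]), g))[:top]

# Hypothetical gene pairs deemed to interact (placeholder names, not study results).
pairs = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G4", "G5"), ("G1", "G5")]
net = build_network(pairs)
print(is_connected(net), hubs(net, top=2))
```

Here G1 interacts with every other placeholder gene, so it surfaces as the hub; on a real disease map the same degree ranking would highlight candidate hub genes.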

Epistasis, Genetic , Multiple Sclerosis , Case-Control Studies , Genome-Wide Association Study , Humans , Multiple Sclerosis/genetics , Phenotype

5.

Author Correction: Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network.

Grapotte, Mathys; Saraswat, Manu; Bessière, Chloé; Menichelli, Christophe; Ramilowski, Jordan A; Severin, Jessica; Hayashizaki, Yoshihide; Itoh, Masayoshi; Tagami, Michihira; Murata, Mitsuyoshi; Kojima-Ishiyama, Miki; Noma, Shohei; Noguchi, Shuhei; Kasukawa, Takeya; Hasegawa, Akira; Suzuki, Harukazu; Nishiyori-Sueki, Hiromi; Frith, Martin C; Chatelain, Clément; Carninci, Piero; de Hoon, Michiel J L; Wasserman, Wyeth W; Bréhélin, Laurent; Lecellier, Charles-Henri.

Nat Commun ; 13(1): 1200, 2022 Mar 01.

Article En | MEDLINE | ID: mdl-35232988

6.

Nonlinear post-selection inference for genome-wide association studies.

Slim, Lotfi; Chatelain, Clément; Azencott, Chloé-Agathe.

Pac Symp Biocomput ; 27: 349-360, 2022.

Article En | MEDLINE | ID: mdl-34890162

To address the lack of statistical power and interpretability of genome-wide association studies (GWAS), gene-level analyses combine the p-values of individual single nucleotide polymorphisms (SNPs) into gene statistics. However, using all SNPs mapped to a gene, including those with low association scores, can mask the association signal of a gene. We therefore propose a new two-step strategy, consisting of first selecting the SNPs most associated with the phenotype within a given gene, then testing their joint effect on the phenotype. The recently proposed kernelPSI framework for kernel-based post-selection inference makes it possible to model non-linear relationships between features, as well as to obtain valid p-values that account for the selection step. In this paper, we show how we adapted kernelPSI to the setting of quantitative GWAS, using kernels to model epistatic interactions between neighboring SNPs, and post-selection inference to determine the joint effect of selected blocks of SNPs on a phenotype. We illustrate this tool on the study of two continuous phenotypes from the UK Biobank. We show that kernelPSI can be successfully used to study GWAS data and detect genes associated with a phenotype through the signal carried by the most strongly associated regions of these genes. In particular, we show that kernelPSI has more statistical power than other gene-based GWAS tools, such as SKAT or MAGMA. kernelPSI is an effective tool to combine SNP-based and gene-based analyses of GWAS data, and can be used successfully to improve both the statistical performance and the interpretability of GWAS.
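The masking effect described above can be illustrated with a classical gene-level combination rule. The sketch uses Fisher's method, a swapped-in textbook example rather than kernelPSI, SKAT or MAGMA, on made-up p-values: one strongly associated SNP combined with many unassociated SNPs yields a much weaker gene-level p-value. The chi-squared tail probability is computed with the closed form that holds for even degrees of freedom, so no statistics library is needed.

```python
import math

def chi2_sf_even_df(x: float, df: int) -> float:
    """Survival function P(X > x) of a chi-squared variable with even df = 2k.

    Uses the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    if df % 2 != 0 or df <= 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(log p) ~ chi-squared with 2m df under the global null."""
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return chi2_sf_even_df(statistic, 2 * len(p_values))

strong_snp_only = [1e-6]
strong_plus_nulls = [1e-6] + [0.5] * 19   # 19 hypothetical unassociated SNPs
print(fisher_combined_p(strong_snp_only))    # equals the single p-value (up to rounding)
print(fisher_combined_p(strong_plus_nulls))  # far weaker gene-level signal
```

The second call lands orders of magnitude above the first, which is the motivation for selecting the most associated SNPs before testing; doing that selection naively, however, invalidates the p-values, which is what post-selection inference corrects.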

Computational Biology , Genome-Wide Association Study , Humans , Phenotype , Polymorphism, Single Nucleotide

7.

Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network.

Grapotte, Mathys; Saraswat, Manu; Bessière, Chloé; Menichelli, Christophe; Ramilowski, Jordan A; Severin, Jessica; Hayashizaki, Yoshihide; Itoh, Masayoshi; Tagami, Michihira; Murata, Mitsuyoshi; Kojima-Ishiyama, Miki; Noma, Shohei; Noguchi, Shuhei; Kasukawa, Takeya; Hasegawa, Akira; Suzuki, Harukazu; Nishiyori-Sueki, Hiromi; Frith, Martin C; Chatelain, Clément; Carninci, Piero; de Hoon, Michiel J L; Wasserman, Wyeth W; Bréhélin, Laurent; Lecellier, Charles-Henri.

Nat Commun ; 12(1): 3297, 2021 06 02.

Article En | MEDLINE | ID: mdl-34078885

Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technique that combines cap trapping and long-read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models reveal the importance of the sequences surrounding STRs not only for distinguishing STR classes, but also for predicting the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation levels, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and add a further layer of complexity to STR polymorphism.
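Microsatellites (STRs) are runs of a short motif repeated in tandem. A minimal way to locate them in a DNA string is a regular expression with a back-reference. The motif length range (1 to 6 bp) and the minimum copy number below are illustrative assumptions, not the thresholds used in the study, and `find_strs` is a hypothetical helper.

```python
import re

def find_strs(sequence: str, max_unit: int = 6, min_copies: int = 3):
    """Return (start, motif, copies) for tandem repeats found left to right.

    The lazy quantifier tries the shortest motif first, so 'ACACAC' is
    reported as motif 'AC' x 3 rather than as a longer motif.
    """
    pattern = re.compile(
        "([ACGT]{1," + str(max_unit) + "}?)\\1{" + str(min_copies - 1) + ",}"
    )
    results = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        copies = len(match.group(0)) // len(motif)
        results.append((match.start(), motif, copies))
    return results

print(find_strs("GGTACACACACTTCAGCAGCAGCAGA"))
```

Dedicated tools handle imperfect repeats and overlapping motifs; this exact-match scan is only meant to make the definition of an STR concrete.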

Microsatellite Repeats , Neural Networks, Computer , Neurodegenerative Diseases/genetics , Transcription Initiation Site , Transcription Initiation, Genetic , A549 Cells , Animals , Base Sequence , Computational Biology/methods , Deep Learning , Enhancer Elements, Genetic , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Mice , Neurodegenerative Diseases/diagnosis , Neurodegenerative Diseases/metabolism , Polymorphism, Genetic , Promoter Regions, Genetic

8.

Novel methods for epistasis detection in genome-wide association studies.

Slim, Lotfi; Chatelain, Clément; Azencott, Chloé-Agathe; Vert, Jean-Philippe.

PLoS One ; 15(11): e0242927, 2020.

Article En | MEDLINE | ID: mdl-33253293

More and more genome-wide association studies are being designed to uncover the full genetic basis of common diseases. Nonetheless, the resulting loci are often insufficient to fully recover the observed heritability. Epistasis, or gene-gene interaction, is one of many hypotheses put forward to explain this missing heritability. In the present work, we propose epiGWAS, a new approach for epistasis detection that identifies interactions between a target SNP and the rest of the genome. This contrasts with the classical strategy of epistasis detection through exhaustive pairwise SNP testing. We draw inspiration from causal inference in randomized clinical trials, which allows us to take into account linkage disequilibrium. EpiGWAS encompasses several methods, which we compare to state-of-the-art techniques for epistasis detection on simulated and real data. The promising results demonstrate empirically the benefits of EpiGWAS to identify pairwise interactions.

Epistasis, Genetic/genetics , Genome-Wide Association Study/statistics & numerical data , Linkage Disequilibrium/genetics , Models, Genetic , Algorithms , Humans , Polymorphism, Single Nucleotide/genetics

9.

Performance of epistasis detection methods in semi-simulated GWAS.

Chatelain, Clément; Durand, Guillermo; Thuillier, Vincent; Augé, Franck.

BMC Bioinformatics ; 19(1): 231, 2018 06 18.

Article En | MEDLINE | ID: mdl-29914375

BACKGROUND: Part of the missing heritability in Genome Wide Association Studies (GWAS) is expected to be explained by interactions between genetic variants, also called epistasis. Various statistical methods have been developed to detect epistasis in case-control GWAS. These methods face major statistical challenges due to the number of tests required, the complexity of the Linkage Disequilibrium (LD) structure, and the lack of consensus regarding the definition of epistasis. Their limited impact in terms of uncovering new biological knowledge might be explained in part by the limited amount of experimental data available to validate their statistical performance in a realistic GWAS context. In this paper, we introduce a simulation pipeline for generating real-scale GWAS data, including epistasis and realistic LD structure. We evaluate five exhaustive bivariate interaction methods: fastepi, GBOOST, SHEsisEpi, DSS, and IndOR. Two hundred thirty-four different disease scenarios are considered in extensive simulations. We report the performance of each method in terms of false positive rate control, power, area under the ROC curve (AUC), and computation time using a GPU. Finally, we compare the results of each method on a real GWAS of type 2 diabetes from the Wellcome Trust Case Control Consortium. RESULTS: GBOOST, SHEsisEpi and DSS allow satisfactory control of the false positive rate. fastepi and IndOR show an increased false positive rate in the presence of LD between causal SNPs, with our definition of epistasis. DSS performs best in terms of power and AUC in most scenarios with no or weak LD between causal SNPs. All methods can exhaustively analyze a GWAS with 6 × 10^5 SNPs and 15,000 samples in a couple of hours using a GPU. CONCLUSION: This study confirms that computation time is no longer a limiting factor for performing an exhaustive search of epistasis in large GWAS. For this task, using DSS on SNP pairs with limited LD seems to be a good strategy to achieve the best statistical performance. A combination approach using both DSS and GBOOST is supported by the simulation results, and the analysis of the WTCCC dataset demonstrated that this approach can detect distinct genes in epistasis. Finally, weak epistasis between common variants will be detectable with existing methods when GWAS of a few tens of thousands of cases and controls are available.
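The multiple-testing burden mentioned above is easy to quantify: an exhaustive bivariate search over m SNPs performs m(m - 1)/2 tests. The sketch below computes this count and the corresponding Bonferroni threshold for a GWAS of 600,000 SNPs (6 × 10^5); the family-wise error level of 0.05 and the helper names are assumptions for illustration.

```python
def pairwise_tests(n_snps: int) -> int:
    """Number of distinct SNP pairs tested in an exhaustive bivariate search."""
    return n_snps * (n_snps - 1) // 2

def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level controlling the family-wise error rate at alpha."""
    return alpha / n_tests

n_pairs = pairwise_tests(600_000)
print(f"{n_pairs:.3e} pairs, Bonferroni threshold {bonferroni_threshold(n_pairs):.1e}")
```

This gives roughly 1.8 × 10^11 pairs and a per-test threshold near 2.8 × 10^-13, which illustrates why both GPU throughput and power at such stringent thresholds matter for exhaustive epistasis searches.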

Computational Biology/methods , Computer Simulation , Diabetes Mellitus, Type 2/genetics , Epistasis, Genetic , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Algorithms , Data Interpretation, Statistical , Humans

10.

Spotting L3 slice in CT scans using deep convolutional network and transfer learning.

Belharbi, Soufiane; Chatelain, Clément; Hérault, Romain; Adam, Sébastien; Thureau, Sébastien; Chastan, Mathieu; Modzelewski, Romain.

Comput Biol Med ; 87: 95-103, 2017 08 01.

Article En | MEDLINE | ID: mdl-28558319

In this article, we present a fully automated system for spotting a particular slice in a complete 3D Computed Tomography exam (CT scan). Our approach does not require any assumptions about which part of the patient's body is covered by the scan. It relies on an original machine learning regression approach. Our models are learned with transfer learning, exploiting deep architectures pre-trained on the ImageNet database, and therefore require very little annotation for training. The whole pipeline consists of three steps: i) conversion of the CT scans into Maximum Intensity Projection (MIP) images, ii) prediction from a Convolutional Neural Network (CNN) applied in a sliding-window fashion over the MIP image, and iii) robust analysis of the prediction sequence to predict the height of the desired slice within the whole CT scan. Our approach is applied to the detection of the third lumbar vertebra (L3) slice, which has been found to be representative of whole-body composition. Our system is evaluated on a database collected in our clinical center, containing 642 CT scans from different patients. We obtained an average localization error of 1.91±2.69 slices (less than 5 mm) in an average time of less than 2.5 s per CT scan, allowing integration of the proposed system into daily clinical routine.
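Step i) of the pipeline, the Maximum Intensity Projection, keeps for each (slice, column) position the brightest voxel along one axis, turning a 3D volume into a 2D image in which slice height is preserved. The NumPy sketch below assumes a volume ordered as (slices, rows, columns) and a frontal projection along the row axis, and shows how fixed-height windows could then be cut along the slice axis for a sliding-window predictor. The axis convention, window height, stride and synthetic volume are illustrative assumptions, not values from the paper.

```python
import numpy as np

def frontal_mip(volume: np.ndarray) -> np.ndarray:
    """Maximum Intensity Projection of a (slices, rows, cols) volume along the row axis.

    The result has shape (slices, cols), so each output row still corresponds
    to one axial slice of the CT exam.
    """
    return volume.max(axis=1)

def sliding_windows(image: np.ndarray, height: int, stride: int):
    """Yield (top_slice_index, window) pairs of fixed height along the slice axis."""
    for top in range(0, image.shape[0] - height + 1, stride):
        yield top, image[top:top + height]

rng = np.random.default_rng(0)
ct = rng.integers(-1000, 1500, size=(120, 64, 64))  # synthetic volume in HU-like units
mip = frontal_mip(ct)
windows = list(sliding_windows(mip, height=32, stride=8))
print(mip.shape, len(windows))
```

Because each MIP row maps back to one slice, a per-window prediction can be attributed to a slice index, which is what makes step iii) (analysing the prediction sequence to recover a slice height) possible.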

Lumbar Vertebrae/diagnostic imaging , Machine Learning , Tomography, X-Ray Computed/methods , Humans , Neural Networks, Computer , Radiology Information Systems

11.

Patterns in melanocytic lesions: impact of the geometry on growth and transport inside the epidermis.

Balois, Thibaut; Chatelain, Clément; Ben Amar, Martine.

J R Soc Interface ; 11(97): 20140339, 2014 Aug 06.

Article En | MEDLINE | ID: mdl-24872499

In glabrous skin, nevi and melanomas exhibit pigmented stripes during clinical dermoscopic examination. These stripes originate in the geometry of the basal layer, which periodically exhibits ridges, alternately large (limiting ridges) and thin (intermediate ridges). However, nevus and melanoma lesions differ in where the pigmented stripes lie: along the furrows or along the ridges of the epidermal surface. Here, we propose a biomechanical model of avascular tumour growth that takes into account this specific geometry of the epidermis, where both kinds of lesions first appear. Simulations show a periodic distribution of tumour cells inside the lesion, with a global contour stretched out along the ridges. To be as close as possible to clinical observations, we also consider melanin transport by the keratinocytes. Our simulations show that reasonable assumptions on the distribution of melanocytic cells in the ridges favour the limiting ridges of the basal layer over the intermediate ones, in agreement with nevus observations but less so with melanomas. This raises the question of the aggregation and distribution of melanocytic cells in acral melanomas and calls for further biological studies of these cells in situ.

Melanocytes/metabolism , Melanocytes/pathology , Melanoma/metabolism , Melanoma/pathology , Models, Biological , Skin Neoplasms/metabolism , Skin Neoplasms/pathology , Animals , Cell Movement , Cell Proliferation , Cell Size , Computer Simulation , Epidermis/metabolism , Epidermis/pathology , Humans , Melanins/metabolism , Neoplasm Invasiveness

12.

Morphological changes in early melanoma development: influence of nutrients, growth inhibitors and cell-adhesion mechanisms.

Chatelain, Clément; Ciarletta, Pasquale; Ben Amar, Martine.

J Theor Biol ; 290: 46-59, 2011 Dec 07.

Article En | MEDLINE | ID: mdl-21903099

Current diagnostic methods for skin cancers are based on morphological characteristics of the pigmented skin lesions, including the geometry of their contour. The aim of this article is to model the early growth of melanoma, accounting for the biomechanical characteristics of the tumor micro-environment, and to evaluate their influence on the tumor morphology and its evolution. The spatial distribution of tumor cells and diffusing molecules is explicitly described in a three-dimensional multiphase model, which incorporates general cell-to-cell mechanical interactions, a dependence of cell proliferation on contact inhibition, and local diffusion of nutrients and inhibiting molecules. A two-dimensional model is derived in a lubrication limit accounting for the thin geometry of the epidermis. First, the dynamical and spatial properties of planar and circular tumor fronts are studied with both numerical and analytical techniques. A WKB method is then developed to analyze the solution of the governing partial differential equations and to derive the threshold conditions for a contour instability of the growing tumor. A control parameter and a critical wavelength are identified, showing that high cell proliferation, high cell adhesion, large tumor radius and slow tumor growth correlate with the occurrence of a contour instability. Finally, comparing the theoretical results with a large amount of clinical data, we show that our predictions accurately describe both the morphology of melanoma observed in vivo and its variations with the tumor growth rate. This study represents a fundamental step towards understanding the more complex microstructural patterns observed during skin tumor growth. Its results have important implications for improving diagnostic methods for melanoma, possibly driving progress towards personalized screening.

Melanoma/pathology , Models, Biological , Skin Neoplasms/pathology , Cell Adhesion/physiology , Cell Proliferation , Disease Progression , Epidermis/pathology , Growth Inhibitors/physiology , Humans , Intercellular Signaling Peptides and Proteins/physiology , Neoplasm Invasiveness , Tumor Microenvironment/physiology

13.

Probability distributions for polymer translocation.

Chatelain, Clément; Kantor, Yacov; Kardar, Mehran.

Phys Rev E Stat Nonlin Soft Matter Phys ; 78(2 Pt 1): 021129, 2008 Aug.

Article En | MEDLINE | ID: mdl-18850808

We study the passage (translocation) of a self-avoiding polymer through a membrane pore in two dimensions. In particular, we numerically measure the probability distribution Q(T) of the translocation time T, and the distribution P(s,t) of the translocation coordinate s at various times t. When scaled with the mean translocation time ⟨T⟩, Q(T) becomes independent of polymer length and decays exponentially for large T. The probability P(s,t) is well described by a Gaussian at short times, with a variance of s that grows subdiffusively as t^α with α ≈ 0.8. For times exceeding ⟨T⟩, P(s,t) of the polymers that have not yet finished their translocation has a nontrivial stable shape.
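A subdiffusive exponent such as the α quoted above is the slope of log Var(s) against log t. A least-squares fit on a log-log scale recovers it from measured variances. The data below are synthetic (generated with α = 0.8 and no noise) and only check that the fit behaves as expected; `loglog_slope` is an illustrative helper, not the fitting procedure used in the paper.

```python
import math

def loglog_slope(times, variances):
    """Ordinary least-squares slope of log(variance) versus log(time)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(v) for v in variances]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

times = [10, 20, 50, 100, 200, 500, 1000]
variances = [2.0 * t ** 0.8 for t in times]   # synthetic Var(s) ~ t^alpha with alpha = 0.8
print(round(loglog_slope(times, variances), 3))
```

The prefactor (2.0 here) only shifts the intercept, so the slope isolates the exponent; with simulation data the fit would be restricted to the short-time regime where the Gaussian description holds.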