Search | Biblioteca Virtual em Saúde

IMPatienT: An Integrated Web Application to Digitize, Process and Explore Multimodal PATIENt daTa.

Meyer, Corentin; Romero, Norma Beatriz; Evangelista, Teresinha; Cadot, Brunot; Laporte, Jocelyn; Jeannin-Girardon, Anne; Collet, Pierre; Ayadi, Ali; Chennen, Kirsley; Poch, Olivier.

J Neuromuscul Dis ; 2024 Apr 29.

Article in English | MEDLINE | ID: mdl-38701156

ABSTRACT

Medical acts, such as imaging, lead to the production of various medical text reports that describe the relevant findings. This induces multimodality in patient data by combining image data with free text; consequently, multimodal data have become central to driving research and improving diagnoses. However, the exploitation of patient data is problematic because the ecosystem of analysis tools is fragmented according to the type of data (images, text, genetics), the task (processing, exploration) and the domain of interest (clinical phenotype, histology). To address these challenges, we developed IMPatienT (Integrated digital Multimodal PATIENt daTa), a simple, flexible and open-source web application to digitize, process and explore multimodal patient data. IMPatienT has a modular architecture that allows users to: (i) create a standard vocabulary for a domain, (ii) digitize and process free-text data, (iii) annotate images and perform image segmentation, and (iv) generate a visualization dashboard and provide diagnosis decision support. To demonstrate the advantages of IMPatienT, we present a use case on a corpus of 40 simulated muscle biopsy reports of congenital myopathy patients. As IMPatienT provides users with the ability to design their own vocabulary, it can be adapted to any research domain and can be used as a patient registry for exploratory data analysis. A demo instance of the application is available at https://impatient.lbgi.fr/.

Spliceator: multi-species splice site prediction using convolutional neural networks.

Scalzitti, Nicolas; Kress, Arnaud; Orhand, Romain; Weber, Thomas; Moulinier, Luc; Jeannin-Girardon, Anne; Collet, Pierre; Poch, Olivier; Thompson, Julie D.

BMC Bioinformatics ; 22(1): 561, 2021 Nov 23.

Article in English | MEDLINE | ID: mdl-34814826

ABSTRACT

BACKGROUND: Ab initio prediction of splice sites is an essential step in eukaryotic genome annotation. Recent predictors have exploited Deep Learning algorithms and reliable gene structures from model organisms. However, Deep Learning methods for non-model organisms are lacking. RESULTS: We developed Spliceator to predict splice sites in a wide range of species, including model and non-model organisms. Spliceator uses a convolutional neural network and is trained on carefully validated data from over 100 organisms. We show that Spliceator achieves consistently high accuracy (89-92%) compared to existing methods on independent benchmarks from human, fish, fly, worm, plant and protist organisms. CONCLUSIONS: Spliceator is a new Deep Learning method trained on high-quality data, which can be used to predict splice sites in diverse organisms, ranging from human to protists, with consistently high accuracy.

Subjects

Algorithms; Neural Networks, Computer; Animals; Genome; Humans

Understanding the causes of errors in eukaryotic protein-coding gene prediction: a case study of primate proteomes.

Meyer, Corentin; Scalzitti, Nicolas; Jeannin-Girardon, Anne; Collet, Pierre; Poch, Olivier; Thompson, Julie D.

BMC Bioinformatics ; 21(1): 513, 2020 Nov 10.

Article in English | MEDLINE | ID: mdl-33172385

ABSTRACT

BACKGROUND: Recent advances in sequencing technologies have led to an explosion in the number of genomes available, but accurate genome annotation remains a major challenge. The prediction of protein-coding genes in eukaryotic genomes is especially problematic, due to their complex exon-intron structures. Even the best eukaryotic gene prediction algorithms can make serious errors that will significantly affect subsequent analyses. RESULTS: We first investigated the prevalence of gene prediction errors in a large set of 176,478 proteins from ten primate proteomes available in public databases. Using the well-studied human proteins as a reference, a total of 82,305 potential errors were detected, including 44,001 deletions, 27,289 insertions and 11,015 mismatched segments where part of the correct protein sequence is replaced with an alternative erroneous sequence. We then focused on the mismatched sequence errors that cause particular problems for downstream applications. A detailed characterization allowed us to identify the potential causes for the gene misprediction in approximately half (5446) of these cases. As a proof-of-concept, we also developed a simple method which allowed us to propose improved sequences for 603 primate proteins. CONCLUSIONS: Gene prediction errors in primate proteomes affect up to 50% of the sequences. Major causes of errors include undetermined genome regions, genome sequencing or assembly issues, and limitations in the models used to represent gene exon-intron structures. Nevertheless, existing genome sequences can still be exploited to improve protein sequence quality. Perspectives of the work include the characterization of other types of gene prediction errors, as well as the development of a more comprehensive algorithm for protein sequence error correction.

Subjects

Open Reading Frames/genetics; Primates/metabolism; Proteome; Amino Acid Sequence; Animals; Databases, Protein; Gene Deletion; Humans; Mutagenesis, Insertional; Receptor-Like Protein Tyrosine Phosphatases/chemistry; Receptor-Like Protein Tyrosine Phosphatases/genetics; Receptor-Like Protein Tyrosine Phosphatases/metabolism; Sequence Alignment

A benchmark study of ab initio gene prediction methods in diverse eukaryotic organisms.

Scalzitti, Nicolas; Jeannin-Girardon, Anne; Collet, Pierre; Poch, Olivier; Thompson, Julie D.

BMC Genomics ; 21(1): 293, 2020 Apr 09.

Article in English | MEDLINE | ID: mdl-32272892

ABSTRACT

BACKGROUND: The draft genome assemblies produced by new sequencing technologies present important challenges for automatic gene prediction pipelines, leading to less accurate gene models. New benchmark methods are needed to evaluate the accuracy of gene prediction methods in the face of incomplete genome assemblies, low genome coverage and quality, complex gene structures, or a lack of suitable sequences for evidence-based annotations. RESULTS: We describe the construction of a new benchmark, called G3PO (benchmark for Gene and Protein Prediction PrOgrams), designed to represent many of the typical challenges faced by current genome annotation projects. The benchmark is based on a carefully validated and curated set of real eukaryotic genes from 147 phylogenetically dispersed organisms, and a number of test sets are defined to evaluate the effects of different features, including genome sequence quality, gene structure complexity and protein length. We used the benchmark to perform an independent comparative analysis of the most widely used ab initio gene prediction programs and identified the main strengths and weaknesses of the programs. More importantly, we highlight a number of features that could be exploited in order to improve the accuracy of current prediction tools. CONCLUSIONS: The experiments showed that ab initio gene structure prediction is a very challenging task, which should be further investigated. We believe that the baseline results associated with the complex gene test sets in G3PO provide useful guidelines for future studies.

Subjects

Computational Biology/methods; Eukaryota/genetics; Molecular Sequence Annotation/methods; Animals; Data Curation; Evolution, Molecular; Humans; Phylogeny

Large Scale Tissue Morphogenesis Simulation on Heterogenous Systems Based on a Flexible Biomechanical Cell Model.

Jeannin-Girardon, Anne; Ballet, Pascal; Rodin, Vincent.

IEEE/ACM Trans Comput Biol Bioinform ; 12(5): 1021-33, 2015.

Article in English | MEDLINE | ID: mdl-26451816

ABSTRACT

The complexity of biological tissue morphogenesis makes in silico simulations of such systems very interesting for gaining a better understanding of the underlying mechanisms ruling the development of multicellular tissues. This complexity is mainly due to two elements: firstly, biological tissues comprise a large number of cells; secondly, these cells exhibit complex interactions and behaviors. To address these two issues, we propose two tools. The first is a virtual cell model that comprises two main elements: a mechanical structure (membrane, cytoskeleton, and cortex) and the main behaviors exhibited by biological cells, i.e., mitosis, growth, differentiation, and molecule consumption and production, as well as the consideration of the physical constraints imposed by the environment. An artificial chemistry is also included in the model. This virtual cell model is coupled to an agent-based formalism. The second tool is a simulator that relies on the OpenCL framework. It allows efficient parallel simulations on heterogeneous devices such as microprocessors or graphics processors. We present two case studies validating the implementation of our model in our simulator: cellular proliferation controlled by cell signalling and limb growth in a virtual organism.

Subjects

Cell Cycle/physiology; Extremities/anatomy & histology; Extremities/growth & development; Mechanotransduction, Cellular/physiology; Models, Biological; Morphogenesis/physiology; Animals; Computer Simulation; Humans

A software architecture for multi-cellular system simulations on graphics processing units.

Jeannin-Girardon, Anne; Ballet, Pascal; Rodin, Vincent.

Acta Biotheor ; 61(3): 317-27, 2013 Sep.

Article in English | MEDLINE | ID: mdl-23900760

ABSTRACT

The first aim of simulation in a virtual environment is to help biologists gain a better understanding of the simulated system. The cost of such simulation is significantly reduced compared to that of in vivo simulation. However, the inherent complexity of biological systems makes it hard to simulate them on non-parallel architectures: models might be made of sub-models and take several scales into account, and the number of simulated entities may be quite large. Today, graphics cards are used for general-purpose computing, which has been made easier thanks to frameworks like CUDA or OpenCL. Parallelization of models may, however, not be easy: parallel computer programming skills are often required, and several hardware architectures may be used to execute models. In this paper, we present the software architecture we built in order to implement various models able to simulate multi-cellular systems. This architecture is modular and implements data structures adapted to graphics processing unit architectures. It allows efficient simulation of biological mechanisms.

Subjects

Computer Graphics; Models, Biological; Software
