Search | BVS Regional Portal

Structural Variations Affecting Genes and Transposable Elements of Chromosome 3B in Wheats.

De Oliveira, Romain; Rimbert, Hélène; Balfourier, François; Kitt, Jonathan; Dynomant, Emeric; Vrána, Jan; Dolezel, Jaroslav; Cattonaro, Federica; Paux, Etienne; Choulet, Frédéric.

Front Genet ; 11: 891, 2020.

Article in English | MEDLINE | ID: mdl-33014014

ABSTRACT

Structural variations (SVs) such as copy number and presence-absence variations are polymorphisms known to affect genome composition at the species level and to be associated with phenotypic variation. In the absence of a reference genome sequence, their study in wheat has long been hampered. The recent production of new wheat genomic resources has led to a paradigm shift, making it possible to investigate the extent of SVs among cultivated and wild accessions. We assessed SVs affecting genes and transposable elements (TEs) in a Triticeae diversity panel of 45 accessions from seven tetraploid and hexaploid species, using high-coverage shotgun sequencing of sorted chromosome 3B DNA and dedicated bioinformatics approaches. We showed that 23% of the genes are variable within this panel, and we also identified 330 genes absent from the reference accession Chinese Spring. In addition, 60% of the TE-derived reference markers were absent in at least one accession, revealing a high level of intraspecific and interspecific variability affecting the TE space. Most of the variability was observed at the chromosome extremities, confirming hypotheses previously made when comparing wheat with other grasses. This study provides deeper insight into the genomic variability affecting the complex Triticeae genomes at the intraspecific and interspecific levels, and suggests a phylogeny in which independent hybridization events led to different hexaploid species.
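A minimal sketch of how a statistic such as "markers absent in at least one accession" can be computed, assuming presence/absence calls have already been made for every reference marker in every accession. The calling step itself (e.g. from read coverage) is not shown and is not taken from the paper; the function name, marker IDs and calls below are hypothetical toy data.

```python
from typing import Dict, List


def fraction_absent_somewhere(pav: Dict[str, List[bool]]) -> float:
    """Fraction of markers called absent (False) in at least one accession.

    `pav` is a hypothetical presence-absence matrix: marker ID -> one
    presence call per accession, in a fixed accession order.
    """
    if not pav:
        return 0.0
    # A marker counts as variable if any accession lacks it.
    variable = sum(1 for calls in pav.values() if not all(calls))
    return variable / len(pav)


# Toy data (hypothetical): 4 markers x 3 accessions.
toy_pav = {
    "marker_1": [True, True, True],
    "marker_2": [True, False, True],  # absent in the second accession
    "marker_3": [True, True, True],
    "marker_4": [True, True, True],
}
print(fraction_absent_somewhere(toy_pav))  # 0.25
```

The same matrix also supports the converse question (markers present in some accessions but absent from the reference), which would need the reference column to be tracked separately.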

Word Embedding for French Natural Language in Healthcare: A Comparative Study.

Dynomant, Emeric; Lelong, Romain; Dahamna, Badisse; Massonnaud, Clément; Kerdelhué, Gaëtan; Grosjean, Julien; Canu, Stéphane; Darmoni, Stéfan.

Stud Health Technol Inform ; 264: 118-122, 2019 Aug 21.

Article in English | MEDLINE | ID: mdl-31437897

ABSTRACT

Structuring raw medical documents through ontology mapping is now the next step for medical intelligence. Deep learning models take mathematically encoded information, such as encoded text, as input. To this end, word embedding methods represent each word of a text as a fixed-length vector. A formal evaluation of three word embedding methods was performed on raw medical documents. The data comprise more than 12M diverse documents produced at the Rouen hospital (drug prescriptions, discharge and surgery summaries, inter-service letters, etc.). Automatic and manual validation showed that Word2Vec with the skip-gram architecture achieved the highest score on three of the four accuracy tests. This model will now be used as the first layer of an AI-based semantic annotator.
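The abstract credits Word2Vec's skip-gram architecture. As a minimal illustration (not the study's training code), skip-gram turns a token sequence into (center, context) pairs within a fixed window, and the model learns word vectors by predicting each context word from its center word. The sketch below only builds those training pairs; the function name, window size and French example sentence are hypothetical, and real Word2Vec implementations add details such as a randomly shrunk window, subsampling of frequent words, and negative sampling.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Generate (center, context) pairs for a skip-gram style objective."""
    pairs = []
    for i, center in enumerate(tokens):
        # Context = up to `window` tokens on each side, clipped at the edges.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


print(skipgram_pairs(["patient", "reçoit", "paracétamol"], window=1))
# [('patient', 'reçoit'), ('reçoit', 'patient'),
#  ('reçoit', 'paracétamol'), ('paracétamol', 'reçoit')]
```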

Subjects

Language, Natural Language Processing, Deep Learning, Semantics

Word Embedding for the French Natural Language in Health Care: Comparative Study.

Dynomant, Emeric; Lelong, Romain; Dahamna, Badisse; Massonnaud, Clément; Kerdelhué, Gaétan; Grosjean, Julien; Canu, Stéphane; Darmoni, Stefan J.

JMIR Med Inform ; 7(3): e12310, 2019 Jul 29.

Article in English | MEDLINE | ID: mdl-31359873

ABSTRACT

BACKGROUND: Word embedding technologies, a set of language modeling and feature learning techniques in natural language processing (NLP), are now used in a wide range of applications. However, no formal evaluation or comparison has been made of the ability of each of the 3 most widely used unsupervised implementations (Word2Vec, GloVe, and FastText) to capture the semantic similarities between words when trained on the same dataset. OBJECTIVE: The aim of this study was to compare embedding methods trained on a corpus of French health-related documents produced in a professional context. The best method will then help us develop a new semantic annotator. METHODS: Unsupervised embedding models were trained on 641,279 documents from the Rouen University Hospital. These data are unstructured and cover a wide range of documents produced in a clinical setting (discharge summaries, procedure reports, and prescriptions). In total, 4 rated evaluation tasks were defined (cosine similarity, odd one out, analogy-based operations, and formal human evaluation) and applied to each model, along with embedding visualization. RESULTS: Word2Vec had the highest score on 3 of the 4 rated tasks (analogy-based operations, odd-one-out similarity, and human validation), particularly with the skip-gram architecture. CONCLUSIONS: Although this implementation achieved the best score for preserving semantic properties, each model has its own strengths and weaknesses, such as the very short training time of GloVe or the preservation of morphological similarity observed with FastText. The models and test sets produced by this study will be the first to be made publicly available, through a graphical interface, to help advance French biomedical research.
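Three of the rated tasks named in METHODS (cosine similarity, odd one out, analogy-based operations) can be illustrated with a minimal sketch. The 5-dimensional vectors and French terms below are hypothetical toy data, not vectors from the study's trained models. The odd-one-out rule used here (the word with the lowest mean cosine similarity to the others) and the vector-offset analogy (nearest word to b − a + c, excluding the inputs) are common conventions, assumed rather than taken from the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def odd_one_out(emb: Dict[str, Vector], words: List[str]) -> str:
    """Word with the lowest mean cosine similarity to the other words."""
    def mean_sim(w: str) -> float:
        others = [o for o in words if o != w]
        return sum(cosine(emb[w], emb[o]) for o in others) / len(others)
    return min(words, key=mean_sim)


def analogy(emb: Dict[str, Vector], a: str, b: str, c: str) -> str:
    """'a is to b as c is to ?': nearest word to b - a + c, inputs excluded."""
    target = [eb - ea + ec for ea, eb, ec in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(emb[w], target))


# Hypothetical toy axes: [drug, disease, pancreas, thyroid, bone].
emb = {
    "insuline":      [1, 0, 1, 0, 0],
    "diabète":       [0, 1, 1, 0, 0],
    "lévothyroxine": [1, 0, 0, 1, 0],
    "hypothyroïdie": [0, 1, 0, 1, 0],
    "fracture":      [0, 1, 0, 0, 1],
}
print(odd_one_out(emb, ["diabète", "hypothyroïdie", "fracture", "insuline"]))  # insuline
print(analogy(emb, "insuline", "diabète", "lévothyroxine"))  # hypothyroïdie
```

Here "insuline" is the only drug among three conditions, so it has the lowest mean similarity; the analogy offset maps the insulin/diabetes relation onto levothyroxine and lands exactly on "hypothyroïdie".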
