Results 1 - 20 of 49
1.
bioRxiv; 2024 Jan 18.
Article in English | MEDLINE | ID: mdl-38293033

ABSTRACT

Babesiosis, caused by protozoan parasites of the genus Babesia, is an emerging tick-borne disease of significance for both human and animal health. Babesia parasites infect erythrocytes of vertebrate hosts where they develop and multiply rapidly to cause the pathological symptoms associated with the disease. The identification of various Babesia species underscores the ongoing risk of new zoonotic pathogens capable of infecting humans, a concern amplified by anthropogenic activities and environmental shifts impacting the distribution and transmission dynamics of parasites, their vectors, and reservoir hosts. One such species, Babesia MO1, previously implicated in severe cases of human babesiosis in the midwestern United States, was initially considered closely related to B. divergens, the predominant agent of human babesiosis in Europe. Yet, uncertainties persist regarding whether these pathogens represent distinct variants of the same species or are entirely separate species. We show that although both B. MO1 and B. divergens share similar genome sizes, comprising three nuclear chromosomes, one linear mitochondrial chromosome, and one circular apicoplast chromosome, major differences exist in terms of genomic sequence divergence, gene functions, transcription profiles, replication rates and susceptibility to antiparasitic drugs. Furthermore, both pathogens have evolved distinct classes of multigene families, crucial for their pathogenicity and adaptation to specific mammalian hosts. Leveraging genomic information for B. MO1, B. divergens, and other members of the Babesiidae family within Apicomplexa provides valuable insights into the evolution, diversity, and virulence of these parasites. This knowledge serves as a critical tool in preemptively addressing the emergence and rapid transmission of more virulent strains.

2.
Nucleic Acids Res; 52(D1): D529-D535, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37843103

ABSTRACT

To date, the databases built to gather information on gene orthology do not provide end-users with descriptors of the molecular evolution information and phylogenetic pattern of these orthologues. In this context, we developed OrthoMaM, a database of ORTHOlogous MAmmalian Markers describing the evolutionary dynamics of coding sequences in mammalian genomes. OrthoMaM version 12 includes 15,868 alignments of orthologous coding sequences (CDS) from the 190 complete mammalian genomes currently available. All annotations and 1-to-1 orthology assignments are based on NCBI. Orthologous CDS can be mined for potential informative markers at the different taxonomic levels of the mammalian tree. To this end, several evolutionary descriptors of DNA sequences are provided for querying purposes (e.g. base composition and relative substitution rate). The graphical web interface allows the user to easily browse and sort the results of combined queries. The corresponding multiple sequence alignments and ML trees, inferred using state-of-the-art approaches, are available for download both at the nucleotide and amino acid levels. OrthoMaM v12 can be used by researchers interested either in reconstructing the phylogenetic relationships of mammalian taxa or in understanding the evolutionary dynamics of coding sequences in their genomes. OrthoMaM is available for browsing, querying and complete or filtered download at https://orthomam.mbb.cnrs.fr/.


Subjects
Genetic Databases, Genomics, Animals, Base Sequence, Genome, Genomics/methods, Mammals/classification, Mammals/genetics, Phylogeny, Biological Evolution
3.
Ecol Evol; 12(1): e8555, 2022 Jan.
Article in English | MEDLINE | ID: mdl-35127051

ABSTRACT

Resurrection studies are a useful tool to measure how phenotypic traits have changed in populations through time. If these trait modifications correlate with the environmental changes that occurred during the time period, it suggests that the phenotypic changes could be a response to selection. Selfing, through its reduction of effective size, could challenge the ability of a population to adapt to environmental changes. Here, we used a resurrection study to test for adaptation in a selfing population of Medicago truncatula, by comparing the genetic composition and flowering times across 22 generations. We found evidence for evolution toward earlier flowering times by about two days and a peculiar genetic structure, typical of highly selfing populations, where some multilocus genotypes (MLGs) are persistent through time. We used the change in frequency of the MLGs through time as a multilocus fitness measure and built a selection gradient that suggests evolution toward earlier flowering times. Yet, a simulation model revealed that the observed change in flowering time could be explained by drift alone, provided the effective size of the population is small enough (<150). These analyses are limited by the difficulty of estimating the effective size in a highly selfing population, where effective recombination is severely reduced.

4.
PLoS One; 16(8): e0255929, 2021.
Article in English | MEDLINE | ID: mdl-34370770

ABSTRACT

Recommender systems aim to provide users with a selection of items, based on predicting their preferences for items they have not yet rated, thus helping them filter out irrelevant ones from a large product catalogue. Collaborative filtering is a widely used mechanism to predict a particular user's interest in a given item, based on feedback from neighbour users with similar tastes. The way the user's neighbourhood is identified has a significant impact on prediction accuracy. Most methods estimate user proximity from ratings they assigned to co-rated items, regardless of their number. This paper introduces a similarity adjustment taking into account the number of co-ratings. The proposed method is based on a concordance ratio representing the probability that two users share the same taste for a new item. The probabilities are further adjusted by using the Empirical Bayes inference method before being used to weight similarities. The proposed approach improves existing similarity measures without increasing time complexity and the adjustment can be combined with all existing similarity measures. Experiments conducted on benchmark datasets confirmed that the proposed method systematically improved the recommender system's prediction accuracy performance for all considered similarity measures.
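The idea of damping a similarity by how much evidence supports it can be illustrated with a small sketch. This is not the authors' method: it assumes a simple "same side of a like threshold" definition of concordance and a fixed Beta prior, whereas an empirical Bayes procedure would estimate the prior from all user pairs. The names `shrunk_concordance`, `adjusted_similarity`, `like_threshold`, `prior_a` and `prior_b` are hypothetical.

```python
def shrunk_concordance(ratings_u, ratings_v, like_threshold=3.5,
                       prior_a=1.0, prior_b=1.0):
    """Posterior-mean estimate of the chance that two users agree on an item.

    ratings_u and ratings_v map item ids to numeric ratings. Two users are
    counted as concordant on an item when both rate it at or above
    like_threshold, or both rate it below (an illustrative assumption).
    The Beta(prior_a, prior_b) prior is fixed here for simplicity.
    """
    co_rated = ratings_u.keys() & ratings_v.keys()
    agreements = sum(
        (ratings_u[item] >= like_threshold) == (ratings_v[item] >= like_threshold)
        for item in co_rated
    )
    # With few co-ratings the estimate stays close to the prior mean.
    return (agreements + prior_a) / (len(co_rated) + prior_a + prior_b)


def adjusted_similarity(raw_similarity, ratings_u, ratings_v, **prior):
    """Weight any existing similarity value by the shrunk concordance."""
    return raw_similarity * shrunk_concordance(ratings_u, ratings_v, **prior)
```

Under these assumptions, two users who agree on a single co-rated item receive a weight of 2/3, while two users who agree on three co-rated items receive 0.8, so the same raw similarity counts for less when it rests on fewer co-ratings.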


Subjects
Algorithms, Bayes Theorem
5.
Plant J; 108(2): 492-508, 2021 10.
Article in English | MEDLINE | ID: mdl-34382706

ABSTRACT

Oryza sativa (rice) plays an essential food security role for more than half of the world's population. Obtaining crops with high levels of disease resistance is a major challenge for breeders, especially today, given the urgent need for agriculture to be more sustainable. Plant resistance genes are mainly encoded by three large leucine-rich repeat (LRR)-containing receptor (LRR-CR) families: the LRR-receptor-like kinase (LRR-RLK), LRR-receptor-like protein (LRR-RLP) and nucleotide-binding LRR receptor (NLR). Using lrrprofiler, a pipeline that we developed to annotate and classify these proteins, we compared three publicly available annotations of the rice Nipponbare reference genome. The extended discrepancies that we observed for LRR-CR gene models led us to perform an in-depth manual curation of their annotations while paying special attention to nonsense mutations. We then transferred this manually curated annotation to Kitaake, a cultivar that is closely related to Nipponbare, using an optimized strategy. Here, we discuss the breakthrough achieved by manual curation when comparing genomes and, in addition to 'functional' and 'structural' annotations, we propose that the community adopts this approach, which we call 'comprehensive' annotation. The resulting data are crucial for further studies on the natural variability and evolution of LRR-CR genes in order to promote their use in breeding future resilient varieties.


Subjects
Molecular Sequence Annotation, Oryza/genetics, Plant Proteins/genetics, Amino Acid Repetitive Sequences, Plant Genome, Genotype, Molecular Sequence Annotation/methods, Oryza/chemistry, Plant Proteins/chemistry
6.
Methods Mol Biol; 2231: 51-70, 2021.
Article in English | MEDLINE | ID: mdl-33289886

ABSTRACT

Most genomic and evolutionary comparative analyses rely on accurate multiple sequence alignments. With their underlying codon structure, protein-coding nucleotide sequences pose a specific challenge for multiple sequence alignment. Multiple Alignment of Coding Sequences (MACSE) is a multiple sequence alignment program that provided the first automatic solution for aligning protein-coding gene datasets containing both functional and nonfunctional sequences (pseudogenes). Through its unique features, reliable codon alignments can be built in the presence of frameshifts and stop codons suitable for subsequent analysis of selection based on the ratio of nonsynonymous to synonymous substitutions. Here we offer a practical overview and guidelines on the use of MACSE v2. This major update of the initial algorithm now comes with a graphical interface providing user-friendly access to different subprograms to handle multiple alignments of protein-coding sequences. We also present new pipelines based on MACSE v2 subprograms to handle large datasets and distributed as Singularity containers. MACSE and associated pipelines are available at: https://bioweb.supagro.inra.fr/macse/.


Subjects
Computational Biology/methods, Sequence Alignment/methods, DNA Sequence Analysis/methods, Software, Algorithms, Amino Acid Sequence/genetics, Animals, Base Sequence/genetics, Molecular Evolution, Genomics, Phylogeny, Pseudogenes
7.
Annu Rev Plant Biol; 71: 131-156, 2020 04 29.
Article in English | MEDLINE | ID: mdl-32186895

ABSTRACT

Because of their high level of diversity and complex evolutionary histories, most studies on plant receptor-like kinase subfamilies have focused on their kinase domains. With the large amount of genome sequence data available today, particularly on basal land plants and Charophyta, more attention should be paid to primary events that shaped the diversity of the RLK gene family. We thus focus on the motifs and domains found in association with kinase domains to illustrate their origin, organization, and evolutionary dynamics. We discuss when these different domain associations first occurred and how they evolved, based on a literature review complemented by some of our unpublished results.


Subjects
Plant Proteins, Plants, Biological Evolution, Plant Genome, Phylogeny, Plant Proteins/genetics, Plants/genetics, Protein Serine-Threonine Kinases
8.
Sci Adv; 5(5): eaav9188, 2019 05.
Article in English | MEDLINE | ID: mdl-31049399

ABSTRACT

Cultivated wheats are derived from an intricate history of three genomes, A, B, and D, present in both diploid and polyploid species. It was recently proposed that the D genome originated from an ancient hybridization between the A and B lineages. However, this result has been questioned, and a robust phylogeny of wheat relatives is still lacking. Using transcriptome data from all diploid species and a new methodological approach, our comprehensive phylogenomic analysis revealed that more than half of the species descend from an ancient hybridization event but with a more complex scenario involving a different parent than previously thought (Aegilops mutica, an overlooked wild species) instead of the B genome. We also detected other extensive gene flow events that could explain long-standing controversies in the classification of wheat relatives.


Subjects
Molecular Evolution, Genetic Hybridization, Phylogeny, Triticum/genetics, DNA Transposable Elements/genetics, Complementary DNA, Diploidy, Gene Flow, Plant Genes, Plant Genome, Genetic Polymorphism, Polyploidy, Messenger RNA/isolation & purification, Transcriptome
9.
Mol Biol Evol; 36(4): 861-862, 2019 04 01.
Article in English | MEDLINE | ID: mdl-30698751

ABSTRACT

We present version 10 of OrthoMaM, a database of orthologous mammalian markers. OrthoMaM is already 11 years old and since the outset it has kept improving, providing high-quality alignments and phylogenetic trees computed with state-of-the-art methods on up-to-date data. The main contribution of this version is the increase in the number of taxa: 116 mammalian genomes for 14,509 one-to-one orthologous genes. This has been made possible by the combination of genomic data deposited in Ensembl complemented by additional good-quality genomes only available in NCBI. Version 10 users will benefit from pipeline improvements and a completely redesigned web-interface.


Subjects
Genetic Databases, Genome, Mammals/genetics, Phylogeny, Sequence Alignment, Animals
10.
PLoS One; 13(12): e0208838, 2018.
Article in English | MEDLINE | ID: mdl-30589848

ABSTRACT

Genetic maps order genetic markers along chromosomes. They are, for instance, extensively used in marker-assisted selection to accelerate breeding programs. Even for the same species, people often have to deal with several alternative maps obtained using different ordering methods or different datasets, e.g. resulting from different segregating populations. Having efficient tools to identify the consistency and discrepancy of alternative maps is thus essential to facilitate genetic map comparisons. We propose to encode genetic maps by bucket orders, a kind of order that takes into account the blurred parts of the marker order while being an efficient data structure to achieve low-complexity algorithms. The main result of this paper is an O(n log(n)) procedure to identify the largest agreements between two bucket orders of n elements, their Longest Common Subsequence (LCS), providing an efficient solution to highlight discrepancies between two genetic maps. The LCS of two maps, being the largest set of their collinear markers, is used as a building block to compute pairwise map congruence, to visually emphasize marker collinearity and in some scaffolding methods relying on genetic maps to improve genome assembly. As the LCS computation is a key subroutine of all these genetic map-related tools, replacing the current LCS subroutine of those methods by ours (which does exactly the same work, but faster) could significantly speed up those methods without changing their accuracy. To ease such transition we provide all required algorithmic details in this self-contained paper as well as an R package implementing them, named LCSLCIS, which is freely available at: https://github.com/holtzy/LCSLCIS.
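To give a feel for how an O(n log n) bound can arise, the sketch below handles only the simpler special case of two strict (total) marker orders, where the LCS reduces to a longest increasing subsequence found by binary search. It does not handle bucket orders with ties, which is what the paper addresses, and it is not the LCSLCIS implementation. The function name `lcs_of_marker_orders` is hypothetical, and marker names are assumed to be unique within each map.

```python
from bisect import bisect_left


def lcs_of_marker_orders(map_a, map_b):
    """Largest set of collinear markers between two strict marker orders.

    Markers are translated to their rank in map_b; a longest strictly
    increasing subsequence of those ranks, read in map_a order, is an LCS.
    """
    rank_in_b = {marker: rank for rank, marker in enumerate(map_b)}
    shared = [marker for marker in map_a if marker in rank_in_b]
    ranks = [rank_in_b[marker] for marker in shared]

    tails = []       # tails[k]: smallest last rank of an increasing run of length k + 1
    tails_idx = []   # position in `ranks` of that last element
    previous = [-1] * len(ranks)  # back-pointers used to rebuild the subsequence
    for i, rank in enumerate(ranks):
        k = bisect_left(tails, rank)
        if k == len(tails):
            tails.append(rank)
            tails_idx.append(i)
        else:
            tails[k] = rank
            tails_idx[k] = i
        previous[i] = tails_idx[k - 1] if k > 0 else -1

    # Follow back-pointers from the end of the longest run.
    collinear = []
    i = tails_idx[-1] if tails_idx else -1
    while i != -1:
        collinear.append(shared[i])
        i = previous[i]
    return collinear[::-1]
```

For example, with `map_a = ["m1", "m2", "m3", "m4", "m5"]` and `map_b = ["m1", "m3", "m2", "m4", "m5"]`, the function returns `["m1", "m3", "m4", "m5"]`; the marker left out (`m2`) is one end of the local inversion between the two maps.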


Subjects
Algorithms, Chromosome Mapping, Genetic Models, DNA Sequence Analysis/methods, Genetic Markers
11.
Mol Biol Evol; 35(10): 2582-2584, 2018 10 01.
Article in English | MEDLINE | ID: mdl-30165589

ABSTRACT

Multiple sequence alignment is a prerequisite for many evolutionary analyses. Multiple Alignment of Coding Sequences (MACSE) is a multiple sequence alignment program that explicitly accounts for the underlying codon structure of protein-coding nucleotide sequences. Its unique characteristic allows building reliable codon alignments even in the presence of frameshifts. This facilitates downstream analyses such as selection pressure estimation based on the ratio of nonsynonymous to synonymous substitutions. Here, we present MACSE v2, a major update with an improved version of the initial algorithm enriched with a complete toolkit to handle multiple alignments of protein-coding sequences. A graphical interface now provides user-friendly access to the different subprograms.


Subjects
Sequence Alignment, Software, Terminator Codon, Frameshift Mutation
12.
PLoS One; 12(9): e0183454, 2017.
Article in English | MEDLINE | ID: mdl-28886042

ABSTRACT

Domestication is known to strongly reduce genomic diversity through population bottlenecks. The resulting loss of polymorphism has been thoroughly documented in numerous cultivated species. Here we investigate the impact of domestication on the diversity of alternative transcript expressions using RNAseq data obtained on cultivated and wild sorghum accessions (ten accessions for each pool). To that end, we focus on genes expressing two isoforms in sorghum and estimate the ratio between expression levels of those isoforms in each accession. Notably, for a given gene, one isoform can either be overexpressed or underexpressed in some wild accessions, whereas in the cultivated accessions, the balance between the two isoforms of the same gene appears to be much more homogeneous. Indeed, we observe in sorghum significantly more variation in isoform expression balance among wild accessions than among domesticated accessions. The possibility exists that the loss of nucleotide diversity due to domestication could affect regulatory elements, controlling transcription or degradation of these isoforms. Impact on the isoform expression balance is discussed. As far as we know, this is the first time that the impact of domestication on transcript isoform balance has been studied at the genomic scale. This could pave the way towards the identification of key domestication genes with finely tuned isoform expressions in domesticated accessions while being highly variable in their wild relatives.


Subjects
Alternative Splicing/genetics, Sorghum/genetics, Sorghum/metabolism, Alternative Splicing/physiology, Plant Proteins/genetics, Plant Proteins/metabolism, Protein Isoforms/genetics, Protein Isoforms/metabolism
13.
J Theor Biol; 432: 1-13, 2017 11 07.
Article in English | MEDLINE | ID: mdl-28801222

ABSTRACT

Gene trees and species trees can be discordant due to several processes. Standard models of reconciliations consider macro-evolutionary events at the gene level: duplications, losses and transfers of genes. However, another common source of gene tree-species tree discordance is incomplete lineage sorting (ILS), whereby gene divergences corresponding to speciations occur "out of order". Yet ILS is seldom considered in reconciliation models. In this paper, we devise a unified formal IDTL reconciliation model which includes all the above-mentioned processes. We show how to properly cost ILS under this model, and then give a fixed-parameter tractable (FPT) algorithm which calculates the most parsimonious IDTL reconciliation, with guaranteed time-consistency of transfer events. Provided that the number of branches in contiguous regions of the species tree in which ILS is allowed is bounded by a constant, this algorithm is linear in the number of genes and quadratic in the number of species. This provides a formal foundation to the inference of ILS in a reconciliation framework.


Subjects
Gene Duplication, Horizontal Gene Transfer, Phylogeny, Algorithms, Haploidy, Genetic Models
14.
Theor Appl Genet; 130(7): 1491-1505, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28451771

ABSTRACT

KEY MESSAGE: The resistance of durum wheat to the Wheat spindle streak mosaic virus (WSSMV) is controlled by two main QTLs on chromosomes 7A and 7B, with a huge epistatic effect. Wheat spindle streak mosaic virus (WSSMV) is a major disease of durum wheat in Europe and North America. Breeding WSSMV-resistant cultivars is currently the only way to control the virus since no treatment is available. This paper reports studies of the inheritance of WSSMV resistance using two related durum wheat populations obtained by crossing two elite cultivars with a WSSMV-resistant emmer cultivar. In 2012 and 2015, 354 recombinant inbred lines (RIL) were phenotyped using visual notations, ELISA and qPCR and genotyped using locus targeted capture and sequencing. This allowed us to build a consensus genetic map of 8568 markers and identify three chromosomal regions involved in WSSMV resistance. Two major regions (located on chromosomes 7A and 7B) jointly explain, on the basis of epistatic interactions, up to 43% of the phenotypic variation. Flanking sequences of our genetic markers are provided to facilitate future marker-assisted selection of WSSMV-resistant cultivars.


Subjects
Disease Resistance/genetics, Genetic Epistasis, Plant Diseases/genetics, Potyviridae, Quantitative Trait Loci, Triticum/genetics, Chromosome Mapping, Genetic Crosses, Genetic Linkage, Genetic Markers, Genotype, Phenotype, Plant Diseases/virology, Triticum/virology
15.
Bioinformatics; 33(9): 1387-1388, 2017 05 01.
Article in English | MEDLINE | ID: mdl-28453680

ABSTRACT

Motivation: Marker-assisted selection strongly relies on genetic maps to accelerate breeding programs. High-density maps are now available for numerous species. Dedicated tools are required to compare several high-density maps on the basis of their key characteristics, while pinpointing their differences and similarities. Results: We developed the Genetic Map Comparator, a web-based application for easy comparison of different maps according to their key statistics and the relative positions of common markers. Availability and Implementation: The Genetic Map Comparator is available online at: http://bioweb.supagro.inra.fr/geneticMapComparator. The source code is freely available on GitHub under the CeCILL general public license: https://github.com/holtzy/GenMap-Comparator. Contact: Holtz@supagro.fr; Ranwez@supagro.fr.


Subjects
Genomics/methods, DNA Sequence Analysis/methods, Software, Disease Resistance/genetics, Plant Genes, Plant Diseases/genetics, Quantitative Trait Loci, Triticum/genetics, Triticum/virology, Virus Diseases/genetics
16.
PLoS One; 11(8): e0160043, 2016.
Article in English | MEDLINE | ID: mdl-27505054

ABSTRACT

BACKGROUND: Multiple sequence alignment (MSA) is a crucial step in many molecular analyses and many MSA tools have been developed. Most of them use a greedy approach to construct a first alignment that is then refined by optimizing the sum of pair score (SP-score). The SP-score estimation is thus a bottleneck for most MSA tools since it is repeatedly required and is time consuming. RESULTS: Given an alignment of n sequences and L sites, I introduce here optimized solutions reaching O(nL) time complexity for affine gap cost, instead of O(n^2 L), which are easy to implement.
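The gap between O(n^2 L) and O(nL) can be seen in a simplified setting. The sketch below scores an alignment with a flat match/mismatch scheme and a linear (not affine) gap penalty, by counting characters per column instead of enumerating sequence pairs. It is only an illustration of the counting idea: the affine gap case, which is the actual contribution of the paper, needs additional bookkeeping that is not shown. The function names and the default scores are hypothetical.

```python
from collections import Counter
from itertools import combinations


def sp_score_by_counts(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score in O(nL) for a fixed alphabet, linear gap cost.

    alignment is a list of equal-length strings using '-' for gaps.
    Gap/gap pairs contribute nothing.
    """
    n = len(alignment)
    total = 0
    for column in zip(*alignment):
        counts = Counter(column)
        gaps = counts.pop("-", 0)
        residues = n - gaps
        # Pairs of identical residues, from the per-character counts.
        same = sum(c * (c - 1) // 2 for c in counts.values())
        # All remaining residue/residue pairs are mismatches.
        different = residues * (residues - 1) // 2 - same
        total += match * same + mismatch * different + gap * gaps * residues
    return total


def sp_score_naive(alignment, match=1, mismatch=-1, gap=-2):
    """Reference O(n^2 L) version that enumerates every pair of sequences."""
    total = 0
    for seq_a, seq_b in combinations(alignment, 2):
        for a, b in zip(seq_a, seq_b):
            if a == "-" and b == "-":
                continue
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total
```

Both functions return the same value on any input; for the alignment `["ACG-", "ACGT", "A-GT"]` with the default scores, that value is 0.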


Subjects
Algorithms, Computational Biology/methods, Sequence Alignment
17.
PLoS One; 11(5): e0154609, 2016.
Article in English | MEDLINE | ID: mdl-27171472

ABSTRACT

Targeted sequence capture is a promising technology which helps reduce costs for sequencing and genotyping numerous genomic regions in large sets of individuals. Bait sequences are designed to capture specific alleles previously discovered in parents or reference populations. We studied a set of 135 RILs originating from a cross between an emmer cultivar (Dic2) and a recent durum elite cultivar (Silur). Six thousand sequence baits were designed to target Dic2 vs. Silur polymorphisms discovered in a previous RNAseq study. These baits were exposed to genomic DNA of the RIL population. Eighty percent of the targeted SNPs were recovered, 65% of which were of high quality and coverage. The final high density genetic map consisted of more than 3,000 markers, whose genetic and physical mapping were consistent with those obtained with large arrays.


Subjects
Alleles, Chromosome Mapping, Genotyping Techniques/methods, DNA Sequence Analysis/methods, Triticum/genetics, Contig Mapping, Genetic Polymorphism, Single Nucleotide Polymorphism/genetics
18.
J Math Biol; 72(7): 1811-44, 2016 06.
Article in English | MEDLINE | ID: mdl-26337177

ABSTRACT

In the field of phylogenetics, the evolutionary history of a set of organisms is commonly depicted by a species tree, whose internal nodes represent speciation events, while the evolutionary history of a gene family is depicted by a gene tree, whose internal nodes can also represent macro-evolutionary events such as gene duplications and transfers. As speciation events are only part of the events shaping a gene history, the topology of a gene tree can show incongruences with that of the corresponding species tree. These incongruences can be used to infer the macro-evolutionary events undergone by the gene family. This is done by embedding the gene tree inside the species tree and hence providing a reconciliation of those trees. In the past decade, several parsimony-based methods have been developed to infer such reconciliations, accounting for gene duplications ([Formula: see text]), transfers ([Formula: see text]) and losses ([Formula: see text]). The main contribution of this paper is to formally prove an important assumption implicitly made by previous works on these reconciliations, namely that solving the (maximum) parsimony [Formula: see text] reconciliation problem in the discrete framework is equivalent to finding a most parsimonious [Formula: see text] scenario in the continuous framework. In the process, we also prove several intermediate results that are useful on their own and constitute a theoretical toolbox that will likely facilitate future theoretical contributions in the field.


Subjects
Biological Evolution, Gene Duplication, Biological Models, Algorithms, Molecular Evolution, Gene Deletion, Horizontal Gene Transfer, Genetic Speciation, Phylogeny
19.
BMC Bioinformatics; 16: 384, 2015 Nov 14.
Article in English | MEDLINE | ID: mdl-26573665

ABSTRACT

BACKGROUND: Given a gene and a species tree, reconciliation methods attempt to retrieve the macro-evolutionary events that best explain the discrepancies between the two tree topologies. The DTL parsimonious approach searches for a most parsimonious reconciliation between a gene tree and a (dated) species tree, considering four possible macro-evolutionary events (speciation, duplication, transfer, and loss) with specific costs. Unfortunately, many events are erroneously predicted due to errors in the input trees, inappropriate input cost values or because of the existence of several equally parsimonious scenarios. It is thus crucial to provide a measure of the reliability for predicted events. It has been recently proposed that the reliability of an event can be estimated via its frequency in the set of most parsimonious reconciliations obtained using a variety of reasonable input cost vectors. To compute such a support, a straightforward but time-consuming approach is to generate the costs slightly departing from the original ones, independently compute the set of all most parsimonious reconciliations for each vector, and combine these sets a posteriori. Another proposed approach uses Pareto-optimality to partition cost values into regions which induce reconciliations with the same number of DTL events. The support of an event is then defined as its frequency in the set of regions. However, often, the number of regions is not large enough to provide reliable supports. RESULTS: We present here a method to compute efficiently event supports via a polynomial-sized graph, which can represent all reconciliations for several different costs. Moreover, two methods are proposed to take into account alternative input costs: either explicitly providing an input cost range or allowing a tolerance for the over cost of a reconciliation. 
Our methods are faster than the region based method, substantially faster than the sampling-costs approach, and have a higher event-prediction accuracy on simulated data. CONCLUSIONS: We propose a new approach to improve the accuracy of event supports for parsimonious reconciliation methods to account for uncertainty in the input costs. Furthermore, because of their speed, our methods can be used on large gene families. Our algorithms are implemented in the ecceTERA program, freely available from http://mbb.univ-montp2.fr/MBB/.


Subjects
Molecular Evolution, Phylogeny, Proteobacteria/genetics, Algorithms, Computer Simulation, Bacterial Genes, Reproducibility of Results
20.
BMC Bioinformatics; 16: 83, 2015 Mar 14.
Article in English | MEDLINE | ID: mdl-25887746

ABSTRACT

BACKGROUND: Semantic approaches such as concept-based information retrieval rely on a corpus in which resources are indexed by concepts belonging to a domain ontology. In order to keep such applications up-to-date, new entities need to be frequently annotated to enrich the corpus. However, this task is time-consuming and requires a high level of expertise in both the domain and the related ontology. Different strategies have thus been proposed to ease this indexing process, each one taking advantage of the features of the document. RESULTS: In this paper we present USI (User-oriented Semantic Indexer), a fast and intuitive method for indexing tasks. We introduce a solution to suggest a conceptual annotation for new entities based on related already indexed documents. Our results, compared to those obtained by previous authors using the MeSH thesaurus and a dataset of biomedical papers, show that the method surpasses text-specific methods in terms of both quality and speed. Evaluations are done via usual metrics and semantic similarity. CONCLUSIONS: By only relying on neighbor documents, the User-oriented Semantic Indexer does not need a representative learning set. Yet, it provides better results than the other approaches by giving a consistent annotation scored with a global criterion instead of one score per concept.


Subjects
Abstracting and Indexing, Algorithms, Information Storage and Retrieval, Natural Language Processing, Semantics, User-Computer Interface, Humans, Medical Subject Headings, Automated Pattern Recognition, Controlled Vocabulary