Results 1 - 19 of 19
1.
Prev Vet Med ; 216: 105932, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37247579

ABSTRACT

The value of informal sources in increasing the timeliness of disease outbreak detection and providing detailed epidemiological information in the early warning and preparedness context is recognized. This study evaluates machine learning methods for classifying information from animal disease-related news at a fine-grained level (i.e., epidemiological topic). We compare two textual representations, the bag-of-words method and a distributional approach, i.e., word embeddings. Both representations performed well for binary relevance classification (F-measure of 0.839 and 0.871, respectively). Bag-of-words representation was outperformed by word embedding representation for classifying sentences into fine-grained epidemiological topics (F-measure of 0.745). Our results suggest that the word embedding approach is of interest in the context of low-frequency classes in a specialized domain. However, this representation did not bring significant performance improvements for binary relevance classification, indicating that the textual representation should be adapted to each classification task.
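The two ingredients named in this abstract, a bag-of-words representation and the F-measure, can be illustrated with a minimal sketch. It assumes the balanced F-measure (F1), which the abstract does not state explicitly; the function names and the toy labels are hypothetical and do not come from the study.

```python
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Represent a sentence as unordered, lowercased token counts."""
    return Counter(sentence.lower().split())


def f_measure(gold: list[bool], predicted: list[bool]) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels (True = relevant sentence), not data from the paper.
gold = [True, True, True, False, False]
predicted = [True, True, False, True, False]
score = f_measure(gold, predicted)
```

A word-embedding representation would replace `bag_of_words` with dense vectors, while the evaluation step would stay the same.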


Subjects
Animal Diseases, Machine Learning, Animals, Animal Diseases/epidemiology
2.
Data Brief ; 46: 108869, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36691558

ABSTRACT

This study aimed to link experimental data dealing with complex agroecological systems. To share and link collected data with the generic AEGIS (Agro-Ecological Global Information System) database, the work described in this data paper consists of mapping researcher variables to the AEGIS dictionary variables for different tropical crops (sugarcane, rice, sorghum or cover crops). Additionally, this data paper presents a case study based on sugarcane intercropping systems for evaluating three measures for matching variables.

3.
Sci Data ; 10(1): 818, 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-37993460

ABSTRACT

Land artificialization is a serious problem of civilization. Urban planning and natural risk management aim to address it. In France, these practices rely on Local Land Plans (PLU - Plan Local d'Urbanisme) and Natural Risk Prevention Plans (PPRn - Plan de Prévention des Risques naturels), which contain land use rules. To facilitate automatic extraction of the rules, we manually annotated a number of those documents concerning Montpellier, a rapidly evolving agglomeration exposed to natural risks. We defined a format for labeled examples in which each entry includes a title and subtitle. In addition, we proposed a hierarchical representation of class labels to generalize the use of our corpus. Our corpus, consisting of 1934 textual segments, each labeled with one of 4 classes (Verifiable, Non-verifiable, Informative and Not pertinent), is the first corpus in the French language in the fields of urban planning and natural risk management. Along with presenting the corpus, we tested a state-of-the-art approach for text classification to demonstrate its usability for automatic rule extraction.

4.
PLoS One ; 18(9): e0285341, 2023.
Article in English | MEDLINE | ID: mdl-37669265

ABSTRACT

Event-Based Surveillance (EBS) tools, such as HealthMap and PADI-web, monitor online news reports and other unofficial sources, with the primary aim to provide timely information to users from health agencies on disease outbreaks occurring worldwide. In this work, we describe how outbreak-related information disseminates from a primary source, via a secondary source, to a definitive aggregator, an EBS tool, during the 2018/19 avian influenza season. We analysed 337 news items from the PADI-web and 115 news articles from HealthMap EBS tools reporting avian influenza outbreaks in birds worldwide between July 2018 and June 2019. We used the sources cited in the news to trace the path of each outbreak. We built a directed network with nodes representing the sources (characterised by type, specialisation, and geographical focus) and edges representing the flow of information. We calculated the degree as a centrality measure to determine the importance of the nodes in information dissemination. We analysed the role of the sources in early detection (detection of an event before its official notification) to the World Organisation for Animal Health (WOAH) and late detection. A total of 23% and 43% of the avian influenza outbreaks detected by the PADI-web and HealthMap, respectively, were shared on time before their notification. For both tools, national and local veterinary authorities were the primary sources of early detection. The early detection component mainly relied on the dissemination of nationally acknowledged events by online news and press agencies, bypassing international reporting to the WOAH. WOAH was the major secondary source for late detection, occupying a central position between national authorities and disseminator sources, such as online news. PADI-web and HealthMap were highly complementary in terms of detected sources, explaining why 90% of the events were detected by only one of the tools. We show that current EBS tools can provide timely outbreak-related information and priority news sources to improve digital disease surveillance.
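The directed network and degree measure described in this abstract can be sketched as follows. This is only an illustration: the node names and edges are hypothetical, and the abstract does not say whether the degree was normalised or split into in-degree and out-degree, so the sketch simply reports raw counts of all three.

```python
from collections import defaultdict

# Hypothetical edges: (source that emits information, source that cites it).
edges = [
    ("national_veterinary_authority", "WOAH"),
    ("national_veterinary_authority", "online_news"),
    ("WOAH", "online_news"),
    ("WOAH", "press_agency"),
    ("press_agency", "online_news"),
    ("online_news", "EBS_tool"),
]


def degree_centrality(edge_list):
    """Return raw in-, out- and total degree for every node of a directed graph."""
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for src, dst in edge_list:
        out_deg[src] += 1  # information leaves src
        in_deg[dst] += 1   # information reaches dst
    nodes = set(in_deg) | set(out_deg)
    return {
        node: {
            "in": in_deg.get(node, 0),
            "out": out_deg.get(node, 0),
            "total": in_deg.get(node, 0) + out_deg.get(node, 0),
        }
        for node in nodes
    }


result = degree_centrality(edges)
```

In this toy graph the node with the highest total degree is `online_news`, which receives information from three sources and passes it on to one.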


Subjects
Influenza in Birds, Animals, Disease Outbreaks, Geography, Group Processes, Information Dissemination
5.
Data Brief ; 46: 108870, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36687146

ABSTRACT

This paper presents an annotated dataset used in the MOOD Antimicrobial Resistance (AMR) hackathon, hosted in Montpellier, June 2022. The dataset consists of unstructured data from news items, scientific publications and national or international reports, collected from four event-based surveillance (EBS) systems, i.e. ProMED, PADI-web, HealthMap and MedISys. Data were annotated by relevance for epidemic intelligence (EI) purposes with the help of AMR experts and an annotation guideline. Extracted data were intended to include relevant events on the emergence and spread of AMR such as reports on AMR trends, discovery of new drug-bug resistances, or new AMR genes in human, animal or environmental reservoirs. This dataset can be used to train or evaluate classification approaches to automatically identify written text on AMR events across the different reservoirs and sectors of One Health (i.e. human, animal, food, environmental sources, such as soil and waste water) in unstructured data (e.g. news, tweets) and classify these events by relevance for EI purposes.

6.
Data Brief ; 45: 108680, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36425989

ABSTRACT

The main objective of the project LEAP4FNSSA (Long-term EU-AU Research and Innovation Partnership for Food and Nutrition Security and Sustainable Agriculture) is to provide a tool for European and African institutions to engage in a sustainable partnership platform for research and innovation on Food and Nutrition Security, and Sustainable Agriculture (FNSSA). The FNSSA roadmap facilitates the involvement of stakeholders in addressing food security issues and linking research to innovation. In this context, the LEAP4FNSSA project supports the driving of the roadmap. Research and innovation activities were captured in different data sources, i.e. the LEAP4FNSSA database and heterogeneous textual data including project reports, websites, scientific publications, workshop reports and student theses. The Knowledge Extractor Pipeline System (KEOPS) was implemented to support the processing and analysis of textual data associated with FNSSA activities. KEOPS is based on the LEAP4FNSSA lexicon presented in this data paper. The LEAP4FNSSA lexicon, composed of 331 keywords associated with 12 concepts of the food security domain, is the result of 3 steps of work and brainstorming. The lexicon enables the capturing of research and innovation topics dealing with food security and conducted by African and European partners. This data paper presents the obtained lexicon and a summary of the method to build it.
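A lexicon that associates keywords with concepts can be applied to text with a simple lookup, sketched below. This is not the KEOPS implementation, which the abstract does not describe; the four keyword and concept pairs are hypothetical placeholders, not entries from the LEAP4FNSSA lexicon.

```python
import re
from collections import Counter

# Hypothetical keyword -> concept entries, for illustration only.
LEXICON = {
    "food security": "Food security",
    "malnutrition": "Nutrition",
    "irrigation": "Water management",
    "crop yield": "Agricultural production",
}


def concept_counts(text: str, lexicon: dict[str, str]) -> Counter:
    """Count how often each concept is evoked by one of its keywords."""
    counts = Counter()
    lowered = text.lower()
    for keyword, concept in lexicon.items():
        # Word boundaries avoid matching a keyword inside a longer word.
        pattern = r"\b" + re.escape(keyword) + r"\b"
        hits = len(re.findall(pattern, lowered))
        if hits:
            counts[concept] += hits
    return counts


report = (
    "The project studies irrigation and crop yield; "
    "irrigation improves food security."
)
counts = concept_counts(report, LEXICON)
```

Exact matching of this kind misses inflected or paraphrased forms, so a real pipeline would likely add lemmatisation or term-variant handling.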

7.
Data Brief ; 43: 108317, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35692611

ABSTRACT

This dataset is composed of spatial (e.g. location) and thematic (e.g. diseases, symptoms, virus) entities concerning avian influenza in social media (textual) data in English. It was created from three corpora: the first one includes 10 transcriptions of YouTube videos and 70 tweets, all manually annotated. The second corpus is composed of the same textual data but automatically annotated with Named Entity Recognition (NER) tools. These two corpora have been built to evaluate NER tools and apply them to a bigger corpus. The third corpus is composed of 100 YouTube transcriptions automatically annotated with NER tools. The aim of the annotation task is to recognize spatial information such as the names of the cities and epidemiological information such as the names of the diseases. An annotation guideline is provided in order to ensure a unified annotation and to help the annotators. This dataset can be used to train or evaluate Natural Language Processing (NLP) approaches such as specialized entity recognition.

8.
Data Brief ; 41: 108000, 2022 Apr.
Article in English | MEDLINE | ID: mdl-35295868

ABSTRACT

This dataset is dedicated to text mining and is composed of partial n-Ary relation instances concerning food packaging composition and gas permeability. It was created from 31 tables derived from 10 English-language scientific articles in html format from several international journals hosted on the ScienceDirect website. This dataset includes two sets of data: manual table annotation results and automatic data extraction results. The tables were first annotated by one annotator and cross-curated by three different annotators. The annotation task aimed to identify all table data dealing with packaging permeability measurements and compositions. An Ontological and Terminological Resource (OTR) was used for the annotation process. The annotation guidelines were drawn up through a collective iterative approach involving the annotators, and they may be accessed alongside the data. This dataset of n-Ary relations can be used in natural language processing (NLP) approaches implemented in experimental fields, especially for n-Ary relation extraction research. It can also be useful for training or evaluation of methods for the extraction of experimental data from tables and text in scientific documents, especially in experimental domains such as food packaging.

9.
Sci Data ; 9(1): 655, 2022 Oct 26.
Article in English | MEDLINE | ID: mdl-36289243

ABSTRACT

Event-based surveillance (EBS) gathers information from a variety of data sources, including online news articles. Unlike the data from formal reporting, the EBS data are not structured, and their interpretation can overwhelm epidemic intelligence (EI) capacities in terms of available human resources. Therefore, diverse EBS systems that automatically process (all or part of) the acquired nonstructured data from online news articles have been developed. These EBS systems (e.g., GPHIN, HealthMap, MedISys, ProMED, PADI-web) can use annotated data to improve the surveillance systems. This paper describes a framework for the annotation of epidemiological information in animal disease-related news articles. We provide annotation guidelines that are generic and applicable to both animal and zoonotic infectious diseases, regardless of the pathogen involved or its mode of transmission (e.g., vector-borne, airborne, by contact). The framework relies on the successive annotation of all the sentences from a news article. The annotator evaluates the sentences in a specific epidemiological context, corresponding to the publication date of the news article.

10.
Data Brief ; 36: 107135, 2021 Jun.
Article in English | MEDLINE | ID: mdl-34041321

ABSTRACT

This dataset is composed of symbolic and quantitative entities concerning food packaging composition and gas permeability. It was created from 50 scientific articles in English registered in html format from several international journals on the ScienceDirect website. The files were annotated independently by three experts on a WebAnno server. The aim of the annotation task was to recognize all entities related to packaging permeability measures and packaging composition. This annotation task is driven by an Ontological and Terminological Resource (OTR). An annotation guideline was designed in a collective and iterative approach involving the annotators. This dataset can be used to train or evaluate natural language processing (NLP) approaches in experimental fields, such as specialized entity recognition (e.g. terms and variations, units of measure, complex numerical values) or sentence level binary relation (e.g. value to unit, term to acronym).

11.
Transbound Emerg Dis ; 68(3): 981-986, 2021 May.
Article in English | MEDLINE | ID: mdl-32683774

ABSTRACT

Event-based surveillance (EBS) systems monitor a broad range of information sources to detect early signals of disease emergence, including new and unknown diseases. In December 2019, a newly identified coronavirus emerged in Wuhan (China), causing a global coronavirus disease (COVID-19) pandemic. A retrospective study was conducted to evaluate the capacity of three event-based surveillance (EBS) systems (ProMED, HealthMap and PADI-web) to detect early COVID-19 emergence signals. We focused on changes in online news vocabulary over the period before/after the identification of COVID-19, while also assessing its contagiousness and pandemic potential. ProMED was the timeliest EBS, detecting signals one day before the official notification. At this early stage, the specific vocabulary used was related to 'pneumonia symptoms' and 'mystery illness'. Once COVID-19 was identified, the vocabulary changed to virus family and specific COVID-19 acronyms. Our results suggest that the three EBS systems are complementary regarding data sources, and all require timeliness improvements. EBS methods should be adapted to the different stages of disease emergence to enhance early detection of future unknown disease outbreaks.


Subjects
COVID-19/diagnosis, COVID-19/epidemiology, Emerging Communicable Diseases/diagnosis, Emerging Communicable Diseases/epidemiology, SARS-CoV-2, Animals, China/epidemiology, Humans, Population Surveillance, Retrospective Studies
12.
Health Inf Sci Syst ; 9(1): 29, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34276970

ABSTRACT

Here, we introduce ITEXT-BIO, an intelligent process for biomedical domain terminology extraction from textual documents and subsequent analysis. The proposed methodology consists of two complementary approaches, including free and driven term extraction. The first is based on term extraction with statistical measures, while the second considers morphosyntactic variation rules to extract term variants from the corpus. The combination of two term extraction and analysis strategies is the keystone of ITEXT-BIO. These include combined intra-corpus strategies that enable term extraction and analysis either from a single corpus (intra), or from corpora (inter). We assessed the two approaches, the corpus or corpora to be analysed and the type of statistical measures used. Our experimental findings revealed that the proposed methodology could be used: (1) to efficiently extract representative, discriminant and new terms from a given corpus or corpora, and (2) to provide quantitative and qualitative analyses on these terms regarding the study domain.

13.
One Health ; 13: 100357, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34950760

ABSTRACT

PADI-web (Platform for Automated extraction of animal Disease Information from the web) is a biosurveillance system dedicated to monitoring online news sources for the detection of emerging animal infectious diseases. PADI-web has collected more than 380,000 news articles since 2016. Compared to other existing biosurveillance tools, PADI-web focuses specifically on animal health and has a fully automated pipeline based on machine-learning methods. This paper presents the new functionalities of PADI-web based on the integration of: (i) a new fine-grained classification system, (ii) automatic methods to extract terms and named entities with text-mining approaches, (iii) semantic resources for indexing keywords and (iv) a notification system for end-users. Compared to other biosurveillance tools, PADI-web, which is integrated in the French Platform for Animal Health Surveillance (ESA Platform), offers strong coverage of the animal sector, a multilingual approach, an automated information extraction module and a notification tool configurable according to end-user needs.

14.
Stud Health Technol Inform ; 160(Pt 2): 1314-8, 2010.
Article in English | MEDLINE | ID: mdl-20841897

ABSTRACT

Analyzing microarray data is still a great challenge since existing methods produce huge amounts of useless results. We propose a new method called NoDisco for discovering novelties in gene sequences obtained by applying data-mining techniques to microarray data. METHOD: We identify popular genes, which are often cited in the literature, and innovative genes, which are linked to the popular genes in the sequences but are not mentioned in the literature. We also identify popular and innovative sequences containing these genes. Biologists can thus select interesting sequences from the two sets and obtain the k-best documents. RESULTS: We show the efficiency of this method by applying it to real data used to decipher the mechanisms underlying Alzheimer's disease. CONCLUSION: The first selection of sequences based on popularity and innovation helps experts focus on relevant sequences while the top-k documents help them understand the sequences.


Subjects
Alzheimer Disease/genetics, Gene Expression Profiling/methods, Oligonucleotide Array Sequence Analysis/methods, Algorithms, Data Mining/methods, Humans
15.
Data Brief ; 33: 106356, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33015251

ABSTRACT

The vocabulary used in news on a disease such as COVID-19 changes according to the period [4]. This aspect is discussed on the basis of MEDISYS-sourced media datasets via two studies. The first focuses on terminology extraction and the second on period prediction according to the textual content using machine learning approaches.

16.
Stud Health Technol Inform ; 150: 767-71, 2009.
Article in English | MEDLINE | ID: mdl-19745414

ABSTRACT

Transcriptomic technologies are promising tools for identifying new genes involved in cerebral ageing or in neurodegenerative diseases such as Alzheimer's disease. These technologies produce massive biological data, which so far have been extremely difficult to exploit. In this context, we propose GeneMining, a multidisciplinary methodology, which aims at developing new strategies to analyse such data, and to design interactive tools to help biologists identify, visualize and interpret brain ageing signatures. In order to address the specific problem of brain ageing signature discovery, we combine and apply existing tools, with an emphasis on a new, efficient data mining method based on sequential patterns.


Subjects
Aging/genetics, Brain/physiology, Gene Expression Profiling, Base Sequence, Computational Biology, Genomics, Humans
17.
Data Brief ; 22: 643-646, 2019 Feb.
Article in English | MEDLINE | ID: mdl-30671512

ABSTRACT

Monitoring animal health worldwide, especially the early detection of outbreaks of emerging pathogens, is one of the means of preventing the introduction of infectious diseases in countries (Collier et al., 2008) [3]. In this context, we developed PADI-web, a Platform for Automated extraction of animal Disease Information from the Web (Arsevska et al., 2016, 2018). PADI-web is a text-mining tool that automatically detects, categorizes and extracts disease outbreak information from Web news articles. PADI-web currently monitors the Web for five emerging animal infectious diseases, i.e., African swine fever, avian influenza including highly pathogenic and low pathogenic avian influenza, foot-and-mouth disease, bluetongue, and Schmallenberg virus infection. PADI-web collects Web news articles in near-real time through RSS feeds. Currently, PADI-web collects disease information from Google News because of its international and multiple language coverage. We implemented machine learning techniques to identify the relevant disease information in texts (i.e., location and date of an outbreak, affected hosts, their numbers and clinical signs). In order to train the model for Information Extraction (IE) from news articles, a corpus in English has been manually labeled by domain experts. This labeled corpus (Rabatel et al., 2017) is presented in this data paper.

18.
Front Public Health ; 7: 138, 2019.
Article in English | MEDLINE | ID: mdl-31263687

ABSTRACT

The economic evaluation of health surveillance systems and of health information is a methodological challenge, as it is for information systems in general. The main current approaches are cost-effectiveness analysis, which minimizes costs for a given technically required output, and cost-benefit analysis, which balances costs against the economic benefits of duly informed public interventions. The latter option, following a linear command-and-control perspective, implies considering a main causal link between information, decision, action, and health benefits. Yet, valuing information, taking into account its nature and multiple sources, the modalities of its processing cycle, from production to diffusion, decentralized use and gradual building of a shared information capital, constitutes a promising challenge. This work offers an interdisciplinary perspective on the value of health surveillance, leading to a renewed theoretical framework that integrates information and informatics theory and information economics. The reflection is based on a typological approach to value, basically distinguishing between use and non-use values. Through this structured discussion, the main idea is to expand the boundaries of surveillance evaluation, to focus on changes and trends, on the dynamic and networked structure of information systems, on the contribution of diverse data, and on the added value of combining qualitative and quantitative information. Distancing itself from the command-and-control model, this reflection considers the behavioral foundations of many health risks, as well as the decentralized, progressive and deliberative dimension of decision-making in risk management. The framework also draws on lessons learnt from recent applications within and outside the health sector, as in surveillance of antimicrobial resistance, inter-laboratory networks, the use of big data or web sources, the diffusion of technological products and large-scale financial risks. Finally, the paper lays the groundwork for a workable approach to the economic evaluation of health surveillance through a better understanding of the value of health information. It aims to avoid over-simplifying the range of health information benefits across society while keeping evaluation within the boundaries of what may be ascribed to the assessed information system.

19.
PLoS One ; 13(8): e0199960, 2018.
Article in English | MEDLINE | ID: mdl-30074992

ABSTRACT

Since 2013, the French Animal Health Epidemic Intelligence System (in French: Veille Sanitaire Internationale, VSI) has been monitoring signals of the emergence of new and exotic animal infectious diseases worldwide. Once detected, the VSI team verifies the signals and issues early warning reports to French animal health authorities when potential threats to France are detected. To improve detection of signals from online news sources, we designed the Platform for Automated extraction of Disease Information from the web (PADI-web). PADI-web automatically collects, processes and extracts English-language epidemiological information from Google News. The core component of PADI-web is a combined information extraction (IE) method founded on rule-based systems and data mining techniques. The IE approach allows extraction of key information on diseases, locations, dates, hosts and the number of cases mentioned in the news. We evaluated the combined method for IE on a dataset of 352 disease-related news reports mentioning the diseases involved, locations, dates, hosts and the number of cases. The combined method for IE achieved an F-score of 95% for both diseases and hosts, 85% for the number of cases, 83% for dates and 80% for locations in the disease-related news. We assessed the sensitivity of PADI-web to detect primary outbreaks of four emerging animal infectious diseases notifiable to the World Organisation for Animal Health (OIE). From January to June 2016, PADI-web detected signals for 64% of all primary outbreaks of African swine fever, 53% of avian influenza, 25% of bluetongue and 19% of foot-and-mouth disease. PADI-web detected primary outbreaks of avian influenza and foot-and-mouth disease in Asia in a timely manner, i.e. 8 and 3 days before immediate notification to OIE, respectively.


Subjects
Emerging Communicable Diseases/veterinary, Epidemiological Monitoring/veterinary, Internet, Animals, Emerging Communicable Diseases/epidemiology, Data Mining, France/epidemiology, Mass Media, Automated Pattern Recognition, Time Factors