Results 1 - 5 of 5
1.
Artif Intell Med ; 89: 1-9, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29754799

ABSTRACT

OBJECTIVE: Death certificates are an invaluable source of cancer mortality statistics. However, this value can only be realised if accurate, quantitative data can be extracted from certificates, an aim hampered by both the volume and variable quality of certificates written in natural language. This paper proposes an automatic classification system for identifying all cancer-related causes of death from death certificates. METHODS: Detailed features, including terms, n-grams and SNOMED CT concepts, were extracted from a collection of 447,336 death certificates. The features were used as input to two different classification sub-systems: a machine learning sub-system using Support Vector Machines (SVMs) and a rule-based sub-system. A fusion sub-system then combined the results from SVMs and rules into a single final classification. A held-out test set was used to evaluate the effectiveness of the classifiers according to precision, recall and F-measure. RESULTS: The system was highly effective at determining the type of cancer for both common cancers (F-measure of 0.85) and rare cancers (F-measure of 0.7). In general, rules outperformed SVMs; however, the fusion method that combined the two was the most effective. CONCLUSION: The system proposed in this study provides automatic identification and characterisation of cancers from large collections of free-text death certificates. This allows organisations such as Cancer Registries to monitor and report on cancer mortality in a timely and accurate manner. In addition, the methods and findings are generally applicable beyond cancer classification and to other sources of medical text besides death certificates.
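
The evaluation measures named in this abstract (precision, recall and F-measure) can be illustrated with a minimal sketch. It assumes the balanced F-measure (F1) computed per label; the abstract does not state which variant or averaging scheme was used, and the function name and the toy ICD-10 labels below are hypothetical.

```python
def precision_recall_f1(gold, predicted, label):
    """Return (precision, recall, F1) for one label over parallel label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)  # true positives
    fp = sum(1 for g, p in pairs if g != label and p == label)  # false positives
    fn = sum(1 for g, p in pairs if g == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall (assumed variant).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy data, invented for illustration only (not from the study).
gold = ["C34", "C34", "C50", "C50"]
predicted = ["C34", "C50", "C50", "C50"]
p, r, f = precision_recall_f1(gold, predicted, "C50")
```

In this toy case the classifier finds every C50 certificate (recall 1.0) but also mislabels one C34 certificate as C50, which lowers precision and therefore the F-measure.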


Subjects
Data Mining/methods , Death Certificates , Natural Language Processing , Neoplasms/mortality , Rare Diseases/mortality , Support Vector Machine , Cause of Death , Data Accuracy , Databases, Factual , Humans , New South Wales/epidemiology , Registries , Reproducibility of Results
2.
Int J Med Inform ; 84(11): 956-65, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26323193

ABSTRACT

OBJECTIVE: Death certificates provide an invaluable source for cancer mortality statistics; however, this value can only be realised if accurate, quantitative data can be extracted from certificates, an aim hampered by both the volume and variable nature of certificates written in natural language. This paper proposes an automatic classification system for identifying cancer-related causes of death from death certificates. METHODS: Detailed features, including terms, n-grams and SNOMED CT concepts, were extracted from a collection of 447,336 death certificates. These features were used to train Support Vector Machine classifiers (one classifier for each cancer type). The classifiers were deployed in a cascaded architecture: the first level identified the presence of cancer (i.e., binary cancer/no cancer) and the second level identified the type of cancer (according to the ICD-10 classification system). A held-out test set was used to evaluate the effectiveness of the classifiers according to precision, recall and F-measure. In addition, detailed feature analysis was performed to reveal the characteristics of a successful cancer classification model. RESULTS: The system was highly effective at identifying cancer as the underlying cause of death (F-measure 0.94). The system was also effective at determining the type of cancer for common cancers (F-measure 0.7). Rare cancers, for which there was little training data, were difficult to classify accurately (F-measure 0.12). Factors influencing performance were the amount of training data and certain ambiguous cancers (e.g., those in the stomach region). The feature analysis revealed that a combination of features was important for cancer type classification, with SNOMED CT concept and oncology-specific morphology features proving the most valuable. CONCLUSION: The system proposed in this study provides automatic identification and characterisation of cancers from large collections of free-text death certificates. This allows organisations such as Cancer Registries to monitor and report on cancer mortality in a timely and accurate manner. In addition, the methods and findings are generally applicable beyond cancer classification and to other sources of medical text besides death certificates.
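
The cascaded architecture described in this abstract might be sketched roughly as follows. This is only an illustration under stated assumptions: the trained SVMs are replaced by toy keyword-based stand-ins, the second level is assumed to pick the highest-scoring per-type classifier (the abstract does not say how the per-type outputs are reconciled), and all names here are hypothetical.

```python
from typing import Callable, Dict, Optional, Set

Features = Set[str]  # assumption: a certificate reduced to a set of feature strings


def cascade_classify(
    features: Features,
    is_cancer: Callable[[Features], bool],
    type_scorers: Dict[str, Callable[[Features], float]],
) -> Optional[str]:
    """Level 1 decides cancer / no cancer; level 2 picks a cancer type code."""
    if not is_cancer(features):
        return None  # level 1 says "no cancer": level 2 is never consulted
    scores = {code: scorer(features) for code, scorer in type_scorers.items()}
    if not scores:
        return None
    # Assumed reconciliation rule: the type with the highest score wins.
    return max(scores, key=scores.get)


# Toy stand-ins for the trained classifiers (hypothetical, not the study's models).
def toy_is_cancer(features: Features) -> bool:
    return bool(features & {"carcinoma", "cancer", "malignant"})


TOY_TYPE_SCORERS = {
    "C34": lambda f: float("lung" in f),    # ICD-10 C34: bronchus and lung
    "C50": lambda f: float("breast" in f),  # ICD-10 C50: breast
}

result = cascade_classify({"lung", "carcinoma"}, toy_is_cancer, TOY_TYPE_SCORERS)
```

One consequence of this design is that an error at the first level cannot be recovered at the second: a certificate rejected as "no cancer" never reaches the type classifiers.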


Subjects
Death Certificates , Machine Learning , Natural Language Processing , Neoplasms/classification , Neoplasms/mortality , Cause of Death , Humans , International Classification of Diseases , Machine Learning/standards , New South Wales/epidemiology , Program Evaluation , Registries
3.
Australas Med J ; 6(5): 292-9, 2013.
Article in English | MEDLINE | ID: mdl-23745151

ABSTRACT

BACKGROUND: Cancer monitoring and prevention rely on timely notification of cancer cases. However, the abstraction and classification of cancer from the free text of pathology reports and other relevant documents, such as death certificates, are complex and time-consuming activities. AIMS: This paper investigates approaches for the automatic detection of notifiable cancer cases as the cause of death from free-text death certificates supplied to Cancer Registries. METHOD: A number of machine learning classifiers were studied. Features were extracted using natural language techniques and the Medtex toolkit. The features encompassed stemmed words, bi-grams, and concepts from the SNOMED CT medical terminology. The baseline consisted of a keyword spotter using keywords extracted from the long description of ICD-10 cancer-related codes. RESULTS: Death certificates with notifiable cancer listed as the cause of death can be effectively identified with the methods studied in this paper. A Support Vector Machine (SVM) classifier achieved the best performance, with an overall F-measure of 0.9866 when evaluated on a set of 5,000 free-text death certificates using the token stem feature set. The SNOMED CT concept plus token stem feature set reached the lowest variance (0.0032) and false negative rate (0.0297) while achieving an F-measure of 0.9864. The SVM classifier accounts for the first 18 of the top 40 evaluated runs and is the most robust classifier, with a variance of 0.001141, half the variance of the other classifiers. CONCLUSION: The selection of features had the greatest influence on classifier performance, although the type of classifier employed also affected performance. In contrast, the feature weighting schema had a negligible effect on performance. Specifically, stemmed tokens, with or without SNOMED CT concepts, were the most effective features when combined with an SVM classifier.
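
A keyword-spotter baseline of the kind mentioned in this abstract could look roughly like the sketch below. It assumes keywords are simply the non-stop-word tokens of ICD-10 code descriptions; the stop-word list, the function names and the two sample descriptions are illustrative choices, not details taken from the paper.

```python
import re

STOP_WORDS = {"of", "and", "the", "in"}  # assumed, deliberately tiny list


def tokenize(text):
    """Lower-case the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keywords_from_descriptions(descriptions):
    """Collect every non-stop-word token appearing in the code descriptions."""
    return {tok for d in descriptions for tok in tokenize(d) if tok not in STOP_WORDS}


def spot_cancer(certificate_text, keywords):
    """Flag a certificate as cancer-related if any keyword occurs in it."""
    return any(tok in keywords for tok in tokenize(certificate_text))


# Two ICD-10 long descriptions, used here only as sample input.
descriptions = [
    "Malignant neoplasm of bronchus and lung",
    "Malignant neoplasm of breast",
]
keywords = keywords_from_descriptions(descriptions)
flagged = spot_cancer("Metastatic malignant neoplasm, breast", keywords)
```

A spotter like this over-triggers on anatomical words such as "lung" in non-cancer causes of death, which is one reason it serves as a baseline rather than a final classifier.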

4.
Stud Health Technol Inform ; 178: 250-6, 2012.
Article in English | MEDLINE | ID: mdl-22797049

ABSTRACT

OBJECTIVE: To evaluate the effects of Optical Character Recognition (OCR) on the automatic cancer classification of pathology reports. METHOD: Scanned images of pathology reports were converted to electronic free-text using a commercial OCR system. A state-of-the-art cancer classification system, the Medical Text Extraction (MEDTEX) system, was used to automatically classify the OCR reports. Classifications produced by MEDTEX on the OCR versions of the reports were compared with the classifications from a human-amended version of the OCR reports. RESULTS: The employed OCR system was found to recognise scanned pathology reports with up to 99.12% character accuracy and up to 98.95% word accuracy. Errors in the OCR processing were found to have minimal impact on the automatic classification of scanned pathology reports into notifiable groups. However, the impact of OCR errors is not negligible when considering the extraction of cancer notification items, such as primary site, histological type, etc. CONCLUSIONS: The automatic cancer classification system used in this work, MEDTEX, has proven to be robust to errors produced by the acquisition of free-text pathology reports from scanned images through OCR software. However, issues emerge when considering the extraction of cancer notification items.
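
The abstract reports character and word accuracy without saying how they were computed. One common definition, assumed here purely for illustration, is one minus the edit distance between the OCR output and a corrected reference, divided by the reference length. The function names and the sample strings below are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (x != y),  # substitution (free if items match)
            ))
        prev = curr
    return prev[-1]


def accuracy(reference, ocr_output):
    """1 - edit_distance / len(reference), clipped at 0 (assumed definition)."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    return max(0.0, 1.0 - edit_distance(reference, ocr_output) / len(reference))


def char_accuracy(reference, ocr_output):
    return accuracy(reference, ocr_output)


def word_accuracy(reference, ocr_output):
    # Words are whitespace-separated tokens in this sketch.
    return accuracy(reference.split(), ocr_output.split())


reference = "invasive ductal carcinoma"
ocr_text = "invasive ducta1 carcinoma"  # "l" misread as "1"
char_acc = char_accuracy(reference, ocr_text)
word_acc = word_accuracy(reference, ocr_text)
```

A single misread character costs far more at the word level than at the character level, which is consistent with word accuracy being the lower of the two figures reported.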


Subjects
Copying Processes/standards , Medical Records , Neoplasms/pathology , Pathology, Clinical , Pathology/classification , Automation , Humans , Natural Language Processing
5.
Eur J Hum Genet ; 10(1): 17-25, 2002 Jan.
Article in English | MEDLINE | ID: mdl-11896452

ABSTRACT

Keratolytic winter erythema is an autosomal dominant skin disorder characterised by erythema, hyperkeratosis, and peeling of the skin of the palms and soles, especially during winter. The keratolytic winter erythema locus has been mapped to human chromosome 8p22-p23. This chromosomal region has also been associated with frequent loss of heterozygosity in different types of cancer. To identify positional candidate genes for keratolytic winter erythema, a BAC contig located between the markers at D8S550 and D8S1695 was constructed and sequenced. It could be extended to D8S1759 by a partially sequenced BAC clone identified by database searches. In the 634 404 bp contig, 13 new polymorphic microsatellite loci and 46 single nucleotide and insertion/deletion polymorphisms were identified. Twelve transcripts were identified between D8S550 and D8S1759 by exon trapping, cDNA selection, and sequence analyses. They were localised on the genomic sequence, their exon/intron structure was determined, and their expression was analysed by RT-PCR. Only one of the transcripts corresponds to a known gene, encoding B-lymphocyte specific tyrosine kinase, BLK. A putative novel myotubularin-related protein gene (MTMR8), a potential human homologue of the mouse acyl-malonyl condensing enzyme gene (Amac1), and two transcripts showing similarities to the mouse L-threonine 3-dehydrogenase gene and the human SEC oncogene, respectively, were identified. The remaining seven transcripts did not show similarities to known genes. No potentially pathogenic mutations were identified in any of these transcripts in keratolytic winter erythema patients.


Subjects
Chromosomes, Human, Pair 8 , Erythema/genetics , Skin Diseases, Genetic/genetics , Chromosomes, Artificial, Bacterial , Contig Mapping , DNA, Complementary , Erythema/pathology , Humans , Keratosis/genetics , Keratosis/pathology , Mutation , RNA, Messenger/metabolism , Seasons , Sequence Analysis, DNA , Skin Diseases, Genetic/pathology , Transcription, Genetic