1.
Interdiscip Sci ; 2024 Feb 10.
Article in English | MEDLINE | ID: mdl-38340264

ABSTRACT

We report a combined manual annotation and deep-learning natural language processing study aimed at accurate entity extraction from biomedical literature on hereditary diseases. A total of 400 full articles were manually annotated, following published guidelines, by experienced genetic interpreters at Beijing Genomics Institute (BGI). The quality of our manual annotations was assessed by comparing our re-annotated results with publicly available annotations. The overall Jaccard index was 0.866 across the four entity types: gene, variant, disease and species. A large BERT-based named entity recognition (NER) model and a simplified DistilBERT-based NER model were each trained, validated and tested. Because the manually annotated corpus was limited, these NER models were fine-tuned in two phases. The F1-scores of the BERT-based NER model for gene, variant, disease and species are 97.28%, 93.52%, 92.54% and 95.76%, respectively, while those of the DistilBERT-based NER model are 95.14%, 86.26%, 91.37% and 89.92%, respectively. Most importantly, the variant entity type has been extracted by a large language model for the first time, with an F1-score comparable to that of the state-of-the-art variant extraction model tmVar.
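As a rough illustration of the agreement measure mentioned above, the Jaccard index of two annotation sets is the size of their intersection divided by the size of their union. The abstract does not say how annotations were matched, so the sketch below assumes exact matching on character offsets and entity type; the `Annotation` tuple layout and the sample data are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Hypothetical annotation record: (start offset, end offset, entity type).
# Exact-match comparison is an assumption, not the study's stated method.
Annotation = Tuple[int, int, str]


def jaccard_index(a: Iterable[Annotation], b: Iterable[Annotation]) -> float:
    """Return |A ∩ B| / |A ∪ B| for two sets of annotations."""
    set_a: Set[Annotation] = set(a)
    set_b: Set[Annotation] = set(b)
    union = set_a | set_b
    if not union:
        # Two empty annotation sets are treated as being in full agreement.
        return 1.0
    return len(set_a & set_b) / len(union)


# Made-up example: two annotators agree on two of four distinct annotations.
reference = [(0, 5, "gene"), (10, 18, "variant"), (30, 42, "disease")]
reannotated = [(0, 5, "gene"), (10, 18, "variant"), (50, 55, "species")]
score = jaccard_index(reference, reannotated)
print(f"Jaccard index: {score:.3f}")
```

A looser matching rule, such as counting overlapping spans as agreement, would give a different value for the same data.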

2.
Hum Mutat ; 42(4): 359-372, 2021 04.
Article in English | MEDLINE | ID: mdl-33565189

ABSTRACT

Cancer is one of the most important health issues globally, and accurate interpretation of cancer-related variants is critical for the clinical management of hereditary cancer. The ClinGen Sequence Variant Interpretation Working Groups have developed many adaptations of the American College of Medical Genetics and Genomics and Association for Molecular Pathology (ACMG/AMP) guidelines to improve the consistency of interpretation. We combined the most recent adaptations to expand the number of criteria from 28 to 48 and developed a tool called Cancer SIGVAR to help genetic counselors interpret the clinical significance of cancer germline variants. The tool accepts VCF files as input and performs fully automated interpretation based on 21 criteria and semiautomated interpretation based on 48 criteria. We validated its performance against the ClinVar and CLINVITAE benchmark databases, achieving an average consistency of up to 93.71% for pathogenic assessment and 79.38% for benign assessment. We compared Cancer SIGVAR with two similar tools, InterVar and PathoMAN, and analyzed the main differences in criteria and implementation. Furthermore, we selected 911 variants from two additional in-house benchmark databases, for which semiautomated interpretation reached an average classification consistency of 98.35%. Our findings highlight the need to optimize automated interpretation tools based on constantly updated guidelines. Cancer SIGVAR is publicly available at http://cancersigvar.bgi.com/.
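The consistency figures above can be read as the share of variants for which the tool and a benchmark assign the same class. The abstract does not define the calculation, so the following is only a minimal sketch under that assumption; the function name, the variant identifiers and the labels are all hypothetical and are not taken from Cancer SIGVAR.

```python
from typing import Dict


def classification_consistency(
    predicted: Dict[str, str], benchmark: Dict[str, str]
) -> float:
    """Fraction of shared variants whose predicted class equals the benchmark class."""
    shared = predicted.keys() & benchmark.keys()
    if not shared:
        raise ValueError("no variants in common between prediction and benchmark")
    matches = sum(1 for variant in shared if predicted[variant] == benchmark[variant])
    return matches / len(shared)


# Made-up identifiers and labels, for illustration only.
tool_calls = {
    "var1": "pathogenic",
    "var2": "benign",
    "var3": "pathogenic",
    "var4": "uncertain",
}
benchmark_calls = {
    "var1": "pathogenic",
    "var2": "benign",
    "var3": "benign",
    "var4": "uncertain",
}
consistency = classification_consistency(tool_calls, benchmark_calls)
print(f"Consistency: {consistency:.2%}")
```

Restricting both dictionaries to variants with a given benchmark class would give per-class figures, such as separate values for pathogenic and benign assessments.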


Subjects
Genetic Predisposition to Disease; Neoplasms; Genetic Testing; Genetic Variation; Genome, Human; Germ Cells; Humans; Neoplasms/genetics; Software; United States