Results 1 - 20 of 765
1.
Stud Health Technol Inform ; 285: 94-99, 2021 Oct 27.
Article in English | MEDLINE | ID: mdl-34734857

ABSTRACT

Electronic Medical Records (EMRs) contain a wealth of valuable patient data, most of it unstructured. Labeled medical text data in Russian are scarce, and there are no tools for automatic annotation. We present an unsupervised approach to medical data annotation. Morphological and syntactic analyses of the initial sentences produce syntax trees, from which similar subtrees are grouped by Word2Vec and labeled using dictionaries and Wikidata categories. This method can be used to automatically label EMRs in Russian, and the proposed methodology can be applied to other languages that lack resources for automatic labeling and domain vocabularies.
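The grouping step described here amounts to clustering subtree embeddings by similarity. A minimal pure-Python sketch of such similarity grouping, using invented 2-D vectors in place of real Word2Vec embeddings (the abstract gives no code, so the threshold and greedy strategy below are illustrative assumptions):

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def group_by_similarity(vectors, threshold=0.8):
    """Greedy single-pass grouping: each vector joins the first
    existing group whose representative is similar enough."""
    groups = []  # list of (representative_vector, member_indices)
    for i, vec in enumerate(vectors):
        for rep, members in groups:
            if cosine(rep, vec) >= threshold:
                members.append(i)
                break
        else:
            groups.append((vec, [i]))
    return [members for _, members in groups]

# Hypothetical subtree embeddings: two near-duplicates and one outlier.
vecs = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]
print(group_by_similarity(vecs))  # [[0, 1], [2]]
```

In practice the vectors would come from a trained Word2Vec model over the syntax-tree tokens; the grouping logic itself is unchanged.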


Subjects
Electronic Health Records, Semantics, Data Curation, Humans, Language, Natural Language Processing
2.
Tidsskr Nor Laegeforen ; 141, 2021 09 28.
Article in Norwegian | MEDLINE | ID: mdl-34596998
3.
PLoS One ; 16(10): e0258693, 2021.
Article in English | MEDLINE | ID: mdl-34648558

ABSTRACT

Information-theoretic approaches are ubiquitous and effective in a wide variety of bioinformatics applications. In comparative genomics, alignment-free methods based on short DNA words, or k-mers, are particularly powerful. We evaluated the utility of varying k-mer lengths for genome comparisons by analyzing their sequence-space coverage of 5805 genomes in the KEGG GENOME database. In subsequent analyses of four k-mer lengths spanning the relevant range (11, 21, 31, 41), hierarchical clustering of 1634 genus-level representative genomes using pairwise 21- and 31-mer Jaccard similarities best recapitulated a phylogenetic/taxonomic tree of life, with clear boundaries for superkingdom domains and high subtree similarity for named taxa at lower levels (family through phylum). By analyzing ~14.2M prokaryotic genome comparisons by their lowest-common-ancestor taxon levels, we detected many potential misclassification errors in a curated database, further demonstrating the need for wide-scale adoption of quantitative taxonomic classifications based on whole-genome similarity.
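The pairwise comparison underlying this analysis is a k-mer Jaccard similarity: the size of the intersection of two genomes' k-mer sets divided by the size of their union. A minimal sketch on toy sequences (the study uses whole genomes and k in {11, 21, 31, 41}):

```python
def kmers(seq, k):
    # Set of all overlapping substrings of length k.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(seq_a, seq_b, k):
    # Jaccard similarity of the two k-mer sets.
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

# Toy example: the sequences share 4 of 5 distinct 4-mers.
print(round(jaccard("ACGTACGT", "ACGTACGA", 4), 3))  # 0.8
```

For genome-scale inputs, tools typically hash or sketch the k-mer sets (e.g. MinHash) rather than store them exactly, but the similarity being estimated is this one.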


Subjects
Archaea/classification, Bacteria/classification, Computational Biology/methods, Archaea/genetics, Bacteria/genetics, Data Curation, Archaeal Genome, Bacterial Genome, Genomics, Phylogeny, DNA Sequence Analysis
4.
Water Res ; 206: 117695, 2021 Nov 01.
Article in English | MEDLINE | ID: mdl-34626884

ABSTRACT

Anomaly detection is the process of identifying unexpected data samples in datasets. Automated anomaly detection is either performed using supervised machine learning models, which require a labelled dataset for their calibration, or unsupervised models, which do not require labels. While academic research has produced a vast array of tools and machine learning models for automated anomaly detection, the research community focused on environmental systems still lacks a comparative analysis that is simultaneously comprehensive, objective, and systematic. This knowledge gap is addressed for the first time in this study, where 15 different supervised and unsupervised anomaly detection models are evaluated on 5 different environmental datasets from engineered and natural aquatic systems. To this end, anomaly detection performance, labelling efforts, as well as the impact of model and algorithm tuning are taken into account. As a result, our analysis reveals the relative strengths and weaknesses of the different approaches in an objective manner without bias for any particular paradigm in machine learning. Most importantly, our results show that expert-based data annotation is extremely valuable for anomaly detection based on machine learning.
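To make the unsupervised end of the spectrum concrete, a z-score detector is about the simplest possible baseline of the kind such comparisons include; this sketch is my own illustration, not a model from the study:

```python
import statistics

def zscore_anomalies(values, threshold=3.0):
    """Flag indices of samples whose distance from the mean exceeds
    `threshold` population standard deviations."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [i for i, v in enumerate(values)
            if abs(v - mean) / stdev > threshold]

# Invented sensor series, e.g. hourly pH readings with one spike.
readings = [7.1, 7.0, 7.2, 7.1, 12.5, 7.0, 7.1]
print(zscore_anomalies(readings, threshold=2.0))  # [4]
```

Supervised models replace the fixed threshold with a decision boundary calibrated on labelled anomalies, which is exactly where the expert annotation effort discussed above comes in.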


Subjects
Data Curation, Machine Learning, Algorithms, Humans
5.
Int J Mol Sci ; 22(17)2021 Sep 06.
Article in English | MEDLINE | ID: mdl-34502531

ABSTRACT

Interactions between proteins are essential to any cellular process and constitute the basis for molecular networks that determine the functional state of a cell. With the technical advances of recent years, an astonishingly high number of protein-protein interactions has been revealed. However, the interactome of O-linked N-acetylglucosamine transferase (OGT), the sole enzyme adding O-linked β-N-acetylglucosamine (O-GlcNAc) onto its target proteins, has remained largely undefined. To that end, we collated OGT-interacting proteins experimentally identified over the past several decades. Rigorous curation of datasets from public repositories and O-GlcNAc-focused publications led to the identification of up to 929 high-stringency OGT interactors from the multiple species studied (including Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Arabidopsis thaliana, and others). Among them, 784 human proteins were found to be interactors of human OGT. Moreover, these proteins spanned a very diverse range of functional classes (e.g., DNA repair, RNA metabolism, translational regulation, and cell cycle), with significant enrichment in the regulation of transcription and (co)translation. Our dataset demonstrates that OGT is likely a hub protein in cells. A freely accessible webserver, OGT-Protein Interaction Network (OGT-PIN), has also been created.


Subjects
Acetylglucosamine/metabolism, Data Curation/methods, Protein Databases/statistics & numerical data, N-Acetylglucosaminyltransferases/metabolism, Protein Interaction Maps, Post-Translational Protein Processing, Animals, Arabidopsis Proteins/metabolism, Drosophila Proteins/metabolism, Humans, Mice, Rats
6.
AMIA Jt Summits Transl Sci Proc ; 2021: 325-334, 2021.
Article in English | MEDLINE | ID: mdl-34457147

ABSTRACT

Rare diseases affect between 25 and 30 million people in the United States, and understanding their epidemiology is critical to focusing research efforts. However, little is known about the prevalence of many rare diseases. Given the lack of automated tools, current methods to identify and collect epidemiological data rely on manual curation. To accelerate this process systematically, we developed a novel predictive model to programmatically identify epidemiologic studies on rare diseases from PubMed. A long short-term memory recurrent neural network was developed to predict whether a PubMed abstract represents an epidemiologic study. Our model performed well on our validation set (precision = 0.846, recall = 0.937, AUC = 0.967) and achieved satisfactory results on the test set. This model thus shows promise for accelerating the pace of epidemiologic data curation in rare diseases and could be extended for use in other types of studies and in other disease domains.
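The reported precision and recall are standard confusion-matrix quantities and can be computed from binary predictions as follows; the labels here are invented toy values, not the study's data:

```python
def precision_recall(y_true, y_pred):
    # 1 = abstract judged epidemiologic, 0 = not.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [1, 1, 1, 0, 0, 1]  # hypothetical gold labels
y_pred = [1, 1, 0, 0, 1, 1]  # hypothetical model outputs
print(precision_recall(y_true, y_pred))  # (0.75, 0.75)
```

AUC additionally requires the model's continuous scores rather than thresholded labels, since it integrates over all possible thresholds.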


Subjects
Neural Networks (Computer), Rare Diseases, Data Curation, Epidemiologic Studies, Humans, PubMed, Rare Diseases/diagnosis, Rare Diseases/epidemiology, United States
7.
Am J Hum Genet ; 108(9): 1551-1557, 2021 09 02.
Article in English | MEDLINE | ID: mdl-34329581

ABSTRACT

Clinical validity assessments of gene-disease associations underpin analysis and reporting in diagnostic genomics, and yet wide variability exists in practice, particularly in use of these assessments for virtual gene panel design and maintenance. Harmonization efforts are hampered by the lack of agreed terminology, agreed gene curation standards, and platforms that can be used to identify and resolve discrepancies at scale. We undertook a systematic comparison of the content of 80 virtual gene panels used in two healthcare systems by multiple diagnostic providers in the United Kingdom and Australia. The process was enabled by a shared curation platform, PanelApp, and resulted in the identification and review of 2,144 discordant gene ratings, demonstrating the utility of sharing structured gene-disease validity assessments and collaborative discordance resolution in establishing national and international consensus.
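Identifying discordant gene ratings between two virtual panels reduces to comparing ratings over their shared genes. A sketch with hypothetical panels (PanelApp rates genes on a traffic-light scheme, e.g. green/amber/red; the genes and ratings below are invented):

```python
def discordant_genes(panel_a, panel_b):
    """Return genes rated in both panels whose ratings differ."""
    shared = panel_a.keys() & panel_b.keys()
    return sorted(g for g in shared if panel_a[g] != panel_b[g])

# Hypothetical ratings from two providers' panels for one condition.
provider_uk = {"BRCA1": "green", "TTN": "amber", "MYH7": "green"}
provider_au = {"BRCA1": "green", "TTN": "green", "MYH7": "green",
               "LMNA": "green"}
print(discordant_genes(provider_uk, provider_au))  # ['TTN']
```

Run across 80 panels pairwise, this kind of comparison is what surfaces the 2,144 discordant ratings the study then resolved by expert review.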


Subjects
Consensus, Data Curation/standards, Inborn Genetic Diseases/genetics, Genomics/standards, Molecular Sequence Annotation/standards, Australia, Biomarkers/metabolism, Data Curation/methods, Delivery of Health Care, Gene Expression, Gene Ontology, Inborn Genetic Diseases/diagnosis, Inborn Genetic Diseases/pathology, Genomics/methods, Humans, Mobile Applications/supply & distribution, Terminology as Topic, United Kingdom
8.
PLoS One ; 16(7): e0254764, 2021.
Article in English | MEDLINE | ID: mdl-34324540

ABSTRACT

BACKGROUND: As healthcare-related data proliferate, there is a need to annotate them expertly for the purposes of personalized medicine. Crowdworking is an alternative to expensive expert labour. Annotation corresponds to diagnosis, so comparing unlabeled records to labeled ones seems more appropriate for crowdworkers without medical expertise. We modeled the comparison of a record to two other records as a triplet annotation task, and we conducted an experiment to investigate to what extent sensor-measured stress, task duration, annotator uncertainty and inter-annotator agreement could predict annotation correctness. MATERIALS AND METHODS: We conducted an annotation experiment on health data from a population-based study. The triplet annotation task was to decide whether an individual was more similar to a healthy one or to one with a given disorder. We used hepatic steatosis as the example disorder, and described the individuals with 10 pre-selected characteristics related to this disorder. We recorded task duration, electro-dermal activity as a stress indicator, and uncertainty as stated by the experiment participants (n = 29 non-experts and three experts) for 30 triplets. We built an Artificial Similarity-Based Annotator (ASBA) and compared its correctness and uncertainty to those of the experiment participants. RESULTS: We found no correlation between correctness and stated uncertainty, stress or task duration. Annotator agreement was not predictive either. Notably, for some tasks, annotators agreed unanimously on an incorrect annotation. When controlling for Triplet ID, we identified significant correlations, indicating that correctness, stress levels and annotation duration depend on the task itself. Average correctness among the experiment participants was slightly lower than that achieved by ASBA. Triplet annotation turned out to be similarly difficult for experts and non-experts.
CONCLUSION: Our lab experiment indicates that the task of triplet annotation must be prepared cautiously if delegated to crowdworkers. Neither certainty nor agreement among annotators should be assumed to imply correct annotation, because annotators may misjudge difficult tasks as easy and agree on incorrect annotations. Further research is needed to improve visualizations for complex tasks and to judiciously decide how much information to provide. Out-of-the-lab experiments in crowdworker settings are needed to identify appropriate designs for a human-annotation task, and to assess under what circumstances non-human annotation should be preferred.
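An artificial similarity-based annotator of the kind described suggests a nearest-reference decision rule: assign the target record to whichever reference it is closer to in feature space. A sketch under the assumption of a simple Euclidean distance over numeric characteristics (the feature values below are invented, and the real ASBA's similarity measure may differ):

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def annotate_triplet(target, healthy_ref, disorder_ref):
    """Return 'healthy' or 'disorder' depending on which reference
    record the target is closer to."""
    if euclidean(target, healthy_ref) <= euclidean(target, disorder_ref):
        return "healthy"
    return "disorder"

# Toy records described by two of the ten characteristics
# (e.g. a normalized liver-fat measure and BMI); values invented.
target, healthy, disorder = (0.2, 0.3), (0.1, 0.25), (0.8, 0.9)
print(annotate_triplet(target, healthy, disorder))  # healthy
```

The study's finding that such a rule slightly outperformed human participants underlines how hard the triplet task is without medical context.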


Subjects
Data Curation
9.
Am J Hum Genet ; 108(9): 1564-1577, 2021 09 02.
Article in English | MEDLINE | ID: mdl-34289339

ABSTRACT

A critical challenge in genetic diagnostics is the computational assessment of candidate splice variants, specifically the interpretation of nucleotide changes located outside of the highly conserved dinucleotide sequences at the 5' and 3' ends of introns. To address this gap, we developed the Super Quick Information-content Random-forest Learning of Splice variants (SQUIRLS) algorithm. SQUIRLS generates a small set of interpretable features for machine learning by calculating the information content of wild-type and variant sequences of canonical and cryptic splice sites, assessing changes in candidate splicing regulatory sequences, and incorporating characteristics of the sequence such as exon length, disruptions of the AG exclusion zone, and conservation. We curated a comprehensive collection of disease-associated splice-altering variants at positions outside of the highly conserved AG/GT dinucleotides at the termini of introns. SQUIRLS trains two random-forest classifiers, one for donor sites and one for acceptor sites, and combines their outputs by logistic regression to yield a final score. We show that SQUIRLS surpasses previous state-of-the-art accuracy in classifying splice variants, as assessed by rank analysis in simulated exomes, and is significantly faster than competing methods. SQUIRLS provides tabular output files for incorporation into diagnostic pipelines for exome and genome analysis, as well as visualizations that contextualize the predicted effects of variants on splicing, making it easier to interpret splice variants in diagnostic settings.
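The final scoring step, combining the two random-forest outputs by logistic regression, can be sketched as below. The weights and bias are invented for illustration; SQUIRLS learns its own coefficients during training:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def combine_scores(donor_score, acceptor_score, w_donor, w_acceptor, bias):
    """Logistic-regression combination of the two classifier outputs
    into a single score in (0, 1)."""
    return sigmoid(w_donor * donor_score + w_acceptor * acceptor_score + bias)

# Hypothetical forest outputs for a variant near a donor site,
# with made-up learned coefficients.
score = combine_scores(donor_score=0.9, acceptor_score=0.1,
                       w_donor=4.0, w_acceptor=4.0, bias=-3.0)
print(round(score, 3))  # 0.731
```

Keeping the combination linear in the two forest outputs is part of what makes the final score interpretable: each classifier's contribution is a single weighted term.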


Subjects
Algorithms, Data Curation/methods, Inborn Genetic Diseases/genetics, RNA Splice Sites, RNA Splicing, Software, Base Sequence, Computational Biology/methods, Exome, Exons, Inborn Genetic Diseases/diagnosis, Inborn Genetic Diseases/pathology, High-Throughput Nucleotide Sequencing, Humans, Introns, Mutation, Whole Exome Sequencing
10.
Sci Data ; 8(1): 167, 2021 07 06.
Article in English | MEDLINE | ID: mdl-34230489

ABSTRACT

It is critical to quantitatively analyse the developing human fetal brain in order to fully understand neurodevelopment in both normal fetuses and those with congenital disorders. To facilitate this analysis, automatic multi-tissue fetal brain segmentation algorithms are needed, which in turn require open datasets of segmented fetal brains. Here we introduce a publicly available dataset of 50 manually segmented pathological and non-pathological fetal magnetic resonance brain volume reconstructions across a range of gestational ages (20 to 33 weeks), segmented into 7 different tissue categories (external cerebrospinal fluid, grey matter, white matter, ventricles, cerebellum, deep grey matter, brainstem/spinal cord). In addition, we quantitatively evaluate the accuracy of several automatic multi-tissue segmentation algorithms for the developing human fetal brain. Four research groups participated, submitting a total of 10 algorithms, demonstrating the benefits of the dataset for the development of automatic algorithms.


Subjects
Brain/embryology, Fetus, Neurogenesis, Algorithms, Benchmarking, Brain/diagnostic imaging, Congenital Abnormalities/diagnostic imaging, Data Curation, Humans, Magnetic Resonance Imaging, Organ Size
11.
PLoS One ; 16(6): e0253829, 2021.
Article in English | MEDLINE | ID: mdl-34170972

ABSTRACT

PURPOSE: Developing large-scale datasets with research-quality annotations is challenging due to the high cost of refining clinically generated markup into high-precision annotations. We evaluated the direct use of a large dataset with only clinically generated annotations in the development of high-performance segmentation models for small research-quality challenge datasets. MATERIALS AND METHODS: We used a large retrospective dataset from our institution comprising 1,620 clinically generated segmentations, and two challenge datasets (PROMISE12: 50 patients; ProstateX-2: 99 patients). We trained a 3D U-Net convolutional neural network (CNN) segmentation model using our entire dataset, and used that model as a template to train models on the challenge datasets. We also trained versions of the template model using ablated proportions of our dataset, and evaluated the relative benefit of those templates for the final models. Finally, we trained a version of the template model using an out-of-domain brain cancer dataset, and evaluated the relative benefit of that template for the final models. We used five-fold cross-validation (CV) for all training and evaluation across our entire dataset. RESULTS: Our model achieves state-of-the-art performance on our large dataset (mean overall Dice 0.916, average Hausdorff distance 0.135 across CV folds). Using this model as a pre-trained template for refining on two external datasets significantly enhanced performance (30% and 49% enhancement in Dice scores, respectively). Mean overall Dice and mean average Hausdorff distance were 0.912 and 0.15 for the ProstateX-2 dataset, and 0.852 and 0.581 for the PROMISE12 dataset. Using even small quantities of data to train the template enhanced performance, with significant improvements using 5% or more of the data.
CONCLUSION: We trained a state-of-the-art model using unrefined clinical prostate annotations and found that its use as a template model significantly improved performance in other prostate segmentation tasks, even when trained with only 5% of the original dataset.
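The Dice coefficient used to report performance measures the overlap of two binary masks: 2|A∩B| / (|A| + |B|). A minimal sketch on toy 1-D masks (real evaluations use full 3-D segmentation volumes):

```python
def dice(mask_a, mask_b):
    """Dice coefficient of two binary masks given as flat 0/1 lists."""
    inter = sum(a & b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    # Two empty masks are conventionally a perfect match.
    return 2.0 * inter / total if total else 1.0

pred  = [0, 1, 1, 1, 0, 0]  # hypothetical predicted mask
truth = [0, 1, 1, 0, 0, 0]  # hypothetical ground-truth mask
print(dice(pred, truth))  # 0.8
```

Unlike Dice, the Hausdorff distance also reported in the abstract is a boundary metric: it measures the worst-case distance between the two mask surfaces, so the two numbers capture complementary failure modes.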


Assuntos
Curadoria de Dados , Bases de Dados Factuais , Aprendizado Profundo , Próstata/diagnóstico por imagem , Tomografia Computadorizada por Raios X , Humanos , Masculino , Estudos Retrospectivos
12.
PLoS One ; 16(6): e0253069, 2021.
Article in English | MEDLINE | ID: mdl-34129629

ABSTRACT

Humanity faces the challenge of conserving the attributes of biodiversity that may be essential to secure human wellbeing. Among all the organisms that are beneficial to humans, plants stand out as the most important providers of natural resources. Therefore, identifying plant uses is critical to preserve the beneficial potential of biodiversity and to promote basic and applied research on the relationship between plants and humans. However, much of this information is often uncritical, contradictory, of dubious value or simply not readily accessible to the great majority of scientists and policy makers. Here, we compiled a genus-level dataset of plant-use records for all accepted vascular plant taxa (13489 genera) using the information gathered in the 4th edition of Mabberley's plant-book, the most comprehensive global review of plant classification and plant uses published to date. From 1974 to 2017 all the information was systematically gathered, evaluated, and synthesized by David Mabberley, who reviewed over 1000 botanical sources including modern Floras, monographs, periodicals, handbooks, and authoritative websites. Plant uses were arranged across 28 standard categories of use following the Economic Botany Data Collection Standard guidelines, which resulted in a binary classification of 9478 plant-use records pertaining to human and animal nutrition, materials, fuels, medicine, poisons, and social and environmental uses. Of all the taxa included in the dataset, 33% were assigned to at least one category of use, the most common being "ornamental" (26%), "medicine" (16%), "human food" (13%) and "timber" (8%). In addition to a readily available binary matrix for quantitative analyses, we provide a control text matrix that links the former to the description of the uses in Mabberley's plant-book.
We hope this dataset will serve to establish synergies between scientists and policy makers interested in plant-human interactions and to move towards the complete compilation and classification of nature's contributions to people, upon which the wellbeing of future generations may depend.
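The binary classification described amounts to a genus × use-category 0/1 matrix. A sketch over a hypothetical subset of the 28 categories (the genera and use assignments below are invented examples, not entries from the dataset):

```python
# Four of the 28 standard use categories, for illustration.
CATEGORIES = ["ornamental", "medicine", "human food", "timber"]

def binary_matrix(genus_uses):
    """Map {genus: set_of_uses} to {genus: 0/1 row over CATEGORIES}."""
    return {genus: [1 if cat in uses else 0 for cat in CATEGORIES]
            for genus, uses in genus_uses.items()}

records = {
    "Quercus": {"timber", "human food"},
    "Rosa": {"ornamental", "medicine"},
}
matrix = binary_matrix(records)
print(matrix["Quercus"])  # [0, 0, 1, 1]
```

Rows of such a matrix plug directly into quantitative analyses, e.g. tallying column sums to get the per-category percentages the abstract reports.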


Subjects
Tracheophyta/classification, Agriculture/legislation & jurisprudence, Data Curation, Data Management, Factual Databases, Human Activities, Humans
13.
Am J Pathol ; 191(10): 1709-1716, 2021 10.
Article in English | MEDLINE | ID: mdl-34129843

ABSTRACT

The need for huge data sets represents a bottleneck for the application of artificial intelligence. The availability of substantially fewer annotated target lesions than normal tissues for comparison presents an additional problem in the field of pathology. Organic brains overcome these limitations by utilizing large numbers of specialized neural nets arranged in both linear and parallel fashion, with each solving a restricted classification problem. They rely on local Hebbian error corrections, as compared with the nonlocal back-propagation used in most artificial neural nets, and leverage reinforcement. For these reasons, even toddlers are able to classify objects after only a few examples. Rather than provide an overview of current AI research in pathology, this review focuses on general strategies for overcoming the data bottleneck. These include transfer learning, zero-shot learning, Siamese networks, one-class models, generative networks, and reinforcement learning. Neither an extensive mathematical background nor advanced programming skills are needed to make these subjects accessible to pathologists. However, some familiarity with the basic principles of deep learning, briefly reviewed here, is expected to be useful both in understanding the current limitations of machine learning and in determining ways to address them.


Assuntos
Curadoria de Dados , Algoritmos , Humanos , Aprendizado de Máquina , Redes Neurais de Computação , Teoria Quântica
14.
Gigascience ; 10(5)2021 05 07.
Article in English | MEDLINE | ID: mdl-33963385

ABSTRACT

BACKGROUND: The 3D point cloud is the most direct and effective data form for studying plant structure and morphology. In point cloud studies, the segmentation of individual plants into organs directly determines the accuracy of organ-level phenotype estimation and the reliability of 3D plant reconstruction. However, highly accurate, automatic, and robust point cloud segmentation approaches for plants are unavailable. Thus, the high-throughput segmentation of many shoots is challenging. Although deep learning can feasibly solve this issue, software tools for 3D point cloud annotation to construct the training dataset are lacking. RESULTS: We propose a top-down point cloud segmentation algorithm using optimal transportation distance for maize shoots. We apply our point cloud annotation toolkit for maize shoots, Label3DMaize, to achieve semi-automatic point cloud segmentation and annotation of maize shoots at different growth stages, through a series of operations including stem segmentation, coarse segmentation, fine segmentation, and sample-based segmentation. The toolkit takes ∼4-10 minutes to segment a maize shoot and consumes 10-20% of the total time if only coarse segmentation is required. Fine segmentation is more detailed than coarse segmentation, especially at the organ connection regions. The accuracy of coarse segmentation can reach 97.2% of that of fine segmentation. CONCLUSION: Label3DMaize integrates point cloud segmentation algorithms and manual interactive operations, realizing semi-automatic point cloud segmentation of maize shoots at different growth stages. The toolkit provides a practical data annotation tool for further online segmentation research based on deep learning and is expected to promote automatic point cloud processing of various plants.


Assuntos
Curadoria de Dados , Zea mays , Algoritmos , Reprodutibilidade dos Testes , Software
15.
BMJ Health Care Inform ; 28(1)2021 May.
Article in English | MEDLINE | ID: mdl-33980500

ABSTRACT

OBJECTIVES: The value of healthcare data is being increasingly recognised, including the need to improve health dataset utility. There is no established mechanism for evaluating healthcare dataset utility, making it difficult to evaluate the effectiveness of activities improving the data. We describe the method for generating, and involving the user community in developing, a proposed framework for the evaluation and communication of healthcare dataset utility for given research areas. METHODS: An initial version of a matrix to review datasets across a range of dimensions was developed based on previously published findings regarding healthcare data. This was used to initiate a design process through interviews and surveys with data users representing a broad range of user types and use cases, to help develop a focused framework for characterising datasets. RESULTS: Following 21 interviews, 31 survey responses and testing on 43 datasets, five major categories and 13 subcategories were identified as useful for characterising a dataset, including Data Model, Completeness and Linkage. Each subcategory was graded to facilitate rapid and reproducible evaluation of dataset utility for specific use cases. Testing of applicability to >40 existing datasets demonstrated potential usefulness for subsequent evaluation in real-world practice. DISCUSSION: The research has developed an evidence-based initial approach for a framework to understand the utility of a healthcare dataset. It is likely to require further refinement following wider application, and additional categories may be required. CONCLUSION: The process has resulted in a user-centred designed framework for objectively evaluating the likely utility of specific healthcare datasets, and therefore should be of value both for potential users of health data and for data custodians to identify the areas that provide the optimal value for data curation investment.


Assuntos
Atenção à Saúde/organização & administração , Informática Médica/organização & administração , Inteligência Artificial , Curadoria de Dados , Indústria Farmacêutica/organização & administração , Humanos , Medicina Estatal/organização & administração , Reino Unido
16.
Stud Health Technol Inform ; 281: 8-12, 2021 May 27.
Article in English | MEDLINE | ID: mdl-34042695

ABSTRACT

The aim of this study is to build an evaluation framework for the user-centric testing of the Data Curation Tool. The tool was developed in the scope of the FAIR4Health project to make health data FAIR by transforming them from legacy formats into a Common Data Model based on HL7 FHIR. The end-user evaluation framework was built by following a methodology inspired by the Delphi method. We applied a series of questionnaires to a group of experts who not only held different roles and skills, but also came from various parts of Europe. Overall, 26 questions were formulated for 16 participants. The results showed that the users are satisfied with the capabilities and performance of the tool. The feedback was treated as recommendations for technical improvement and fed back into the software development cycle of the Data Curation Tool.


Assuntos
Curadoria de Dados , Software , Europa (Continente) , Humanos
17.
Methods Mol Biol ; 2305: 3-21, 2021.
Article in English | MEDLINE | ID: mdl-33950382

ABSTRACT

The Protein Data Bank is the single worldwide archive of experimentally determined macromolecular structure data. Established in 1971 as the first open-access data resource in biology, the PDB archive is managed by the worldwide Protein Data Bank (wwPDB) consortium, which has four partners: the RCSB Protein Data Bank (RCSB PDB; rcsb.org), the Protein Data Bank Japan (PDBj; pdbj.org), the Protein Data Bank in Europe (PDBe; pdbe.org), and BioMagResBank (BMRB; www.bmrb.wisc.edu). The PDB archive currently includes ~175,000 entries. The wwPDB has established a number of task forces and working groups that bring together experts from the community who provide recommendations on improving data standards and data validation for improving data quality and integrity. The wwPDB members continue to develop the joint deposition, biocuration, and validation system (OneDep) to improve data quality and accommodate new data from emerging techniques such as 3DEM. Each PDB entry contains the coordinate model and associated metadata for all experimentally determined atomic structures, experimental data for the traditional structure determination techniques (X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy), validation reports, and additional information on quaternary structures. The wwPDB partners are committed to following the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles and have implemented a DOI resolution mechanism that provides access to all the relevant files for a given PDB entry. On average, >250 new entries are added to the archive every week and made available by each wwPDB partner via FTP. The wwPDB partner sites also develop data access and analysis tools and make these available via their websites. The wwPDB continues to work with experts in the community to establish a federation of archives for archiving structures determined using integrative/hybrid methods, where multiple experimental techniques are used.


Assuntos
Curadoria de Dados , Bases de Dados de Proteínas , Substâncias Macromoleculares/química , Modelos Moleculares , Cristalografia por Raios X , Confiabilidade dos Dados , Europa (Continente) , Japão , Ressonância Magnética Nuclear Biomolecular , Conformação Proteica , Proteínas/química , Reprodutibilidade dos Testes , Interface Usuário-Computador
18.
Nucleic Acids Res ; 49(W1): W352-W358, 2021 07 02.
Article in English | MEDLINE | ID: mdl-33950204

ABSTRACT

Searching and reading relevant literature is a routine practice in biomedical research. However, it is challenging for a user to design optimal search queries using all the keywords related to a given topic. As such, existing search systems such as PubMed often return suboptimal results. Several computational methods have been proposed as an effective alternative to keyword-based query methods for literature recommendation. However, those methods require specialized knowledge in machine learning and natural language processing, which can make them difficult for biologists to utilize. In this paper, we propose LitSuggest, a web server that provides an all-in-one literature recommendation and curation service to help biomedical researchers stay up to date with scientific literature. LitSuggest combines advanced machine learning techniques for suggesting relevant PubMed articles with high accuracy. In addition to innovative text-processing methods, LitSuggest offers multiple advantages over existing tools. First, LitSuggest allows users to curate, organize, and download classification results in a single interface. Second, users can easily fine-tune LitSuggest results by updating the training corpus. Third, results can be readily shared, enabling collaborative analysis and curation of scientific literature. Finally, LitSuggest provides an automated personalized weekly digest of newly published articles for each user's project. LitSuggest is publicly available at https://www.ncbi.nlm.nih.gov/research/litsuggest.


Subjects
Publications, Software, COVID-19, Data Curation, Healthcare Disparities, Humans, Internet, Liver Neoplasms/epidemiology, Machine Learning
19.
Int J Comput Assist Radiol Surg ; 16(5): 849-859, 2021 May.
Article in English | MEDLINE | ID: mdl-33982232

ABSTRACT

PURPOSE: Segmentation of surgical instruments in endoscopic video streams is essential for automated surgical scene understanding and process modeling. However, relying on fully supervised deep learning for this task is challenging because manual annotation occupies the valuable time of clinical experts. METHODS: We introduce a teacher-student learning approach that learns jointly from annotated simulation data and unlabeled real data to tackle the challenges of simulation-to-real unsupervised domain adaptation for endoscopic image segmentation. RESULTS: Empirical results on three datasets highlight the effectiveness of the proposed framework over current approaches for the endoscopic instrument segmentation task. Additionally, we provide an analysis of the major factors affecting performance on all datasets to highlight the strengths and failure modes of our approach. CONCLUSIONS: We show that our proposed approach can successfully exploit unlabeled real endoscopic video frames and improve generalization performance over pure simulation-based training and the previous state of the art. This takes us one step closer to effective segmentation of surgical instruments in the annotation-scarce setting.


Subjects
Computer Simulation, Data Curation, Endoscopy/methods, Computer-Assisted Image Processing/methods, Algorithms, Artifacts, Humans, Learning, Software, Students, Video Recording
20.
IEEE J Biomed Health Inform ; 25(9): 3396-3407, 2021 09.
Article in English | MEDLINE | ID: mdl-33945489

ABSTRACT

Non-invasive heart rate estimation is of great importance in the daily monitoring of cardiovascular diseases. In this paper, a bidirectional long short-term memory (bi-LSTM) regression network is developed for non-invasive heart rate estimation from ballistocardiogram (BCG) signals. The proposed deep regression model provides an effective solution to the existing challenges in BCG heart rate estimation, such as the mismatch between the BCG signals and the ground-truth reference, multi-sensor fusion, and effective time-series feature learning. Allowing label uncertainty in the estimation can reduce the manual cost of data annotation while further improving the heart rate estimation performance. Compared with state-of-the-art BCG heart rate estimation methods, the strong fitting and generalization ability of the proposed deep regression model maintains better robustness to noise (e.g., sensor noise) and perturbations (e.g., body movements) in the BCG signals and provides a more reliable solution for long-term heart rate monitoring.


Subjects
Ballistocardiography, Data Curation, Heart Rate, Humans, Physiologic Monitoring, Movement