Results 1 - 5 of 5
1.
JCO Clin Cancer Inform ; 5: 833-841, 2021 08.
Article in English | MEDLINE | ID: mdl-34406803

ABSTRACT

PURPOSE: Natural language processing (NLP) in pathology reports to extract biomarker information is an ongoing area of research. MetaMap is a natural language processing tool developed and funded by the National Library of Medicine to map biomedical text to the Unified Medical Language System Metathesaurus by applying specific tags to clinically relevant terms. Although results are useful without additional postprocessing, these tags lack important contextual information. METHODS: Our novel method takes terminology-driven semantic tags and incorporates those into a semantic frame that is task-specific to add necessary context to MetaMap. We use important contextual information to capture biomarker results to support Community Health System's use of Precision Medicine treatments for patients with cancer. For each biomarker, the name, type, numeric quantifiers, non-numeric qualifiers, and the time frame are extracted. These fields then associate biomarkers with their context in the pathology report such as test type, probe intensity, copy-number changes, and even failed results. A selection of 6,713 relevant reports contained the following standard-of-care biomarkers for metastatic breast cancer: breast cancer gene 1 and 2, estrogen receptor, progesterone receptor, human epidermal growth factor receptor 2, and programmed death-ligand 1. RESULTS: The method was tested on pathology reports from the internal pathology laboratory at Henry Ford Health System. A certified tumor registrar reviewed 400 tests, which showed > 95% accuracy for all extracted biomarker types. CONCLUSION: Using this new method, it is possible to extract high-quality, contextual biomarker information, and this represents a significant advance in biomarker extraction.
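The abstract above lists the fields captured for each biomarker (name, type, numeric quantifiers, non-numeric qualifiers, time frame). A minimal sketch of such a frame follows. The class name `BiomarkerFrame`, the function `build_frame`, and the tag labels (`"biomarker"`, `"test_type"`, `"quantity"`, `"qualifier"`, `"time"`) are hypothetical stand-ins chosen for illustration. They are not MetaMap output and not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BiomarkerFrame:
    """Task-specific frame holding one biomarker result and its context."""
    name: str
    test_type: Optional[str] = None
    numeric_quantifiers: List[str] = field(default_factory=list)
    qualifiers: List[str] = field(default_factory=list)
    time_frame: Optional[str] = None


def build_frame(tagged_spans: List[Tuple[str, str]]) -> Optional[BiomarkerFrame]:
    """Fill one frame from (text, tag) pairs taken from a single sentence.

    The tag labels are assumptions; a real pipeline would map semantic tags
    from its tagger onto these slots.
    """
    names = [text for text, tag in tagged_spans if tag == "biomarker"]
    if not names:
        return None  # no biomarker mentioned, so there is nothing to frame
    frame = BiomarkerFrame(name=names[0])
    for text, tag in tagged_spans:
        if tag == "test_type":
            frame.test_type = text
        elif tag == "quantity":
            frame.numeric_quantifiers.append(text)
        elif tag == "qualifier":
            frame.qualifiers.append(text)
        elif tag == "time":
            frame.time_frame = text
    return frame
```

For example, `build_frame([("ER", "biomarker"), ("IHC", "test_type"), ("90%", "quantity"), ("positive", "qualifier")])` yields a frame whose `name` is `"ER"` and whose context slots hold the remaining spans.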


Subject(s)
Natural Language Processing , Neoplasms , Biomarkers , Humans , Research Report
2.
Sci Rep ; 9(1): 7808, 2019 05 24.
Article in English | MEDLINE | ID: mdl-31127153

ABSTRACT

Whole-genome sequencing is increasingly adopted in clinical settings to identify pathogen transmissions, though largely as a retrospective tool. Prospective monitoring, in which samples are continuously added and compared to previous samples, can generate more actionable information. To enable prospective pathogen comparison, genomic relatedness metrics based on single-nucleotide differences must be consistent across time, efficient to compute and reliable for a large variety of samples. The choice of genomic regions to compare, i.e., the core genome, is critical to obtain a good metric. We propose a novel core genome method that selects conserved sequences in the reference genome by comparing its k-mer content to that of publicly available genome assemblies. The conserved-sequence genome is sample set-independent, which enables prospective pathogen monitoring. Based on clinical data sets of 3436 S. aureus, 1362 K. pneumoniae and 348 E. faecium samples, ROC curves demonstrate that the conserved-sequence genome disambiguates same-patient samples better than a core genome consisting of conserved genes. The conserved-sequence genome confirms outbreak samples with high sensitivity: in a set of 2335 S. aureus samples, it correctly identifies 44 out of 44 known outbreak samples, whereas the conserved-gene method confirms 38 known outbreak samples.
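The idea of selecting conserved reference sequence by k-mer content can be sketched as follows. This is a toy illustration, not the published method: the function names, the default `k`, and the `min_fraction` threshold are assumptions, and reverse complements are ignored for brevity.

```python
from typing import List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_positions(reference: str, assemblies: List[str],
                        k: int = 5, min_fraction: float = 0.9) -> List[int]:
    """Start positions of reference k-mers found in enough assemblies.

    A reference k-mer is kept when it occurs in at least `min_fraction`
    of the assemblies (hypothetical threshold). Because the result depends
    only on the reference and the public assemblies, it does not change as
    new clinical samples are added.
    """
    assembly_kmers = [kmers(a, k) for a in assemblies]
    needed = min_fraction * len(assemblies)
    keep = []
    for i in range(len(reference) - k + 1):
        kmer = reference[i:i + k]
        hits = sum(kmer in s for s in assembly_kmers)
        if hits >= needed:
            keep.append(i)
    return keep
```

Single-nucleotide differences between samples would then be counted only inside the kept regions.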


Subject(s)
Bacterial Infections/microbiology , Communicable Diseases/microbiology , Bacterial Genome , Genomics/methods , Bacteria/genetics , Bacterial Infections/epidemiology , Communicable Diseases/epidemiology , Disease Outbreaks , Enterococcus faecium/genetics , Humans , Klebsiella pneumoniae/genetics , Molecular Epidemiology , Staphylococcus aureus/genetics , Whole Genome Sequencing
3.
Infect Control Hosp Epidemiol ; 40(6): 649-655, 2019 06.
Article in English | MEDLINE | ID: mdl-31012399

ABSTRACT

BACKGROUND: Determining infectious cross-transmission events in healthcare settings involves manual surveillance of case clusters by infection control personnel, followed by strain typing of clinical/environmental isolates suspected in said clusters. Recent advances in genomic sequencing and cloud computing now allow for the rapid molecular typing of infecting isolates. OBJECTIVE: To facilitate rapid recognition of transmission clusters, we aimed to assess infection control surveillance using whole-genome sequencing (WGS) of microbial pathogens to identify cross-transmission events for epidemiologic review. METHODS: Clinical isolates of Staphylococcus aureus, Enterococcus faecium, Pseudomonas aeruginosa, and Klebsiella pneumoniae were obtained prospectively at an academic medical center, from September 1, 2016, to September 30, 2017. Isolate genomes were sequenced, followed by single-nucleotide variant analysis; a cloud-computing platform was used for whole-genome sequence analysis and cluster identification. RESULTS: Most strains of the 4 studied pathogens were unrelated, and 34 potential transmission clusters were present. The characteristics of the potential clusters were complex and likely not identifiable by traditional surveillance alone. Notably, only 1 cluster had been suspected by routine manual surveillance. CONCLUSIONS: Our work supports the assertion that integration of genomic and clinical epidemiologic data can augment infection control surveillance for both the identification of cross-transmission events and the inclusion of missed and exclusion of misidentified outbreaks (ie, false alarms). The integration of clinical data is essential to prioritize suspect clusters for investigation, and for existing infections, a timely review of both the clinical and WGS results can hold promise to reduce HAIs. A richer understanding of cross-transmission events within healthcare settings will require the expansion of current surveillance approaches.
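Cluster identification from single-nucleotide variant analysis can be illustrated with a small sketch. The abstract does not state the clustering rule or the distance cutoff, so the single-linkage grouping and the `max_snvs` value below are assumptions, and the input is simplified to pre-aligned sequences of equal length.

```python
from itertools import combinations
from typing import Dict, List


def snv_distance(a: str, b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def transmission_clusters(isolates: Dict[str, str],
                          max_snvs: int = 2) -> List[List[str]]:
    """Group isolates by single linkage on pairwise SNV distance.

    Two isolates are linked when they differ by at most `max_snvs`
    positions (hypothetical cutoff). Only groups with two or more
    isolates are returned as potential clusters.
    """
    parent = {name: name for name in isolates}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(isolates, 2):
        if snv_distance(isolates[a], isolates[b]) <= max_snvs:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in isolates:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

Clusters found this way are only candidates; as the abstract stresses, clinical data are still needed to prioritize them for investigation.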


Subject(s)
Cross Infection/epidemiology , Bacterial Genome , Infection Control/methods , Molecular Typing , Whole Genome Sequencing , Adolescent , Adult , Aged , Aged 80 and over , Child , Preschool Child , Cluster Analysis , Cross Infection/microbiology , Cross Infection/prevention & control , Disease Outbreaks , Female , Humans , Infant , Newborn Infant , Male , Massachusetts , Middle Aged , Molecular Epidemiology/methods , Young Adult
4.
Bioinformatics ; 30(22): 3166-73, 2014 Nov 15.
Article in English | MEDLINE | ID: mdl-25075119

ABSTRACT

MOTIVATION: Mapping of high-throughput sequencing data and other bulk sequence comparison applications have motivated a search for high-efficiency sequence alignment algorithms. The bit-parallel approach represents individual cells in an alignment scoring matrix as bits in computer words and emulates the calculation of scores by a series of logic operations composed of AND, OR, XOR, complement, shift and addition. Bit-parallelism has been successfully applied to the longest common subsequence (LCS) and edit-distance problems, producing fast algorithms in practice. RESULTS: We have developed BitPAl, a bit-parallel algorithm for general, integer-scoring global alignment. Integer-scoring schemes assign integer weights for match, mismatch and insertion/deletion. The BitPAl method uses structural properties in the relationship between adjacent scores in the scoring matrix to construct classes of efficient algorithms, each designed for a particular set of weights. In timed tests, we show that BitPAl runs 7-25 times faster than a standard iterative algorithm. AVAILABILITY AND IMPLEMENTATION: Source code is freely available for download at http://lobstah.bu.edu/BitPAl/BitPAl.html. BitPAl is implemented in C and runs on all major operating systems. CONTACT: jloving@bu.edu or yhernand@bu.edu or gbenson@bu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
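To make the bit-parallel idea concrete, here is a sketch of the well-known bit-parallel LCS length computation that the abstract cites as earlier work. It is not BitPAl itself, which handles general integer scoring. Python's arbitrary-precision integers stand in for machine words, and the function names are illustrative. A standard row-by-row dynamic program is included for comparison.

```python
def lcs_length_bitparallel(a: str, b: str) -> int:
    """LCS length using one bit per character of `a`.

    Each update of `v` processes a whole row of the scoring matrix with a
    handful of word operations (AND, OR, addition, subtraction).
    """
    m = len(a)
    if m == 0:
        return 0
    mask = (1 << m) - 1
    # match[ch] has bit i set when a[i] == ch.
    match = {}
    for i, ch in enumerate(a):
        match[ch] = match.get(ch, 0) | (1 << i)
    v = mask  # all ones: no matches recorded yet
    for ch in b:
        u = v & match.get(ch, 0)
        v = ((v + u) | (v - u)) & mask
    # Each zero bit in v marks one unit increase of the LCS length.
    return m - bin(v).count("1")


def lcs_length_dp(a: str, b: str) -> int:
    """Standard iterative dynamic program, kept as a reference."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```

Both functions return 4 for the pair `"ABCBDAB"` and `"BDCABA"`. The bit-parallel version replaces the inner loop over `b`'s row with a constant number of integer operations per character.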


Subject(s)
Algorithms , Sequence Alignment/methods , High-Throughput Nucleotide Sequencing , Software
5.
Nucleic Acids Res ; 42(14): 8884-94, 2014 Aug.
Article in English | MEDLINE | ID: mdl-25056320

ABSTRACT

DNA tandem repeats (TRs) are ubiquitous genomic features which consist of two or more adjacent copies of an underlying pattern sequence. The copies may be identical or approximate. Variable number of tandem repeats or VNTRs are polymorphic TR loci in which the number of pattern copies is variable. In this paper we describe VNTRseek, our software for discovery of minisatellite VNTRs (pattern size ≥ 7 nucleotides) using whole genome sequencing data. VNTRseek maps sequencing reads to a set of reference TRs and then identifies putative VNTRs based on a discrepancy between the copy number of a reference and its mapped reads. VNTRseek was used to analyze the Watson and Khoisan genomes (454 technology) and two 1000 Genomes family trios (Illumina). In the Watson genome, we identified 752 VNTRs with pattern sizes ranging from 7 to 84 nt. In the Khoisan genome, we identified 2572 VNTRs with pattern sizes ranging from 7 to 105 nt. In the trios, we identified between 2660 and 3822 VNTRs per individual and found nearly 100% consistency with Mendelian inheritance. VNTRseek is, to the best of our knowledge, the first software for genome-wide detection of minisatellite VNTRs. It is available at http://orca.bu.edu/vntrseek/.
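The calling step described above, a discrepancy between the reference copy number and the copy numbers seen in mapped reads, can be sketched in a few lines. This is an illustration under stated assumptions, not VNTRseek's actual logic: the function name `call_vntr`, the integer copy numbers, and the `min_support` read threshold are all hypothetical.

```python
from collections import Counter
from typing import List


def call_vntr(reference_copies: int, read_copies: List[int],
              min_support: int = 2) -> List[int]:
    """Return non-reference copy numbers supported by enough reads.

    `read_copies` holds the pattern copy number observed in each read
    mapped to one reference tandem repeat. A copy number is reported when
    it differs from the reference and appears in at least `min_support`
    reads (hypothetical threshold). An empty result means the locus shows
    no evidence of being a VNTR in this sample.
    """
    counts = Counter(read_copies)
    return sorted(c for c, n in counts.items()
                  if c != reference_copies and n >= min_support)
```

For instance, `call_vntr(3, [3, 3, 4, 4, 5])` reports `[4]`: the single read with 5 copies lacks support, while two reads agree on 4 copies.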


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Minisatellite Repeats , DNA Sequence Analysis/methods , Software , Human Genome , Genomics/methods , Humans , INDEL Mutation