1.
BMC Cancer ; 7: 37, 2007 Feb 28.
Article in English | MEDLINE | ID: mdl-17386082

ABSTRACT

BACKGROUND: Shared Pathology Informatics Network (SPIN) is a tissue resource initiative that utilizes clinical reports of the vast amount of paraffin-embedded tissues routinely stored by medical centers. SPIN has an informatics component (sending tissue-related queries to multiple institutions via the internet) and a service component (providing histopathologically annotated tissue specimens for medical research). This paper examines whether tissue blocks, identified by localized computer searches at participating institutions, can be retrieved in adequate quantity and quality to support medical researchers. METHODS: Four centers evaluated pathology reports (1990-2005) for common and rare tumors to determine the percentage of cases where suitable tissue blocks with tumor were available. Each site generated a list of 100 common tumor cases (25 cases each of breast adenocarcinoma, colonic adenocarcinoma, lung squamous carcinoma, and prostate adenocarcinoma) and 100 rare tumor cases (25 cases each of adrenal cortical carcinoma, gastro-intestinal stromal tumor [GIST], adenoid cystic carcinoma, and mycosis fungoides) using a combination of Tumor Registry, laboratory information system (LIS) and/or SPIN-related tools. Pathologists identified the slides/blocks with tumor and noted the first 3 slides with the largest tumor and the availability of the corresponding block. RESULTS: For common tumor cases (n = 400), the institutional retrieval rates (all blocks) were 83% (A), 95% (B), 80% (C), and 98% (D). The retrieval rate (tumor blocks) from all centers for common tumors was 73%, with a mean largest tumor size of 1.49 cm; retrieval (tumor blocks) was highest for lung (84%) and lowest for prostate (54%). For rare tumor cases (n = 400), the institutional retrieval rates (all blocks) were 78% (A), 73% (B), 67% (C), and 84% (D). The retrieval rate (tumor blocks) from all centers for rare tumors was 66%, with a mean largest tumor size of 1.56 cm; retrieval (tumor blocks) was highest for GIST (72%) and lowest for adenoid cystic carcinoma (58%). CONCLUSION: Assessment shows availability and quality of archival tissue blocks that are retrievable and associated electronic data that can be of value for researchers. This study serves to complement the data from which uniform use of the SPIN query tools by all four centers will be measured to assure and highlight the usefulness of archival material for obtaining tumor tissues for research.


Subject(s)
Paraffin Embedding/statistics & numerical data , Pathology, Clinical/organization & administration , Tissue Banks/statistics & numerical data , Humans , Medical Informatics/organization & administration , Neoplasms/pathology , United States
2.
BMC Cancer ; 6: 120, 2006 May 05.
Article in English | MEDLINE | ID: mdl-16677389

ABSTRACT

BACKGROUND: Advances in molecular biology and growing requirements from biomarker validation studies have generated a need for tissue banks to provide quality-controlled tissue samples with standardized clinical annotation. The NCI Cooperative Prostate Cancer Tissue Resource (CPCTR) is a distributed tissue bank that comprises four academic centers and provides thousands of clinically annotated prostate cancer specimens to researchers. Here we describe the CPCTR information management system architecture, common data element (CDE) development, query interfaces, data curation, and quality control. METHODS: Data managers review the medical records to collect and continuously update information for the 145 clinical, pathological and inventorial CDEs that the Resource maintains for each case. An Access-based data entry tool provides de-identification and a standard communication mechanism between each group and a central CPCTR database. Standardized automated quality control audits have been implemented. Centrally, an Oracle database has web interfaces allowing multiple user-types, including the general public, to mine de-identified information from all of the sites with three levels of specificity and granularity as well as to request tissues through a formal letter of intent. RESULTS: Since July 2003, CPCTR has offered over 6,000 cases (38,000 blocks) of highly characterized prostate cancer biospecimens, including several tissue microarrays (TMA). The Resource developed a website with interfaces for the general public as well as researchers and internal members. These user groups have utilized the web-tools for public query of summary data on the cases that were available, to prepare requests, and to receive tissues. As of December 2005, the Resource received over 130 tissue requests, of which 45 have been reviewed, approved and filled. Additionally, the Resource implemented the TMA Data Exchange Specification in its TMA program and created a computer program for calculating PSA recurrence. CONCLUSION: Building a biorepository infrastructure that meets today's research needs involves time and input of many individuals from diverse disciplines. The CPCTR can provide large volumes of carefully annotated prostate tissue for research initiatives such as Specialized Programs of Research Excellence (SPOREs) and for biomarker validation studies and its experience can help development of collaborative, large scale, virtual tissue banks in other organ systems.


Subject(s)
Information Management , Medical Informatics Applications , Prostatic Neoplasms/pathology , Tissue Banks , Databases as Topic , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Information Management/standards , Internet , Male , Marketing , Medical Records , Prostatic Neoplasms/genetics , Prostatic Neoplasms/metabolism , Quality Control , Tissue Banks/standards
3.
Hum Pathol ; 36(2): 139-45, 2005 Feb.
Article in English | MEDLINE | ID: mdl-15754290

ABSTRACT

It is impossible to overstate the importance of XML (eXtensible Markup Language) as a data organization tool. With XML, pathologists can annotate all of their data (clinical and anatomic) in a format that can transform every pathology report into a database, without compromising narrative structure. The purpose of this manuscript is to provide an overview of XML for pathologists. Examples will demonstrate how pathologists can use XML to annotate individual data elements and to structure reports in a common format that can be merged with other XML files or queried using standard XML tools. This manuscript gives pathologists a glimpse into how XML allows pathology data to be linked to other types of biomedical data and reduces our dependence on centralized proprietary databases.


Subject(s)
Database Management Systems , Databases as Topic/organization & administration , Medical Informatics/methods , Pathology/methods , Programming Languages , Terminology as Topic , Databases as Topic/standards , Humans
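
A minimal sketch of the idea described in abstract 3, written in Python for illustration: data elements are annotated inline with XML tags, the narrative stays readable, and a standard XML parser can query the result. The tag names (`report`, `specimen`, `narrative`, `diagnosis`, `site`), the `code` attribute and the sample report are all hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated report; every tag, attribute and value is illustrative.
REPORT_XML = """<report id="S-0001">
  <specimen>colon, sigmoid, resection</specimen>
  <narrative>The specimen shows <diagnosis code="C0001">adenocarcinoma</diagnosis>,
  moderately differentiated, invading the <site>muscularis propria</site>.</narrative>
</report>"""

def extract_annotations(xml_text):
    """Return (tag, attributes, text) for each element annotated inside the narrative."""
    root = ET.fromstring(xml_text)
    narrative = root.find("narrative")
    return [(el.tag, dict(el.attrib), el.text) for el in narrative]

def plain_narrative(xml_text):
    """Strip the markup and return the narrative as ordinary prose."""
    root = ET.fromstring(xml_text)
    text = "".join(root.find("narrative").itertext())
    return " ".join(text.split())  # collapse line breaks and repeated spaces

annotations = extract_annotations(REPORT_XML)
narrative = plain_narrative(REPORT_XML)
```

Here `annotations` holds the tagged data elements in a form that can be loaded into a table, while `narrative` shows that the original sentence is recoverable unchanged.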
4.
BMC Cancer ; 5: 108, 2005 Aug 21.
Article in English | MEDLINE | ID: mdl-16111498

ABSTRACT

BACKGROUND: The Cooperative Prostate Cancer Tissue Resource (CPCTR) is a consortium of four geographically dispersed institutions that are funded by the U.S. National Cancer Institute (NCI) to provide clinically annotated prostate cancer tissue samples to researchers. To facilitate this effort, it was critical to arrive at agreed upon common data elements (CDEs) that could be used to collect demographic, pathologic, treatment and clinical outcome data. METHODS: The CPCTR investigators convened a CDE curation subcommittee to develop and implement CDEs for the annotation of collected prostate tissues. The draft CDEs were refined and progressively annotated to make them ISO 11179 compliant. The CDEs were implemented in the CPCTR database and tested using software query tools developed by the investigators. RESULTS: By collaborative consensus the CPCTR CDE subcommittee developed 145 data elements to annotate the tissue samples collected. These included for each case: 1) demographic data, 2) clinical history, 3) pathology specimen level elements to describe the staging, grading and other characteristics of individual surgical pathology cases, 4) tissue block level annotation critical to managing a virtual inventory of cases and facilitating case selection, and 5) clinical outcome data including treatment, recurrence and vital status. These elements have been used successfully to respond to over 60 requests by end-users for tissue, including paraffin blocks from cases with 5 to 10 years of follow up, tissue microarrays (TMAs), as well as frozen tissue collected prospectively for genomic profiling and genetic studies. The CPCTR CDEs have been fully implemented in two major tissue banks and have been shared with dozens of other tissue banking efforts. CONCLUSION: The freely available CDEs developed by the CPCTR are robust, based on "best practices" for tissue resources, and are ISO 11179 compliant. The process for CDE development described in this manuscript provides a framework model for other organ sites and has been used as a model for breast and melanoma tissue banking efforts.


Subject(s)
Computational Biology/methods , Databases as Topic , Prostatic Neoplasms/pathology , Tissue Banks , Computers , Humans , Male , Prostatic Neoplasms/metabolism , Recurrence , Software , Treatment Outcome
5.
Clin Cancer Res ; 10(14): 4614-21, 2004 Jul 15.
Article in English | MEDLINE | ID: mdl-15269132

ABSTRACT

PURPOSE: The Cooperative Prostate Cancer Tissue Resource (CPCTR) is a National Cancer Institute-supported tissue bank that provides large numbers of clinically annotated prostate cancer specimens to investigators. This communication describes the CPCTR to investigators interested in obtaining prostate cancer tissue samples. EXPERIMENTAL DESIGN: The CPCTR, through its four participating institutions, has collected specimens and clinical data for prostate cancer cases diagnosed from 1989 onward. These specimens include paraffin blocks and frozen tissue from radical prostatectomy specimens and paraffin blocks from prostate needle biopsies. Standardized histopathological characterization and clinical data extraction are performed for all cases. Information on histopathology, demography (including ethnicity), laboratory data (prostate-specific antigen values), and clinical outcome related to prostate cancer are entered into the CPCTR database for all cases. Materials in the CPCTR are available in multiple tissue formats, including tissue microarray sections, paraffin-embedded tissue sections, serum, and frozen tissue specimens. These are available for research purposes following an application process that is described on the CPCTR web site (www.prostatetissues.org). RESULTS: The CPCTR currently (as of October 2003) contains 5135 prostate cancer cases including 4723 radical prostatectomy cases. Frozen tissues, in some instances including patient serum samples, are available for 1226 cases. Biochemical recurrence data allow identification of cases with residual disease, cases with recurrence, and recurrence-free cases. CONCLUSIONS: The CPCTR offers large numbers of highly characterized prostate cancer tissue specimens, including tissue microarrays, with associated clinical data for biomarker studies. Interested investigators are encouraged to apply for use of this material (www.prostatetissues.org).


Subject(s)
Prostatic Neoplasms/pathology , Tissue Banks/organization & administration , Adult , Aged , Aged, 80 and over , Biomedical Research/methods , Biomedical Research/statistics & numerical data , Humans , Male , Middle Aged , Prostatectomy , Prostatic Neoplasms/surgery , Prostatic Neoplasms/therapy , Tissue Banks/trends , United States
6.
BMC Med Inform Decis Mak ; 5: 35, 2005 Oct 18.
Article in English | MEDLINE | ID: mdl-16232314

ABSTRACT

BACKGROUND: New terminology continuously enters the biomedical literature. How can curators identify new terms that can be added to existing nomenclatures? The most direct method, and one that has served well, involves reading the current literature. The scholarly curator adds new terms as they are encountered. Present-day scholars are severely challenged by the enormous volume of biomedical literature. Curators of medical nomenclatures need computational assistance if they hope to keep their terminologies current. The purpose of this paper is to describe a method of rapidly extracting new, candidate terms from huge volumes of biomedical text. The resulting lists of terms can be quickly reviewed by curators and added to nomenclatures, if appropriate. The candidate term extractor uses a variation of the previously described doublet coding method. The algorithm, which operates on virtually any nomenclature, derives from the observation that most terms within a knowledge domain are composed entirely of word combinations found in other terms from the same knowledge domain. Terms can be expressed as sequences of overlapping word doublets that have more specific meaning than the individual words that compose the term. The algorithm parses through text, finding contiguous sequences of word doublets that are known to occur somewhere in the reference nomenclature. When a sequence of matching word doublets is encountered, it is compared with whole terms already included in the nomenclature. If the doublet sequence is not already in the nomenclature, it is extracted as a candidate new term. Candidate new terms can be reviewed by a curator to determine if they should be added to the nomenclature. An implementation of the algorithm is demonstrated, using a corpus of published abstracts obtained through the National Library of Medicine's PubMed query service and using "The developmental lineage classification and taxonomy of neoplasms" as a reference nomenclature. 
RESULTS: A 31+ Megabyte corpus of pathology journal abstracts was parsed using the doublet extraction method. This corpus consisted of 4,289 records, each containing an abstract title. The total number of words included in the abstract titles was 50,547. New candidate terms for the nomenclature were automatically extracted from the titles of abstracts in the corpus. Total execution time on a desktop computer with CPU speed of 2.79 GHz was 2 seconds. The resulting output consisted of 313 new candidate terms, each consisting of concatenated doublets found in the reference nomenclature. Human review of the 313 candidate terms yielded a list of 285 terms approved by a curator. A final automatic extraction of duplicate terms yielded a final list of 222 new terms (71% of the original 313 extracted candidate terms) that could be added to the reference nomenclature. CONCLUSION: The doublet method for automatically extracting candidate nomenclature terms can be used to quickly find new terms from vast amounts of text. The method can be immediately adapted for virtually any text and any nomenclature. An implementation of the algorithm, in the Perl programming language, is provided with this article.


Subject(s)
Electronic Data Processing/methods , Information Storage and Retrieval , Medical Informatics Computing , Terminology as Topic , Abstracting and Indexing , Algorithms , Humans , Medical Subject Headings , National Library of Medicine (U.S.) , Neoplasms/classification , PubMed , Semantics , Systems Integration , United States
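
The candidate-term extraction described in abstract 6 can be sketched roughly as follows. This is an illustrative Python reading of the abstract, not the Perl implementation distributed with the article; the function names, the tokenizer (lowercase alphabetic words only) and the three-term toy nomenclature are assumptions.

```python
import re

def doublets(words):
    """Overlapping word pairs: ['a', 'b', 'c'] -> [('a', 'b'), ('b', 'c')]."""
    return list(zip(words, words[1:]))

def candidate_terms(text, nomenclature):
    """Return runs of known doublets found in text that are not yet nomenclature terms."""
    terms = {t.lower() for t in nomenclature}
    known = {d for t in terms for d in doublets(t.split())}
    words = re.findall(r"[a-z]+", text.lower())   # simplistic tokenizer (assumption)
    candidates, run = [], []
    for pair in doublets(words) + [None]:         # the trailing None flushes the last run
        if pair in known:
            # Extend the current run by one word, or start a new two-word run.
            run = run + [pair[1]] if run else list(pair)
            continue
        if run:
            phrase = " ".join(run)
            if phrase not in terms and phrase not in candidates:
                candidates.append(phrase)
        run = []
    return candidates

# Toy nomenclature and text, invented for this example.
NOMENCLATURE = ["squamous cell carcinoma", "carcinoma of lung", "renal cell carcinoma"]
TEXT = "A case of squamous cell carcinoma of lung was compared with renal cell carcinoma."
found = candidate_terms(TEXT, NOMENCLATURE)
```

In this toy run, "renal cell carcinoma" is skipped because it is already a whole term, while "squamous cell carcinoma of lung" is built entirely from known doublets but is not itself listed, so it is offered to the curator as a candidate.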
7.
BMC Bioinformatics ; 5: 19, 2004 Feb 27.
Article in English | MEDLINE | ID: mdl-15040818

ABSTRACT

BACKGROUND: Tissue Microarrays (TMAs) have emerged as a powerful tool for examining the distribution of marker molecules in hundreds of different tissues displayed on a single slide. TMAs have been used successfully to validate candidate molecules discovered in gene array experiments. Like gene expression studies, TMA experiments are data intensive, requiring substantial information to interpret, replicate or validate. Recently, an open access Tissue Microarray Data Exchange Specification has been released that allows TMA data to be organized in a self-describing XML document annotated with well-defined common data elements. While this specification provides sufficient information for the reproduction of the experiment by outside research groups, its initial description did not contain instructions or examples of actual implementations, and no implementation studies have been published. The purpose of this paper is to demonstrate how the TMA Data Exchange Specification is implemented in a prostate cancer TMA. RESULTS: The Cooperative Prostate Cancer Tissue Resource (CPCTR) is funded by the National Cancer Institute to provide researchers with samples of prostate cancer annotated with demographic and clinical data. The CPCTR now offers prostate cancer TMAs and has implemented a TMA database conforming to the new open access Tissue Microarray Data Exchange Specification. The bulk of the TMA database consists of clinical and demographic data elements for 299 patient samples. These data elements were extracted from an Excel database using a transformative Perl script. The Perl script and the TMA database are open access documents distributed with this manuscript. CONCLUSIONS: TMA databases conforming to the Tissue Microarray Data Exchange Specification can be merged with other TMA files, expanded through the addition of data elements, or linked to data contained in external biological databases. This article describes an open access implementation of the TMA Data Exchange Specification and provides detailed guidance to researchers who wish to use the Specification.


Subject(s)
Cooperative Behavior , Databases, Genetic/standards , Gene Expression Profiling , Oligonucleotide Array Sequence Analysis , Prostatic Neoplasms/genetics , Confidentiality/standards , Confidentiality/trends , Databases, Genetic/legislation & jurisprudence , Databases, Genetic/trends , Gene Expression Profiling/standards , Humans , Information Management/methods , Information Management/standards , Information Management/trends , Internet/legislation & jurisprudence , Internet/standards , Internet/trends , Male , Oligonucleotide Array Sequence Analysis/standards , Prostatic Neoplasms/pathology
8.
Hum Pathol ; 35(8): 918-33, 2004 Aug.
Article in English | MEDLINE | ID: mdl-15297960

ABSTRACT

The National Cancer Institute sponsored a Borderline Ovarian Tumor Workshop held in August 2003 in Bethesda, MD. This report was developed from discussions at the Workshop. The participants acknowledged several areas of disagreement on basic terminology issues and agreed that a glossary with example images would help clarify many commonly misunderstood issues. This report defines terminology used in the pathological description of borderline tumors and their variants, and illustrates examples of each of the most common entities. It also addresses controversial aspects of the definitions and issues involving specimen handling and reporting. For those issues where there is disagreement, the terminology and diagnostic approaches reflecting the differing views are presented.


Subject(s)
Cystadenocarcinoma/pathology , Cystadenoma/pathology , Ovarian Neoplasms/pathology , Pathology/education , Terminology as Topic , Cystadenocarcinoma/classification , Cystadenoma/classification , Female , Humans , Ovarian Neoplasms/classification , Pathology/methods
9.
Hum Pathol ; 33(5): 459-65, 2002 May.
Article in English | MEDLINE | ID: mdl-12094370

ABSTRACT

As a result of major recent advances in understanding the biology of gastrointestinal stromal tumors (GISTs), specifically recognition of the central role of activating KIT mutations and associated KIT protein expression in these lesions, and the development of novel and effective therapy for GISTs using the receptor tyrosine kinase inhibitor STI-571, these tumors have become the focus of considerable attention by pathologists, clinicians, and patients. Stromal/mesenchymal tumors of the gastrointestinal tract have long been a source of confusion and controversy with regard to classification, line(s) of differentiation, and prognostication. Characterization of the KIT pathway and its phenotypic implications has helped to resolve some but not all of these issues. Given the now critical role of accurate and reproducible pathologic diagnosis in ensuring appropriate treatment for patients with GIST, the National Institutes of Health convened a GIST workshop in April 2001 with the goal of developing a consensus approach to diagnosis and morphologic prognostication. Key elements of the consensus, as described herein, are the defining role of KIT immunopositivity in diagnosis and a proposed scheme for estimating metastatic risk in these lesions, based on tumor size and mitotic count, recognizing that it is probably unwise to use the definitive term "benign" for any GIST, at least at the present time.


Subject(s)
Gastrointestinal Neoplasms/diagnosis , Sarcoma/diagnosis , Antineoplastic Agents/therapeutic use , Benzamides , Biomarkers, Tumor/metabolism , Gastrointestinal Neoplasms/drug therapy , Gastrointestinal Neoplasms/metabolism , Humans , Imatinib Mesylate , Mutation , Piperazines/therapeutic use , Proto-Oncogene Proteins c-kit/genetics , Proto-Oncogene Proteins c-kit/metabolism , Pyrimidines/therapeutic use , Sarcoma/drug therapy , Sarcoma/metabolism , Stromal Cells/pathology
10.
Dev Growth Differ ; 21(6): 519-525, 1979.
Article in English | MEDLINE | ID: mdl-37281567

ABSTRACT

The differentiation in organ culture of a rat nephroblastoma is compared with differentiation of normal rat metanephric tissue under the same conditions. The nephroblastoma arose in a 19 week old female Fischer F344 rat given a single intraperitoneal injection of 4.0 µmole methyl(methoxymethyl)nitrosamine (DMN-OMe)/g body weight at one day of age. The tumor consisted almost entirely of spindle cells although a few well-differentiated tubules were scattered throughout the tumor mass. No primitive tubules were seen, but focal aggregates of tumor cells suggestive of nascent epithelial differentiation were frequent. Fragments of the nephroblastoma were cultured on gelfoam sponge in Williams Medium E supplemented with hydrocortisone, insulin, and fetal bovine serum. Within one day extensive tubulogenesis was observed. High mitotic activity resulted in a steady increase in the size of cultured explants over a period of 6 days. By day six, differentiating tubules filled the explant tissue. Cultured fragments were nearly indistinguishable histologically from normal F344 rat fetal kidney explanted to organ culture on day 15 of gestation and grown in vitro for the same period.

11.
BMC Cancer ; 4: 88, 2004 Nov 30.
Article in English | MEDLINE | ID: mdl-15571625

ABSTRACT

BACKGROUND: The new "Developmental lineage classification of neoplasms" was described in a prior publication. The classification is simple (the entire hierarchy is described with just 39 classifiers), comprehensive (providing a place for every tumor of man), and consistent with recent attempts to characterize tumors by cytogenetic and molecular features. A taxonomy is a list of the instances that populate a classification. The taxonomy of neoplasia attempts to list every known term for every known tumor of man. METHODS: The taxonomy provides each concept with a unique code and groups synonymous terms under the same concept. A Perl script validated successive drafts of the taxonomy ensuring that: 1) each term occurs only once in the taxonomy; 2) each term occurs in only one tumor class; 3) each concept code occurs in one and only one hierarchical position in the classification; and 4) the file containing the classification and taxonomy is a well-formed XML (eXtensible Markup Language) document. RESULTS: The taxonomy currently contains 122,632 different terms encompassing 5,376 neoplasm concepts. Each concept has, on average, 23 synonyms. The taxonomy populates "The developmental lineage classification of neoplasms," and is available as an XML file, currently 9+ Megabytes in length. A representation of the classification/taxonomy listing each term followed by its code, followed by its full ancestry, is available as a flat-file, 19+ Megabytes in length. The taxonomy is the largest nomenclature of neoplasms, with more than twice the number of neoplasm names found in other medical nomenclatures, including the 2004 version of the Unified Medical Language System, the Systematized Nomenclature of Medicine Clinical Terminology, the National Cancer Institute's Thesaurus, and the International Classification of Diseases Oncology version. CONCLUSIONS: This manuscript describes a comprehensive taxonomy of neoplasia that collects synonymous terms under a unique code number and assigns each tumor to a single class within the tumor hierarchy. The entire classification and taxonomy are available as open access files (in XML and flat-file formats) with this article.


Subject(s)
Cell Lineage , Neoplasms/classification , Databases, Factual , Female , Humans , Male , Neoplastic Stem Cells/classification
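
The four validation checks listed in the METHODS of abstract 11 could look something like the sketch below. It is written in Python rather than the authors' Perl, and the XML layout it assumes (`class` elements with a `name` attribute, containing `concept` elements with a `code` attribute and `term` children) is hypothetical; the layout of the published file may differ.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

def validate_taxonomy(xml_text):
    """Return a sorted list of problems; an empty list means every check passed."""
    try:
        root = ET.fromstring(xml_text)            # check 4: well-formed XML
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    term_counts = Counter()                       # term -> number of occurrences
    term_classes = defaultdict(set)               # term -> classes it appears in
    code_positions = Counter()                    # concept code -> number of positions
    for cls in root.iter("class"):
        for concept in cls.findall("concept"):    # direct children of this class only
            code_positions[concept.get("code")] += 1
            for term in concept.findall("term"):
                name = (term.text or "").strip().lower()
                term_counts[name] += 1
                term_classes[name].add(cls.get("name"))
    problems = []
    # Check 1: each term occurs only once.
    problems += [f"term repeated: {t}" for t, n in term_counts.items() if n > 1]
    # Check 2: each term occurs in only one class.
    problems += [f"term in several classes: {t}" for t, c in term_classes.items() if len(c) > 1]
    # Check 3: each concept code occupies exactly one position.
    problems += [f"code in several positions: {k}" for k, n in code_positions.items() if n > 1]
    return sorted(problems)

# Invented two-class example in the assumed layout.
GOOD = (
    '<taxonomy>'
    '<class name="endoderm"><concept code="C1">'
    '<term>adenoma of colon</term><term>colonic adenoma</term></concept></class>'
    '<class name="mesoderm"><concept code="C2"><term>lipoma</term></concept></class>'
    '</taxonomy>'
)
report = validate_taxonomy(GOOD)
```

Returning a list of problems instead of stopping at the first one suits the workflow the abstract describes, where successive drafts are re-validated and a curator wants every inconsistency in a draft reported at once.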
12.
BMC Cancer ; 4: 10, 2004 Mar 17.
Article in English | MEDLINE | ID: mdl-15113444

ABSTRACT

BACKGROUND: Traditionally, tumors have been classified by their morphologic appearances. Unfortunately, tumors with similar histologic features often follow different clinical courses or respond differently to chemotherapy. Limitations in the clinical utility of morphology-based tumor classifications have prompted a search for a new tumor classification based on molecular analysis. Gene expression array data and proteomic data from tumor samples will provide complex data that is unobtainable from morphologic examination alone. The growing question facing cancer researchers is, "How can we successfully integrate the molecular, morphologic and clinical characteristics of human cancer to produce a helpful tumor classification?" DISCUSSION: Current efforts to classify cancers based on molecular features ignore lessons learned from millennia of experience in biological classification. A tumor classification must include every type of tumor and must provide a unique place for each tumor within the classification. Groups within a classification inherit the properties of their ancestors and impart properties to their descendants. A classification was prepared grouping tumors according to their histogenetic development. The classification is simple (reducing the complexity of information received from the molecular analysis of tumors), comprehensive (providing a place for every tumor of man), and consistent with recent attempts to characterize tumors by cytogenetic and molecular features. The clinical and research value of this historical approach to tumor classification is discussed. SUMMARY: This manuscript reviews tumor classification and provides a new and comprehensive classification for neoplasia that preserves traditional nomenclature while incorporating information derived from the molecular analysis of tumors. The classification is provided as an open access XML document that can be used by cancer researchers to relate tumor classes with heterogeneous experimental and clinical tumor databases.


Subject(s)
Neoplasms/classification , Vocabulary, Controlled , Germinoma/classification , Humans , Neoplasms/genetics , Neoplasms/pathology
13.
BMC Med Res Methodol ; 2: 12, 2002 Nov 11.
Article in English | MEDLINE | ID: mdl-12425722

ABSTRACT

BACKGROUND: Medical researchers often need to share clinical data without violating patient confidentiality. Threshold cryptographic protocols divide messages into multiple pieces, no single piece containing information that can reconstruct the original message. The author describes and implements a novel threshold protocol that can be used to search, annotate or transform confidential data without breaching patient confidentiality. METHODS: The basic threshold protocol is: 1) Text is divided into short phrases; 2) Each phrase is converted by a one-way hash algorithm into a seemingly-random set of characters; 3) Threshold Piece 1 is composed of the list of all phrases, with each phrase followed by its one-way hash; 4) Threshold Piece 2 is composed of the text with all phrases replaced by their one-way hash values, and with high-frequency words preserved. Neither Piece 1 nor Piece 2 contains information linking patients to their records. The original text can be re-constructed from Piece 1 and Piece 2. RESULTS: The threshold algorithm produces two files (threshold pieces). In typical usage, Piece 2 is held by the data owner, and Piece 1 is freely distributed. Piece 1 can be annotated and returned to the owner of the original data to enhance the complete data set. Collections of Piece 1 files can be merged and distributed without identifying patient records. Variations of the threshold protocol are described. The author's Perl implementation is freely available. CONCLUSIONS: Threshold files are safe in the sense that they are de-identified and can be used for research purposes. The threshold protocol is particularly useful when the receiver of the threshold file needs to obtain certain concepts or data-types found in the original data, but does not need to fully understand the original data set.


Subject(s)
Confidentiality/standards , Medical Records Systems, Computerized/standards , Algorithms , Humans , Medical Records Systems, Computerized/trends , Programming Languages , Software
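
The four steps of the basic threshold protocol in abstract 13 can be illustrated with the short Python sketch below. It is not the author's Perl implementation. The stop-word list, the rule that a phrase is a run of words between stop words, the whitespace tokenizer and the choice of a truncated SHA-256 digest are all assumptions made for the example, and the sketch is not a vetted de-identification tool.

```python
import hashlib

# Assumed list of high-frequency words that are left in place in Piece 2.
STOP_WORDS = {"the", "a", "of", "was", "with", "in", "and", "for"}

def one_way_hash(phrase):
    """Step 2: map a phrase to a seemingly random string (truncated SHA-256)."""
    return hashlib.sha256(phrase.encode("utf-8")).hexdigest()[:16]

def make_pieces(text):
    """Steps 1, 3 and 4: split text into phrases and build the two threshold pieces."""
    piece2, phrase, table = [], [], {}
    for word in text.lower().split() + [None]:    # the trailing None flushes the last phrase
        if word is not None and word not in STOP_WORDS:
            phrase.append(word)                   # still inside a phrase
            continue
        if phrase:
            joined = " ".join(phrase)
            table[joined] = one_way_hash(joined)
            piece2.append(table[joined])          # the phrase is replaced by its hash
            phrase = []
        if word is not None:
            piece2.append(word)                   # the high-frequency word is preserved
    piece1 = sorted(table.items())                # (phrase, hash) pairs, not in text order
    return piece1, " ".join(piece2)

def reconstruct(piece1, piece2):
    """Rebuild the text by substituting each hash in Piece 2 with its phrase from Piece 1."""
    lookup = {digest: phrase for phrase, digest in piece1}
    return " ".join(lookup.get(token, token) for token in piece2.split())

TEXT = "the patient was diagnosed with adenocarcinoma of the colon"
piece1, piece2 = make_pieces(TEXT)
restored = reconstruct(piece1, piece2)
```

Piece 1 lists the phrases in sorted order, so on its own it does not reveal how they were arranged in the record, and Piece 2 contains only hashes and high-frequency words; `restored` equals the lowercased input only when both pieces are combined.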
14.
Artif Intell Med ; 26(1-2): 25-36, 2002.
Article in English | MEDLINE | ID: mdl-12234715

ABSTRACT

The first task in any medical data mining effort is ensuring patient confidentiality. In the past, most data mining efforts ensured confidentiality by the dubious policy of withholding their raw data from colleagues and the public. A cursory review of medical informatics literature in the past decade reveals that much of what we have "learned" consists of assertions derived from confidential datasets unavailable for anyone's review. Without access to the original data, it is impossible to validate or improve upon a researcher's conclusions. Without access to research data, we are asked to accept findings as an act of faith, rather than as a scientific conclusion. This special issue of Artificial Intelligence in Medicine is devoted to medical data mining. The medical data miner has an obligation to conduct valid research in a way that protects human subjects. Today, data miners have the technical tools to merge large data collections and to distribute queries over disparate databases. In order to include patient-related data in shared databases, data miners will need methods to anonymize and deidentify data. This article reviews the human subject risks associated with medical data mining. This article also describes some of the innovative computational remedies that will permit researchers to conduct research AND share their data without risk to patient or institution.


Subject(s)
Computer Security , Confidentiality , Information Storage and Retrieval , Medical Records Systems, Computerized , Databases, Factual , Humans , Information Services , Medical Informatics
15.
Int J Surg Pathol ; 10(2): 81-9, 2002 Apr.
Article in English | MEDLINE | ID: mdl-12075401

ABSTRACT

As a result of major recent advances in understanding the biology of gastrointestinal stromal tumors (GIST), specifically recognition of the central role of activating KIT mutations and associated KIT protein expression in these lesions, and the development of novel and effective therapy for GISTs using the receptor tyrosine kinase inhibitor STI-571, these tumors have become the focus of considerable attention among pathologists, clinicians, and patients. Stromal/mesenchymal tumors of the gastrointestinal tract have long been a source of confusion and controversy with regard to classification, line(s) of differentiation, and prognostication. Characterization of the KIT pathway and its phenotypic implications has helped to resolve some but not all of these issues. Given the now critical role of accurate and reproducible pathologic diagnosis in ensuring appropriate treatment for patients with GIST, the National Institutes of Health (NIH) convened a GIST workshop in April 2001 with the goal of developing a consensus approach to diagnosis and morphologic prognostication. Key elements of the consensus, as described herein, are the defining role of KIT immunopositivity in diagnosis and a proposed scheme for estimating metastatic risk in these lesions, based on tumor size and mitotic count, recognizing that it is probably unwise to use the definitive term benign for any GIST, at least at the present time.


Subject(s)
Gastrointestinal Neoplasms/diagnosis , Oncogene Proteins , Antineoplastic Agents/therapeutic use , Benzamides , Biomarkers, Tumor/metabolism , Gastrointestinal Neoplasms/drug therapy , Gastrointestinal Neoplasms/metabolism , Humans , Imatinib Mesylate , Mutation , Piperazines/therapeutic use , Protein-Tyrosine Kinases/antagonists & inhibitors , Proto-Oncogene Proteins c-kit/genetics , Proto-Oncogene Proteins c-kit/metabolism , Pyrimidines/therapeutic use , Stromal Cells/pathology
16.
BMC Med Inform Decis Mak ; 4: 16, 2004 Sep 15.
Article in English | MEDLINE | ID: mdl-15369595

ABSTRACT

BACKGROUND: Autocoding (or automatic concept indexing) occurs when a software program extracts terms contained within text and maps them to a standard list of concepts contained in a nomenclature. The purpose of autocoding is to provide a way of organizing large documents by the concepts represented in the text. Because textual data accumulates rapidly in biomedical institutions, the computational methods used to autocode text must be very fast. The purpose of this paper is to describe the doublet method, a new algorithm for very fast autocoding. METHODS: An autocoder was written that transforms plain-text into intercalated word doublets (e.g. "The ciliary body produces aqueous humor" becomes "The ciliary, ciliary body, body produces, produces aqueous, aqueous humor"). Each doublet is checked against an index of doublets extracted from a standard nomenclature. Matching doublets are assigned a numeric code specific for each doublet found in the nomenclature. Text doublets that do not match the index of doublets extracted from the nomenclature are not part of valid nomenclature terms. Runs of matching doublets from text are concatenated and matched against nomenclature terms (also represented as runs of doublets). RESULTS: The doublet autocoder was compared for speed and performance against a previously published phrase autocoder. Both autocoders are Perl scripts, and both autocoders used an identical text (a 170+ Megabyte collection of abstracts collected through a PubMed search) and the same nomenclature (neocl.xml, containing over 102,271 unique names of neoplasms). In side-by-side comparison on the same computer, the doublet method autocoder was 8.4 times faster than the phrase autocoder (211 seconds versus 1,776 seconds). The doublet method codes 0.8 Megabytes of text per second on a desktop computer with a 1.6 GHz processor. 
In addition, the doublet autocoder successfully matched terms that were missed by the phrase autocoder, while the phrase autocoder found no terms that were missed by the doublet autocoder. CONCLUSIONS: The doublet method of autocoding is a novel algorithm for rapid text autocoding. The method will work with any nomenclature and will parse any ASCII plain-text. An implementation of the algorithm in Perl is provided with this article. The algorithm, the Perl implementation, the neoplasm nomenclature, and Perl itself are all open source materials.
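The doublet steps described in the METHODS above can be sketched roughly as follows. This is a minimal Python illustration, not the authors' Perl script: the function names, the tokenizer, and the codes `T001` and `T002` are assumptions made for the example, and the sketch matches runs of doublets directly against term strings rather than assigning a numeric code to each doublet as the abstract describes.

```python
import re

def doublets(words):
    """Return the overlapping two-word pairs of a word list, in order."""
    return [f"{a} {b}" for a, b in zip(words, words[1:])]

def build_doublet_index(terms):
    """Collect every doublet that occurs inside any nomenclature term."""
    index = set()
    for term in terms:
        index.update(doublets(term.lower().split()))
    return index

def autocode(text, nomenclature):
    """Return (term, code) pairs found in text.

    nomenclature maps lowercase multi-word terms to codes; the codes
    used in the example below are made up for illustration.
    """
    index = build_doublet_index(nomenclature)
    words = re.findall(r"[a-z0-9]+", text.lower())
    hits = []

    def flush(run):
        # A run is a stretch of words whose consecutive doublets all matched.
        # Check each contiguous phrase of two or more words against the terms.
        for i in range(len(run)):
            for j in range(i + 2, len(run) + 1):
                phrase = " ".join(run[i:j])
                if phrase in nomenclature:
                    hits.append((phrase, nomenclature[phrase]))

    run = []
    for a, b in zip(words, words[1:]):
        if f"{a} {b}" in index:
            if not run:
                run.append(a)
            run.append(b)
        else:
            # A non-matching doublet cannot lie inside a valid term,
            # so it closes the current run.
            flush(run)
            run = []
    flush(run)
    return hits

# Hypothetical two-term nomenclature and the sentence from the abstract.
nomenclature = {"ciliary body": "T001", "aqueous humor": "T002"}
sentence = "The ciliary body produces aqueous humor"
print(doublets(sentence.split()))
print(autocode(sentence, nomenclature))
```

Most doublets in ordinary text never occur in the nomenclature and are rejected with a single set lookup, which is presumably where much of the method's speed comes from. Single-word terms are outside the scope of this sketch.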


Subject(s)
Algorithms , Electronic Data Processing/methods , Natural Language Processing , Neoplasms/classification , Terminology as Topic , Abstracting and Indexing , Computers , Humans , Software , Software Design , Unified Medical Language System
17.
BMC Med Inform Decis Mak ; 4: 8, 2004 Jun 15.
Article in English | MEDLINE | ID: mdl-15198804

ABSTRACT

BACKGROUND: Concept indexing is a popular method for characterizing medical text, and is one of the most important early steps in many data mining efforts. Concept indexing differs from simple word or phrase indexing because concepts are typically represented by a nomenclature code that binds a medical concept to all equivalent representations. A concept search on the term renal cell carcinoma would be expected to find occurrences of hypernephroma and renal carcinoma (concept equivalents). The purpose of this study is to provide freely available resources to compare speed and performance among different autocoders. These tools consist of: 1) a public domain autocoder written in Perl (a free and open source programming language that installs on any operating system); 2) a nomenclature database derived from the unencumbered subset of the publicly available Unified Medical Language System; 3) a large corpus of autocoded output derived from a publicly available medical text. METHODS: A simple lexical autocoder was written that parses plain-text into a listing of all 1-, 2-, 3-, and 4-word strings contained in text, assigning a nomenclature code for text strings that match terms in the nomenclature. The nomenclature used is the unencumbered subset of the 2003 Unified Medical Language System (UMLS). The unencumbered subset of UMLS was reduced to exclude homonymous one-word terms and proper names, resulting in a term/code data dictionary containing about a half million medical terms. The Online Mendelian Inheritance in Man (OMIM), a 92+ Megabyte publicly available medical opus, was used as sample medical text for the autocoder. RESULTS: The autocoding Perl script is remarkably short, consisting of just 38 command lines. The 92+ Megabyte OMIM file was completely autocoded in 869 seconds on a 2.4 GHz processor (less than 10 seconds per Megabyte of text). The autocoded output file (9,540,442 bytes) contains 367,963 coded terms from OMIM and is distributed with this manuscript.
CONCLUSIONS: A public domain Perl script is provided that can parse through plain-text files of any length, matching concepts against an external nomenclature. The script and associated files can be used freely to compare the speed and performance of autocoding software.
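As a rough illustration of the lexical approach described above, the following Python sketch lists every 1- to 4-word string in each line of text and looks it up in a term/code dictionary. It is not the 38-line Perl script distributed with the article; the function names, the tokenizer, and the code `X0001` are assumptions for the example.

```python
import re

def ngrams(words, max_len=4):
    """Yield every contiguous string of 1..max_len words."""
    for i in range(len(words)):
        for n in range(1, max_len + 1):
            if i + n <= len(words):
                yield " ".join(words[i:i + n])

def autocode_lines(lines, nomenclature, max_len=4):
    """Return (phrase, code) for each text string that matches a term."""
    coded = []
    for line in lines:
        words = re.findall(r"[a-z0-9]+", line.lower())
        for phrase in ngrams(words, max_len):
            code = nomenclature.get(phrase)
            if code is not None:
                coded.append((phrase, code))
    return coded

# Concept equivalents share one (made-up) code, so a search on the code
# finds every synonym.
nomenclature = {
    "renal cell carcinoma": "X0001",
    "hypernephroma": "X0001",
    "renal carcinoma": "X0001",
}
text = ["Hypernephroma is also called renal cell carcinoma."]
print(autocode_lines(text, nomenclature))
```

Because all equivalent terms map to the same code, the coded output can be searched by concept rather than by wording.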


Subject(s)
Abstracting and Indexing/methods , Abstracting and Indexing/standards , Databases, Genetic , Programming Languages , Software , Software Design , Unified Medical Language System
18.
BMC Med Inform Decis Mak ; 3: 6, 2003 Jun 16.
Article in English | MEDLINE | ID: mdl-12809560

ABSTRACT

BACKGROUND: Large biomedical data sets have become increasingly important resources for medical researchers. Modern biomedical data sets are annotated with standard terms to describe the data and to support data linking between databases. The largest curated listing of biomedical terms is the National Library of Medicine's Unified Medical Language System (UMLS). The UMLS contains more than 2 million biomedical terms collected from nearly 100 medical vocabularies. Many of the vocabularies contained in the UMLS carry restrictions on their use, making it impossible to share or distribute UMLS-annotated research data. However, a subset of the UMLS vocabularies, designated Category 0 by UMLS, can be used to annotate and share data sets without violating the UMLS License Agreement. METHODS: The UMLS Category 0 vocabularies can be extracted from the parent UMLS metathesaurus using a Perl script supplied with this article. There are 43 Category 0 vocabularies that can be used freely for research purposes without violating the UMLS License Agreement. Among the Category 0 vocabularies are: MeSH (Medical Subject Headings), NCBI (National Center for Biotechnology Information) Taxonomy and ICD-9-CM (International Classification of Diseases, Ninth Revision, Clinical Modification). RESULTS: The extraction file containing all Category 0 terms and concepts is 72,581,138 bytes in length and contains 1,029,161 terms. The UMLS Metathesaurus MRCON file (January, 2003) is 151,048,493 bytes in length and contains 2,146,899 terms. Therefore the Category 0 vocabularies, in aggregate, are about half the size of the UMLS metathesaurus. A large publicly available listing of 567,921 different medical phrases was automatically coded using the full UMLS metathesaurus and the Category 0 vocabularies. There were 545,321 phrases with one or more matches against UMLS terms while 468,785 phrases had one or more matches against the Category 0 terms.
This indicates that when the two vocabularies are evaluated by their fitness to find at least one term for a medical phrase, the Category 0 vocabularies performed 86% as well as the complete UMLS metathesaurus. CONCLUSION: The Category 0 vocabularies of UMLS constitute a large nomenclature that can be used by biomedical researchers to annotate biomedical data. These annotated data sets can be distributed for research purposes without violating the UMLS License Agreement. These vocabularies may be of particular importance for sharing heterogeneous data from diverse biomedical data sets. The software tools to extract the Category 0 vocabularies are freely available Perl scripts entered into the public domain and distributed with this article.
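The 86% figure and the "about half the size" comparison follow directly from the counts reported in the RESULTS. A short Python check of that arithmetic, using only numbers quoted in the abstract (the variable names are illustrative):

```python
# Counts quoted in the abstract.
phrases_total = 567_921
matched_full_umls = 545_321
matched_category0 = 468_785

category0_terms, umls_terms = 1_029_161, 2_146_899
category0_bytes, umls_bytes = 72_581_138, 151_048_493

# Category 0 performance relative to the full metathesaurus.
relative_fitness = matched_category0 / matched_full_umls

# Share of all phrases with at least one match, per vocabulary.
coverage_full = matched_full_umls / phrases_total
coverage_category0 = matched_category0 / phrases_total

# Size of the Category 0 extract relative to the full MRCON file.
term_ratio = category0_terms / umls_terms
byte_ratio = category0_bytes / umls_bytes

print(f"relative fitness: {relative_fitness:.1%}")
print(f"coverage (full UMLS): {coverage_full:.1%}")
print(f"coverage (Category 0): {coverage_category0:.1%}")
print(f"term ratio: {term_ratio:.1%}, byte ratio: {byte_ratio:.1%}")
```

The relative figure compares the two vocabularies only on whether a phrase received at least one match; it says nothing about whether the matched terms are the same or equally specific.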


Subject(s)
Decision Support Techniques , Medical Informatics/methods , Software/trends , Unified Medical Language System/trends , Algorithms , Databases, Bibliographic/trends , Databases, Factual/trends , Humans , Medical Informatics/statistics & numerical data , Medical Informatics/trends , Research Design/statistics & numerical data , Research Design/trends , Software/statistics & numerical data , Unified Medical Language System/statistics & numerical data
19.
BMC Med Inform Decis Mak ; 3: 8, 2003 Jun 20.
Article in English | MEDLINE | ID: mdl-12818004

ABSTRACT

BACKGROUND: During carcinogenesis, precancers are the morphologically identifiable lesions that precede invasive cancers. In theory, the successful treatment of precancers would result in the eradication of most human cancers. Despite the importance of these lesions, there has been no effort to list and classify all of the precancers. The purpose of this study is to describe the first comprehensive taxonomy and classification of the precancers. As a novel approach to disease classification, terms and classes were annotated with metadata (data that describes the data) so that the classification could be used to link precancer terms to data elements in other biological databases. METHODS: Terms in the UMLS (Unified Medical Language System) related to precancers were extracted. Extracted terms were reviewed and additional terms added. Each precancer was assigned one of six general classes. The entire classification was assembled as an XML (eXtensible Mark-up Language) file. A Perl script converted the XML file into a browser-viewable HTML (HyperText Mark-up Language) file. RESULTS: The classification contained 4700 precancer terms, 568 distinct precancer concepts and six precancer classes: 1) Acquired microscopic precancers; 2) Acquired large lesions with microscopic atypia; 3) Precursor lesions occurring with inherited hyperplastic syndromes that progress to cancer; 4) Acquired diffuse hyperplasias and diffuse metaplasias; 5) Currently unclassified entities; and 6) Superclass and modifiers. CONCLUSION: This work represents the first attempt to create a comprehensive listing of the precancers, the first attempt to classify precancers by their biological properties and the first attempt to create a pathologic classification of precancers using standard metadata (XML). The classification is placed in the public domain, and comment is invited by the authors, who are prepared to curate and modify the classification.


Subject(s)
Medical Informatics/methods , Precancerous Conditions/classification , Unified Medical Language System , Decision Support Systems, Clinical , Humans , Internet , Precancerous Conditions/diagnosis , Programming Languages , Unified Medical Language System/trends
20.
BMC Med Inform Decis Mak ; 3: 5, 2003 May 23.
Article in English | MEDLINE | ID: mdl-12769826

ABSTRACT

BACKGROUND: Tissue Microarrays (TMAs) allow researchers to examine hundreds of small tissue samples on a single glass slide. The information held in a single TMA slide may easily involve Gigabytes of data. To benefit from TMA technology, the scientific community needs an open source TMA data exchange specification that will convey all of the data in a TMA experiment in a format that is understandable to both humans and computers. A data exchange specification for TMAs allows researchers to submit their data to journals and to public data repositories and to share or merge data from different laboratories. In May 2001, the Association for Pathology Informatics (API) hosted the first in a series of four workshops, co-sponsored by the National Cancer Institute, to develop an open, community-supported TMA data exchange specification. METHODS: A draft tissue microarray data exchange specification was developed through workshop meetings. The first workshop confirmed community support for the effort and urged the creation of an open XML-based specification. This was to evolve in steps with approval for each step coming from the stakeholders in the user community during open workshops. By the fourth workshop, held in October 2002, a set of Common Data Elements (CDEs) was established as well as a basic strategy for organizing TMA data in self-describing XML documents. RESULTS: The TMA data exchange specification is a well-formed XML document with four required sections: 1) Header, containing the specification Dublin Core identifiers, 2) Block, describing the paraffin-embedded array of tissues, 3) Slide, describing the glass slides produced from the Block, and 4) Core, containing all data related to the individual tissue samples contained in the array. Eighty CDEs, conforming to the ISO-11179 specification for data elements, constitute XML tags used in the TMA data exchange specification. A set of six simple semantic rules describes the complete data exchange specification.
Anyone using the data exchange specification can validate their TMA files using a software implementation written in Perl and distributed as a supplemental file with this publication. CONCLUSION: The TMA data exchange specification is now available in a draft form with community-approved Common Data Elements and a community-approved general file format and data structure. The specification can be freely used by the scientific community. Efforts sponsored by the Association for Pathology Informatics to refine the draft TMA data exchange specification are expected to continue for at least two more years. The interested public is invited to participate in these open efforts. Information on future workshops will be posted at http://www.pathologyinformatics.org (API web site).
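The two structural requirements named in the RESULTS (a well-formed XML document with Header, Block, Slide, and Core sections) can be checked along the following lines. This Python sketch is not the Perl validator distributed with the publication, and it does not implement the six semantic rules or the CDE checks. The lowercase tag names `header`, `block`, `slide`, and `core`, the root element `tma`, and the nesting in the sample are assumptions, since the abstract does not give the actual tag spellings.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names standing in for the four required sections.
REQUIRED_SECTIONS = ("header", "block", "slide", "core")

def check_tma(xml_text):
    """Return a list of problems; an empty list means both checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The parser rejects anything that is not well-formed XML.
        return [f"not well-formed: {err}"]
    # Gather the tag of every element in the document, root included.
    present = {elem.tag.lower() for elem in root.iter()}
    return [f"missing section: {name}"
            for name in REQUIRED_SECTIONS if name not in present]

complete = "<tma><header/><block><slide><core/></slide></block></tma>"
no_core = "<tma><header/><block><slide/></block></tma>"
broken = "<tma><header></tma>"

print(check_tma(complete))
print(check_tma(no_core))
print(check_tma(broken))
```

A fuller validator would also confirm that each tag is one of the eighty Common Data Elements and that the sections nest as the specification requires.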


Subject(s)
Community Health Services/standards , Decision Support Techniques , Gene Expression Profiling/standards , Internet/trends , Medical Informatics/methods , Oligonucleotide Array Sequence Analysis/standards , Organ Specificity/genetics , Databases, Genetic/standards