ABSTRACT
Keyphrase extraction is an important facet of annotation tools that provide the metadata necessary for technical language processing (TLP). Because TLP imposes additional requirements on typical natural language processing (NLP) methods, we examined TLP keyphrase extraction through the lens of a hypothetical toolkit consisting of a combination of text features and classifiers suitable for use in low-resource TLP applications. We compared two approaches to keyphrase extraction: the first applied our toolkit-based methods, which used only distributional features of words and phrases; the second used the Maui automatic topic indexer, a well-known academic method. Performance was measured against two collections of technical literature: 1153 articles from the Journal of Chemical Thermodynamics (JCT) curated by the National Institute of Standards and Technology Thermodynamics Research Center (TRC) and 244 articles from Task 5 of the Workshop on Semantic Evaluation (SemEval). Both collections have author-provided keyphrases available; the SemEval articles also have reader-provided keyphrases. Our findings indicate that our toolkit approach was competitive with Maui when author-provided keyphrases were first removed from the text. For the TRC-JCT articles, the Maui automatic topic indexer reported an F-measure of 29.4 % while our toolkit approach obtained an F-measure of 28.2 %. For the SemEval articles, our toolkit approach using a Naïve Bayes classifier resulted in an F-measure of 20.8 %, which outperformed Maui's F-measure of 18.8 %.
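For readers unfamiliar with how such scores are computed, the sketch below evaluates a set of extracted keyphrases against a gold-standard set (e.g., author-provided keyphrases) using exact-match precision, recall, and F-measure. The phrase lists and the simple lower-casing normalization are illustrative assumptions, not the matching rules used in the study.

```python
def fmeasure(predicted, gold):
    """Exact-match precision, recall, and F1 between two keyphrase sets."""
    pred = {p.lower().strip() for p in predicted}
    ref = {g.lower().strip() for g in gold}
    tp = len(pred & ref)                      # keyphrases found in both sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical example: author keyphrases vs. extracted candidates
gold = ["vapor pressure", "heat capacity", "ionic liquids"]
predicted = ["heat capacity", "ionic liquids", "density", "viscosity"]
print(fmeasure(predicted, gold))  # (0.5, 0.667, 0.571)
```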
ABSTRACT
PURPOSE: To compare image resolution from iterative reconstruction with resolution from filtered back projection for low-contrast objects on phantom computed tomographic (CT) images across vendors and exposure levels. MATERIALS AND METHODS: Randomized repeat scans of an American College of Radiology CT accreditation phantom (module 2, low contrast) were performed for multiple radiation exposures, vendors, and vendor iterative reconstruction algorithms. Eleven volunteers were presented with 900 images by using a custom-designed graphical user interface to perform a task created specifically for this reader study. Results were analyzed by using statistical graphics and analysis of variance. RESULTS: Across three vendors (blinded as A, B, and C) and across three exposure levels, the mean correct classification rate was higher for iterative reconstruction than for filtered back projection (P < .01): 87.4% iterative reconstruction and 81.3% filtered back projection at 20 mGy, 70.3% iterative reconstruction and 63.9% filtered back projection at 12 mGy, and 61.0% iterative reconstruction and 56.4% filtered back projection at 7.2 mGy. There was a significant difference in mean correct classification rate between vendor B and the other two vendors. Across all exposure levels, images obtained with vendor B's scanner outperformed those from the other vendors, with a mean correct classification rate of 74.4%, while the mean correct classification rates for vendors A and C were 68.1% and 68.3%, respectively. Across all readers, the mean correct classification rate for iterative reconstruction (73.0%) was higher than that for filtered back projection (67.0%). CONCLUSION: The potential exists to reduce radiation dose without compromising low-contrast detectability by using iterative reconstruction instead of filtered back projection. There is substantial variability across vendor reconstruction algorithms.
Subjects
Computer-Assisted Image Processing, Imaging Phantoms, Radiation Exposure, Computed Tomography Scanners, X-Ray Computed Tomography
ABSTRACT
PURPOSE: To develop and validate a metric of computed tomographic (CT) image quality that incorporates the noise texture and resolution properties of an image. MATERIALS AND METHODS: Images of the American College of Radiology CT quality assurance phantom were acquired by using three commercial CT systems at seven dose levels with filtered back projection (FBP) and iterative reconstruction (IR). Image quality was characterized by the contrast-to-noise ratio (CNR) and a detectability index (d') that incorporated noise texture and spatial resolution. The measured CNR and d' were compared with a corresponding observer study by using the Spearman rank correlation coefficient to determine how well each metric reflects the ability of an observer to detect subtle lesions. Statistical significance of the correlation between each metric and observer performance was determined by using a Student t distribution; P values less than .05 indicated a significant correlation. Additionally, each metric was used to estimate the dose reduction potential of IR algorithms while maintaining image quality. RESULTS: Across all dose levels, scanner models, and reconstruction algorithms, the d' correlated strongly with observer performance in the corresponding observer study (ρ = 0.95; P < .001), whereas the CNR correlated weakly with observer performance (ρ = 0.31; P = .21). Furthermore, the d' showed that the dose-reduction capabilities differed between clinical implementations (range, 12%-35%) and were less than those predicted from the CNR (range, 50%-54%). CONCLUSION: The strong correlation between the observer performance and the d' indicates that the d' is superior to the CNR for the evaluation of CT image quality. Moreover, the results of this study indicate that the d' improves less than the CNR with the use of IR, which indicates less potential for IR dose reduction than previously thought.
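For context, a task-based detectability index for a non-prewhitening model observer is commonly written in terms of the task function, the task transfer function, and the noise power spectrum. The textbook form below is offered only as an illustration of how noise texture and resolution enter d'; it is not necessarily the exact formulation used in this study.

```latex
d'^2 = \frac{\left[\iint |W(u,v)|^2\,\mathrm{TTF}^2(u,v)\,du\,dv\right]^2}
            {\iint |W(u,v)|^2\,\mathrm{TTF}^2(u,v)\,\mathrm{NPS}(u,v)\,du\,dv}
```

Here W(u,v) is the Fourier-domain task function describing the lesion to be detected, TTF(u,v) is the task transfer function characterizing spatial resolution, and NPS(u,v) is the noise power spectrum characterizing noise magnitude and texture.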
Subjects
Computer-Assisted Image Processing, Task Performance and Analysis, X-Ray Computed Tomography/standards, Equipment Design, Signal-to-Noise Ratio, X-Ray Computed Tomography/instrumentation
ABSTRACT
BACKGROUND: Many cell lines currently used in medical research, such as cancer cells or stem cells, grow in confluent sheets or colonies. The biology of individual cells provides valuable information, thus the separation of touching cells in these microscopy images is critical for counting, identification and measurement of individual cells. Over-segmentation of single cells continues to be a major problem for methods based on morphological watershed due to the high level of noise in microscopy cell images. There is a need for a new segmentation method that is robust over a wide variety of biological images and can accurately separate individual cells even in challenging datasets such as confluent sheets or colonies. RESULTS: We present a new automated segmentation method called FogBank that accurately separates cells when confluent and touching each other. This technique is successfully applied to phase contrast, bright field, fluorescence microscopy and binary images. The method is based on morphological watershed principles with two new features to improve accuracy and minimize over-segmentation. First, FogBank uses histogram binning to quantize pixel intensities, which minimizes the image noise that causes over-segmentation. Second, FogBank uses a geodesic distance mask derived from raw images to detect the shapes of individual cells, in contrast to the more linear cell edges that other watershed-like algorithms produce. We evaluated the segmentation accuracy against manually segmented datasets using two metrics. FogBank achieved segmentation accuracy on the order of 0.75 (1 being a perfect match). We compared our method with other available segmentation techniques in terms of achieved performance over the reference data sets. FogBank outperformed all related algorithms. The accuracy has also been visually verified on data sets with 14 cell lines across 3 imaging modalities, leading to 876 segmentation evaluation images. CONCLUSIONS: FogBank produces single cell segmentation from confluent cell sheets with high accuracy. It can be applied to microscopy images of multiple cell lines and a variety of imaging modalities. The code for the segmentation method is available as open-source and includes a Graphical User Interface for user friendly execution.
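A minimal sketch of the two ideas above (intensity quantization before a marker-based watershed on a distance map of the foreground) is given below using scikit-image. The bin count, the Otsu threshold, the Euclidean (rather than geodesic) distance, and the peak-based seeding are simplifying assumptions and do not reproduce FogBank itself.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage import filters, segmentation
from skimage.feature import peak_local_max

def quantized_watershed_sketch(image, n_bins=16, min_distance=10):
    """Illustrative single-cell separation: quantize intensities, threshold,
    then run a marker-based watershed on a distance transform of the mask."""
    # Histogram binning: quantizing intensities suppresses small noise-driven
    # variations that would otherwise seed spurious watershed basins
    edges = np.linspace(image.min(), image.max(), n_bins + 1)
    quantized = np.digitize(image, edges)

    # Foreground/background split (Otsu threshold as a simple stand-in)
    mask = quantized > filters.threshold_otsu(quantized)

    # Distance transform of the mask (FogBank derives a geodesic distance mask
    # from the raw image; plain Euclidean distance is used here for brevity)
    distance = ndi.distance_transform_edt(mask)

    # One marker per local distance peak, then flood the inverted distance map
    coords = peak_local_max(distance, min_distance=min_distance, labels=mask)
    markers = np.zeros(distance.shape, dtype=int)
    markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
    return segmentation.watershed(-distance, markers=markers, mask=mask)
```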
Subjects
Algorithms, Cells/cytology, Computational Biology/methods, Computer-Assisted Image Interpretation/methods, Fluorescence Microscopy/methods, Phase-Contrast Microscopy/methods, Animals, Breast/cytology, Female, Humans, Mice, NIH 3T3 Cells, Saccharomyces cerevisiae/cytology
ABSTRACT
The analysis of fluorescence microscopy of cells often requires the determination of cell edges. This is typically done using segmentation techniques that separate the cell objects in an image from the surrounding background. This study compares segmentation results from nine different segmentation techniques applied to two different cell lines and five different sets of imaging conditions. Significant variability in segmentation results was observed that was due solely to differences in imaging conditions or to the application of different algorithms. We quantified and compared the results with a novel bivariate similarity index metric that evaluates the degree to which a cell object is underestimated or overestimated. The results show that commonly used threshold-based segmentation techniques are less accurate than k-means clustering with multiple clusters. Segmentation accuracy varies with imaging conditions that determine the sharpness of cell edges and with geometric features of a cell. Based on this observation, we propose a method that quantifies cell edge character to provide an estimate of how accurately an algorithm will perform. The results of this study will assist the development of criteria for evaluating interlaboratory comparability.
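To illustrate the k-means alternative mentioned above, the sketch below clusters pixel intensities into a few groups and keeps the brightest cluster as the cell object. The number of clusters and the rule for choosing the foreground cluster are assumptions for the example, not the settings evaluated in the study.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_segmentation_sketch(image, n_clusters=3):
    """Cluster pixel intensities and keep the brightest cluster as foreground."""
    pixels = image.reshape(-1, 1).astype(float)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(pixels)
    labels = km.labels_.reshape(image.shape)
    # Treat the cluster with the highest mean intensity as the cell object
    foreground_cluster = int(np.argmax(km.cluster_centers_.ravel()))
    return labels == foreground_cluster
```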
Subjects
Algorithms, Cells/cytology, Image Enhancement/methods, Computer-Assisted Image Interpretation/methods, Fluorescence Microscopy/methods, Animals, Mice, Rats
ABSTRACT
The extracellular matrix protein tenascin-C (TN-C) plays a critical role in development, wound healing, and cancer progression, but how it is controlled and how it exerts its physiological responses remain unclear. By quantifying the behavior of live cells with phase contrast and fluorescence microscopy, we examine the dynamic regulation of TN-C promoter activity. We employ an NIH 3T3 cell line stably transfected with the TN-C promoter ligated to the gene sequence for destabilized green fluorescent protein (GFP). Fully automated image analysis routines, validated by comparison with data derived from manual segmentation and tracking of single cells, are used to quantify changes in the cellular GFP in hundreds of individual cells throughout their cell cycle during live cell imaging experiments lasting 62 h. We find that individual cells vary substantially in their expression patterns over the cell cycle, but that on average TN-C promoter activity increases during the last 40% of the cell cycle. We also find that the increase in promoter activity is proportional to the activity earlier in the cell cycle. This work illustrates the application of live cell microscopy and automated image analysis of a promoter-driven GFP reporter cell line to identify subtle gene regulatory mechanisms that are difficult to uncover using population averaged measurements.
Subjects
Cell Cycle/genetics, Computer-Assisted Image Processing/methods, Genetic Promoter Regions, Tenascin/genetics, Animals, Gene Expression Regulation, Green Fluorescent Proteins/genetics, Green Fluorescent Proteins/metabolism, Mice, Fluorescence Microscopy, Phase-Contrast Microscopy, NIH 3T3 Cells, Tenascin/metabolism
ABSTRACT
Despite recent dramatic successes, Natural Language Processing (NLP) is not ready to address a variety of real-world problems. Its reliance on large standard corpora, on a training and evaluation paradigm that favors the learning of shallow heuristics, and on large computational resources makes domain-specific application of even the most successful NLP techniques difficult. This paper proposes Technical Language Processing (TLP), which brings engineering principles and practices to NLP specifically for the purpose of extracting actionable information from language generated by experts in their technical tasks, systems, and processes. TLP envisages NLP as a socio-technical system rather than as an algorithmic pipeline. We describe how the TLP approach to meaning and generalization differs from that of NLP, how data quantity and quality can be addressed in engineering technical domains, and the potential risks of not adapting NLP for technical use cases. Engineering problems can benefit immensely from the inclusion of knowledge from unstructured data, which is currently unavailable due to the shortcomings of out-of-the-box NLP packages. We illustrate the TLP approach by focusing on maintenance in industrial organizations as a case study.
ABSTRACT
As a result of a number of national initiatives, we are seeing rapid growth in the data important to materials science that are available over the web. Consequently, it is becoming increasingly difficult for researchers to learn what data are available and how to access them. To address this problem, the Research Data Alliance (RDA) Working Group for International Materials Science Registries (IMRR) was established to bring together materials science and information technology experts to develop an international federation of registries that can be used for global discovery of data resources for materials science. A resource registry collects high-level metadata descriptions of resources such as data repositories, archives, websites, and services that are useful for data-driven research. By making the collection searchable, it helps scientists in industry, universities, and government laboratories discover data relevant to their research and work interests. We present the results of our successful piloting of a registry federation for materials science data discovery. In particular, we lay out a blueprint for creating such a federation that is capable of amassing a global view of all available materials science data, and we enumerate the requirements for the standards that make the registries interoperable within the federation. These standards include a protocol for exchanging resource descriptions and a standard metadata schema for encoding those descriptions. We summarize how we leveraged an existing standard (OAI-PMH) for metadata exchange. Finally, we review the registry software developed to realize the federation and describe the user experience.
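To make the metadata-exchange step concrete, the sketch below issues a standard OAI-PMH ListRecords request and follows resumption tokens until the full set of resource descriptions has been harvested. The endpoint URL is a placeholder and oai_dc is assumed as the metadata prefix; the federation's registries may expose a richer, registry-specific schema.

```python
import requests
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def harvest(base_url, metadata_prefix="oai_dc"):
    """Harvest all records from an OAI-PMH endpoint, following resumptionTokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        response = requests.get(base_url, params=params, timeout=30)
        root = ET.fromstring(response.content)
        for record in root.findall(".//oai:record", OAI_NS):
            yield record
        token = root.find(".//oai:resumptionToken", OAI_NS)
        if token is None or not (token.text or "").strip():
            break
        # Subsequent requests carry only the verb and the resumption token
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

# Hypothetical endpoint; a real registry would publish its own base URL
# for record in harvest("https://registry.example.org/oai"):
#     print(record)
```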
ABSTRACT
In order to facilitate the extraction of quantitative data from live cell image sets, automated image analysis methods are needed. This paper presents an introduction to the general principle of the overlap-based cell tracking software developed by the National Institute of Standards and Technology (NIST). This cell tracker has the ability to track cells across a set of time-lapse images acquired at high rates based on the amount of overlap between cellular regions in consecutive frames. It is designed to be highly flexible, requires little user parameterization, and has a fast execution time.
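A minimal sketch of the overlap principle is shown below: given labeled segmentation masks from two consecutive frames, it counts the pixel overlap between every pair of objects and links each cell to the object it overlaps most in the next frame. The function names and the greedy assignment rule are illustrative assumptions, not the NIST implementation.

```python
import numpy as np

def overlap_matrix(labels_prev, labels_next):
    """Count overlapping pixels between each object in frame t and frame t+1."""
    n_prev, n_next = labels_prev.max(), labels_next.max()
    counts = np.zeros((n_prev + 1, n_next + 1), dtype=int)
    np.add.at(counts, (labels_prev.ravel(), labels_next.ravel()), 1)
    return counts[1:, 1:]  # drop background (label 0) row and column

def link_by_overlap(labels_prev, labels_next):
    """Greedily link each cell in frame t to the cell it overlaps most in frame t+1."""
    counts = overlap_matrix(labels_prev, labels_next)
    links = {}
    for i, row in enumerate(counts, start=1):
        j = int(np.argmax(row))
        if row[j] > 0:
            links[i] = j + 1  # object labels are 1-based
    return links
```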
ABSTRACT
We present a case study in which we use natural language processing and machine learning techniques to automatically select candidate scientific articles that may contain new experimental thermophysical property data from thousands of articles available in five different relevant journals. The National Institute of Standards and Technology (NIST) Thermodynamic Research Center (TRC) maintains a large database of available thermophysical property data extracted from articles that are manually selected for content. Over time the number of articles requiring manual inspection has grown and assistance from machine-based methods is needed. Previous work used topic modeling along with classification techniques to classify these journal articles into those with data for the TRC database and those without. These techniques have produced classifications with accuracy between 85 % and 90 %. However, the TRC does not want to lose data from the misclassified articles that contain relevant information. In this study, we start with these topic modeling and classification techniques, and then enhance the model using information relevant to the TRC's selection process. Our goal is to minimize the number of articles that require manual selection without missing articles of importance. Through a series of selection methods, we eliminate those articles for which we can determine a rejection criterion. We can reduce the number of articles that are not of interest by 70.8 % while retaining 98.7 % of the articles of interest. We have also found that topic model classification improves when the corpus of words is derived from specific sections of the articles rather than the entire articles, and we improve on our classification by using a combination of topic models from different sections of the article. Our best classification used only the Experimental and Literature Cited sections.
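As a rough illustration of a topic-model-plus-classifier pipeline of the kind described above, the sketch below feeds LDA topic proportions into a simple classifier using scikit-learn. The toy documents, labels, topic count, and choice of classifier are assumptions for the example and do not reproduce the TRC workflow or its section-specific corpora.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Toy corpus standing in for journal-article text (e.g., Experimental sections)
docs = [
    "vapor pressure measurements of the binary mixture were performed",
    "heat capacity was measured by adiabatic calorimetry",
    "we review recent theoretical models of molecular dynamics",
    "a survey of simulation methods for transport properties",
]
labels = [1, 1, 0, 0]  # 1 = contains experimental property data, 0 = does not

pipeline = make_pipeline(
    CountVectorizer(stop_words="english"),
    LatentDirichletAllocation(n_components=2, random_state=0),  # topic proportions as features
    GaussianNB(),
)
pipeline.fit(docs, labels)
print(pipeline.predict(["densities of aqueous solutions were measured"]))
```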
ABSTRACT
The ability to accurately track cells and particles from images is critical to many biomedical problems. To address this, we developed Lineage Mapper, an open-source tracker for time-lapse images of biological cells, colonies, and particles. Lineage Mapper tracks objects independently of the segmentation method, detects mitosis in confluence, separates cell clumps mistakenly segmented as a single cell, provides accuracy and scalability even on terabyte-sized datasets, and creates division and/or fusion lineages. Lineage Mapper has been tested and validated on multiple biological and simulated problems. The software is available in ImageJ and Matlab at isg.nist.gov.
Subjects
Cell Lineage/physiology, Mitosis/physiology, Computer-Assisted Image Processing, Software
ABSTRACT
RATIONALE AND OBJECTIVES: Quantifying changes in lung tumor volume is important for diagnosis, therapy planning, and evaluation of response to therapy. The aim of this study was to assess the performance of multiple algorithms on a reference data set. The study was organized by the Quantitative Imaging Biomarker Alliance (QIBA). MATERIALS AND METHODS: The study was organized as a public challenge. Computed tomography scans of synthetic lung tumors in an anthropomorphic phantom were acquired by the Food and Drug Administration. Tumors varied in size, shape, and radiodensity. Participants applied their own semi-automated volume estimation algorithms that either did not allow or allowed post-segmentation correction (type 1 or 2, respectively). Statistical analysis of accuracy (percent bias) and precision (repeatability and reproducibility) was conducted across algorithms, as well as across nodule characteristics, slice thickness, and algorithm type. RESULTS: Eighty-four percent of volume measurements of QIBA-compliant tumors were within 15% of the true volume, ranging from 66% to 93% across algorithms, compared to 61% of volume measurements for all tumors (ranging from 37% to 84%). Algorithm type did not affect bias substantially; however, it was an important factor in measurement precision. Algorithm precision was notably better as tumor size increased, worse for irregularly shaped tumors, and on average better for type 1 algorithms. Over all nodules meeting the QIBA Profile, precision, as measured by the repeatability coefficient, was 9.0% compared to 18.4% overall. CONCLUSION: The results achieved in this study, using a heterogeneous set of measurement algorithms, support QIBA quantitative performance claims in terms of volume measurement repeatability for nodules meeting the QIBA Profile criteria.
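For reference, the accuracy and precision statistics named above are commonly defined as follows; these standard forms are given only as an illustration, since the exact estimators used in the challenge analysis are not spelled out here.

```latex
\text{percent bias} = \frac{\bar{V}_{\text{meas}} - V_{\text{true}}}{V_{\text{true}}} \times 100\%,
\qquad
\text{RC} = 1.96\sqrt{2}\,\sigma_w \approx 2.77\,\sigma_w
```

Here V_true is the known phantom tumor volume, V-bar_meas is the mean of the repeated volume measurements, and sigma_w is the within-tumor standard deviation of repeat measurements, so that RC bounds the difference expected between two repeat measurements with 95% probability.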