1.
Mil Med ; 188(Suppl 6): 590-597, 2023 11 08.
Article in English | MEDLINE | ID: mdl-37948284

ABSTRACT

INTRODUCTION: Foot and ankle fractures are the most common military health problem. Automated diagnosis can save time and personnel. It is crucial that a detector not only distinguish fractures from normal healthy cases but also remain robust in the presence of other orthopedic pathologies. Artificial intelligence (AI) deep learning has been shown to be promising. Previously, we developed HAMIL-Net to automatically detect orthopedic injuries of the upper extremity. In this research, we investigated the performance of HAMIL-Net for detecting foot and ankle fractures in the presence of other abnormalities. MATERIALS AND METHODS: HAMIL-Net is a novel deep neural network consisting of a hierarchical attention layer followed by a multiple-instance learning layer. This design allows it to handle imaging studies with multiple views. We used 148K musculoskeletal imaging studies of 51K Veterans at VA San Diego from the past 20 years to create datasets for this research. We annotated each study with a semi-automated pipeline that leveraged radiology reports written by board-certified radiologists, extracted findings with a natural language processing tool, and manually validated the annotations. RESULTS: HAMIL-Net can be trained with study-level, multiple-view examples and detects foot and ankle fractures with an area under the receiver operating characteristic curve of 0.87, but performance dropped when it was tested on cases that included other abnormalities. By integrating a fracture-specialized model with one that detects a broad range of abnormalities, HAMIL-Net's accuracy in detecting any abnormality improved from 0.53 to 0.77 and its F-score from 0.46 to 0.86. We also report HAMIL-Net's performance under different study types, including for young (age 18-35) patients. CONCLUSIONS: Automated fracture detection is promising, but for clinical deployment the presence of other abnormalities must be considered to deliver its full benefit. Our results with HAMIL-Net showed that considering other abnormalities improved fracture detection and allowed for incidental findings of other musculoskeletal abnormalities pertinent to or superimposed on fractures.
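
The abstract describes an attention layer followed by multiple-instance learning over the views of a study, without giving the architecture. As a rough illustration only, the sketch below shows generic attention-based pooling of per-view feature vectors into one study-level vector. It is not HAMIL-Net's actual design: the function names, the single attention vector, and the toy features are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(view_features, attention_vector):
    """Pool per-view feature vectors into one study-level vector.

    view_features: list of equal-length float lists, one per image view.
    attention_vector: hypothetical learned vector that scores each view.
    Returns (pooled_vector, attention_weights).
    """
    # Score each view by a dot product with the attention vector.
    scores = [
        sum(f * a for f, a in zip(features, attention_vector))
        for features in view_features
    ]
    weights = softmax(scores)
    dim = len(view_features[0])
    # Weighted sum of the view vectors gives the study-level representation.
    pooled = [
        sum(w * features[i] for w, features in zip(weights, view_features))
        for i in range(dim)
    ]
    return pooled, weights


# Toy study with two views and two-dimensional features.
pooled, weights = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(pooled, weights)
```

A study-level classifier would then operate on the pooled vector, so labels are needed only per study rather than per image.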


Subject(s)
Ankle Fractures; Artificial Intelligence; Humans; Adolescent; Young Adult; Adult; Neural Networks, Computer; Retrospective Studies
2.
Radiol Artif Intell ; 4(4): e210258, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35923376

ABSTRACT

Purpose: To investigate if tailoring a transformer-based language model to radiology is beneficial for radiology natural language processing (NLP) applications. Materials and Methods: This retrospective study presents a family of bidirectional encoder representations from transformers (BERT)-based language models adapted for radiology, named RadBERT. Transformers were pretrained with either 2.16 or 4.42 million radiology reports from U.S. Department of Veterans Affairs health care systems nationwide on top of four different initializations (BERT-base, Clinical-BERT, robustly optimized BERT pretraining approach [RoBERTa], and BioMed-RoBERTa) to create six variants of RadBERT. Each variant was fine-tuned for three representative NLP tasks in radiology: (a) abnormal sentence classification: models classified sentences in radiology reports as reporting abnormal or normal findings; (b) report coding: models assigned a diagnostic code to a given radiology report for five coding systems; and (c) report summarization: given the findings section of a radiology report, models selected key sentences that summarized the findings. Model performance was compared by bootstrap resampling with five intensively studied transformer language models as baselines: BERT-base, BioBERT, Clinical-BERT, BlueBERT, and BioMed-RoBERTa. Results: For abnormal sentence classification, all models performed well (accuracies above 97.5 and F1 scores above 95.0). RadBERT variants achieved significantly higher scores than corresponding baselines when given only 10% or less of 12 458 annotated training sentences. For report coding, all variants outperformed baselines significantly for all five coding systems. The variant RadBERT-BioMed-RoBERTa performed the best among all models for report summarization, achieving a Recall-Oriented Understudy for Gisting Evaluation-1 score of 16.18 compared with 15.27 by the corresponding baseline (BioMed-RoBERTa, P < .004). 
Conclusion: Transformer-based language models tailored to radiology improved performance on radiology NLP tasks compared with baseline transformer language models. Keywords: Translation, Unsupervised Learning, Transfer Learning, Neural Networks, Informatics. Supplemental material is available for this article. © RSNA, 2022. See also commentary by Wiggins and Tejani in this issue.
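
The summarization result above is reported as a ROUGE-1 score. For readers unfamiliar with the metric, the sketch below computes unigram-overlap precision, recall, and F1 between a candidate and a reference summary. It assumes naive lowercase whitespace tokenization, and the abstract does not say which ROUGE-1 variant (recall or F-measure) or which preprocessing the authors used, so this is a generic illustration rather than their evaluation code.

```python
from collections import Counter


def rouge1(candidate, reference):
    """Return (precision, recall, f1) of unigram overlap.

    Tokenization is a plain lowercase whitespace split, which is an
    assumption of this sketch and not taken from the abstract.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: each unigram counts at most as often as in the reference.
    overlap = sum((cand_counts & ref_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    if overlap == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


print(rouge1("the cat sat", "the cat sat on the mat"))
```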

3.
Nat Med ; 27(10): 1735-1743, 2021 10.
Article in English | MEDLINE | ID: mdl-34526699

ABSTRACT

Federated learning (FL) is a method used for training artificial intelligence models with data from multiple sources while maintaining data anonymity, thus removing many barriers to data sharing. Here we used data from 20 institutes across the globe to train a FL model, called EXAM (electronic medical record (EMR) chest X-ray AI model), that predicts the future oxygen requirements of symptomatic patients with COVID-19 using inputs of vital signs, laboratory data and chest X-rays. EXAM achieved an average area under the curve (AUC) >0.92 for predicting outcomes at 24 and 72 h from the time of initial presentation to the emergency room, and it provided 16% improvement in average AUC measured across all participating sites and an average increase in generalizability of 38% when compared with models trained at a single site using that site's data. For prediction of mechanical ventilation treatment or death at 24 h at the largest independent test site, EXAM achieved a sensitivity of 0.950 and specificity of 0.882. In this study, FL facilitated rapid data science collaboration without data exchange and generated a model that generalized across heterogeneous, unharmonized datasets for prediction of clinical outcomes in patients with COVID-19, setting the stage for the broader use of FL in healthcare.
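
The abstract does not state how the participating sites' models were combined. A common aggregation rule in federated learning is federated averaging, in which a server averages locally trained parameters weighted by each site's sample count, so that only parameters and never patient data leave a site. The sketch below shows that rule under this assumption. The function name and the toy numbers are illustrative and are not taken from EXAM.

```python
def federated_average(site_weights, site_sizes):
    """Average per-site parameter vectors, weighted by local sample counts.

    site_weights: list of equal-length float lists (one per site).
    site_sizes: number of training examples at each site.
    """
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(weights[i] * size for weights, size in zip(site_weights, site_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical sites; the larger site pulls the average toward its weights.
print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```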


Subject(s)
COVID-19/physiopathology; Machine Learning; Outcome Assessment, Health Care; COVID-19/therapy; COVID-19/virology; Electronic Health Records; Humans; Prognosis; SARS-CoV-2/isolation & purification
4.
PLoS Comput Biol ; 17(5): e1008967, 2021 05.
Article in English | MEDLINE | ID: mdl-34043624

ABSTRACT

Antibodies are widely used reagents to test for expression of proteins and other antigens. However, they might not always reliably produce results when they do not specifically bind to the target proteins that their providers designed them for, leading to unreliable research results. While many proposals have been developed to deal with the problem of antibody specificity, it is still challenging to cover the millions of antibodies that are available to researchers. In this study, we investigate the feasibility of automatically generating alerts to users of problematic antibodies by extracting statements about antibody specificity reported in the literature. The extracted alerts can be used to construct an "Antibody Watch" knowledge base containing supporting statements of problematic antibodies. We developed a deep neural network system and tested its performance with a corpus of more than two thousand articles that reported uses of antibodies. We divided the problem into two tasks. Given an input article, the first task is to identify snippets about antibody specificity and classify if the snippets report that any antibody exhibits non-specificity, and thus is problematic. The second task is to link each of these snippets to one or more antibodies mentioned in the snippet. The experimental evaluation shows that our system can accurately perform the classification task with 0.925 weighted F1-score, linking with 0.962 accuracy, and 0.914 weighted F1 when combined to complete the joint task. We leveraged Research Resource Identifiers (RRID) to precisely identify antibodies linked to the extracted specificity snippets. The result shows that it is feasible to construct a reliable knowledge base about problematic antibodies by text mining.
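
The classification results above are reported as weighted F1-scores. As a reminder of what that metric measures, the sketch below computes a per-class F1 and averages the classes weighted by their support (the number of true examples in each class). This is the standard definition and is assumed here; the abstract does not spell out the authors' exact computation, and the labels in the example are made up.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        score += (count / total) * f1
    return score


print(weighted_f1(["specific", "specific", "nonspecific", "nonspecific"],
                  ["specific", "nonspecific", "nonspecific", "nonspecific"]))
```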


Subject(s)
Antibody Specificity; Data Mining; Animals; Humans; Mice; Neural Networks, Computer
5.
Database (Oxford) ; 2021, 2021 04 29.
Article in English | MEDLINE | ID: mdl-33914028

ABSTRACT

High-quality metadata annotations for data hosted in large public repositories are essential for research reproducibility and for conducting fast, powerful and scalable meta-analyses. Currently, a majority of sequencing samples in the National Center for Biotechnology Information's Sequence Read Archive (SRA) are missing metadata across several categories. In an effort to improve the metadata coverage of these samples, we leveraged almost 44 million attribute-value pairs from SRA BioSample to train a scalable, recurrent neural network that predicts missing metadata via named entity recognition (NER). The network was first trained to classify short text phrases according to 11 metadata categories and achieved an overall accuracy and area under the receiver operating characteristic curve of 85.2% and 0.977, respectively. We then applied our classifier to predict 11 metadata categories from the longer TITLE attribute of samples, evaluating performance on a set of samples withheld from model training. Prediction accuracies were high when extracting sample Genus/Species (94.85%), Condition/Disease (95.65%) and Strain (82.03%) from TITLEs, with lower accuracies and lack of predictions for other categories highlighting multiple issues with the current metadata annotations in BioSample. These results indicate the utility of recurrent neural networks for NER-based metadata prediction and the potential for models such as the one presented here to increase metadata coverage in BioSample while minimizing the need for manual curation. Database URL: https://github.com/cartercompbio/PredictMEE.


Subject(s)
Deep Learning; Metadata; High-Throughput Nucleotide Sequencing; Reproducibility of Results; Software
6.
Res Sq ; 2021 Jan 08.
Article in English | MEDLINE | ID: mdl-33442676

ABSTRACT

'Federated Learning' (FL) is a method to train Artificial Intelligence (AI) models with data from multiple sources while maintaining anonymity of the data thus removing many barriers to data sharing. During the SARS-COV-2 pandemic, 20 institutes collaborated on a healthcare FL study to predict future oxygen requirements of infected patients using inputs of vital signs, laboratory data, and chest x-rays, constituting the "EXAM" (EMR CXR AI Model) model. EXAM achieved an average Area Under the Curve (AUC) of over 0.92, an average improvement of 16%, and a 38% increase in generalisability over local models. The FL paradigm was successfully applied to facilitate a rapid data science collaboration without data exchange, resulting in a model that generalised across heterogeneous, unharmonized datasets. This provided the broader healthcare community with a validated model to respond to COVID-19 challenges, as well as set the stage for broader use of FL in healthcare.

7.
Microbiome ; 7(1): 129, 2019 09 05.
Article in English | MEDLINE | ID: mdl-31488215

ABSTRACT

The last few years have seen tremendous growth in human microbiome research, with a particular focus on the links to both mental and physical health and disease. Medical and experimental settings provide initial sources of information about these links, but individual studies produce disconnected pieces of knowledge bounded in context by the perspective of expert researchers reading full-text publications. Building a knowledge base (KB) consolidating these disconnected pieces is an essential first step to democratize and accelerate the process of accessing the collective discoveries of human disease connections to the human microbiome. In this article, we survey the existing tools and development efforts that have been produced to capture portions of the information needed to construct a KB of all known human microbiome-disease associations and highlight the need for additional innovations in natural language processing (NLP), text mining, taxonomic representations, and field-wide vocabulary standardization in human microbiome research. Addressing these challenges will enable the construction of KBs that help identify new insights amenable to experimental validation and potentially clinical decision support.


Subject(s)
Data Mining; Knowledge Bases; Microbiota; Natural Language Processing; Humans
8.
PLoS One ; 14(12): e0226771, 2019.
Article in English | MEDLINE | ID: mdl-31891604

ABSTRACT

We performed a hypothesis-generating phenome-wide association study (PheWAS) to identify and characterize cross-phenotype associations, where one SNP is associated with two or more phenotypes, between thousands of genetic variants assayed on the Metabochip and hundreds of phenotypes in 5,897 African Americans as part of the Population Architecture using Genomics and Epidemiology (PAGE) I study. The PAGE I study was a National Human Genome Research Institute-funded collaboration of four study sites accessing diverse epidemiologic studies genotyped on the Metabochip, a custom genotyping chip that has dense coverage of regions in the genome previously associated with cardio-metabolic traits and outcomes in mostly European-descent populations. Here we focus on identifying novel phenome-genome relationships, where SNPs are associated with more than one phenotype. To do this, we performed a PheWAS, testing each SNP on the Metabochip for an association with up to 273 phenotypes in the participating PAGE I study sites. We identified 133 putative pleiotropic variants, defined as SNPs associated at an empirically derived p-value threshold of p<0.01 in two or more PAGE study sites for two or more phenotype classes. We further annotated these PheWAS-identified variants using publicly available functional data and local genetic ancestry. Amongst our novel findings is SPARC rs4958487, associated with increased glucose levels and hypertension. SPARC has been implicated in the pathogenesis of diabetes and is also known to have a potential role in fibrosis, a common consequence of multiple conditions including hypertension. The SPARC example and others highlight the potential that PheWAS approaches have in improving our understanding of complex disease architecture by identifying novel relationships between genetic variants and an array of common human phenotypes.
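
The pleiotropy definition above (p<0.01 in two or more study sites, for two or more phenotype classes) can be read as a two-stage filter. The sketch below implements one reading of it: a SNP counts for a phenotype class when it passes the threshold in at least two sites, and it is called putatively pleiotropic when that holds for at least two classes. This reading, the tuple layout, and the identifiers in the example are assumptions for illustration, not the authors' pipeline.

```python
from collections import defaultdict


def pleiotropic_snps(results, p_threshold=0.01, min_sites=2, min_classes=2):
    """Return sorted SNP ids meeting the replication and class-count criteria.

    results: iterable of (snp, site, phenotype_class, p_value) tuples.
    """
    # Sites in which each (SNP, phenotype class) pair passes the threshold.
    sites_by_pair = defaultdict(set)
    for snp, site, phenotype_class, p_value in results:
        if p_value < p_threshold:
            sites_by_pair[(snp, phenotype_class)].add(site)

    # Phenotype classes replicated in enough sites, per SNP.
    classes_by_snp = defaultdict(set)
    for (snp, phenotype_class), sites in sites_by_pair.items():
        if len(sites) >= min_sites:
            classes_by_snp[snp].add(phenotype_class)

    return sorted(
        snp for snp, classes in classes_by_snp.items() if len(classes) >= min_classes
    )


# Hypothetical association results.
toy = [
    ("rs1", "A", "glucose", 0.001), ("rs1", "B", "glucose", 0.005),
    ("rs1", "A", "blood_pressure", 0.002), ("rs1", "C", "blood_pressure", 0.009),
    ("rs2", "A", "glucose", 0.001), ("rs2", "B", "glucose", 0.5),
    ("rs2", "A", "blood_pressure", 0.001), ("rs2", "B", "blood_pressure", 0.001),
]
print(pleiotropic_snps(toy))
```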


Subject(s)
Atherosclerosis/genetics; Black or African American/genetics; Genetic Pleiotropy; Metagenomics; Phenomics; Aged; Epidemiologic Studies; Female; Genome-Wide Association Study; Humans; Male; Middle Aged; Polymorphism, Single Nucleotide
9.
J Biomed Inform ; 82: 63-69, 2018 06.
Article in English | MEDLINE | ID: mdl-29679685

ABSTRACT

BACKGROUND: Big clinical note datasets found in electronic health records (EHR) present substantial opportunities to train accurate statistical models that identify patterns in patient diagnosis and outcomes. However, near-to-exact duplication in note texts is a common issue in many clinical note datasets. We aimed to use a scalable algorithm to de-duplicate notes and further characterize the sources of duplication. METHODS: We use an approximation algorithm to minimize pairwise comparisons consisting of three phases: (1) Minhashing with Locality Sensitive Hashing; (2) a clustering method using tree-structured disjoint sets; and (3) classification of near-duplicates (exact copies, common machine output notes, or similar notes) via pairwise comparison of notes in each cluster. We use the Jaccard Similarity (JS) to measure similarity between two documents. We analyzed two big clinical note datasets: our institutional dataset and MIMIC-III. RESULTS: There were 1,528,940 notes analyzed from our institution. The de-duplication algorithm completed in 36.3 h. When the JS threshold was set at 0.7, the total number of clusters was 82,371 (total notes = 304,418). Among all JS thresholds, no clusters contained pairs of notes that were incorrectly clustered. When the JS threshold was set at 0.9 or 1.0, the de-duplication algorithm captured 100% of all random pairs with their JS at least as high as the set thresholds from the validation set. Similar performance was noted when analyzing the MIMIC-III dataset. CONCLUSIONS: We showed that among the EHR from our institution and from the publicly-available MIMIC-III dataset, there were a significant number of near-to-exact duplicated notes.
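
The three phases named in the methods (MinHash with locality-sensitive hashing, clustering with disjoint sets, pairwise comparison by Jaccard similarity) can be sketched compactly. The version below is a simplified illustration and not the authors' implementation: it uses word 3-gram shingles, 64 hash functions split into 16 bands, and checks the exact Jaccard similarity of candidate pairs before merging them, all of which are assumptions chosen for this example.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Set of word k-grams; short texts yield a single shingle."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity of two sets."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def minhash(shingle_set, num_hashes=64):
    """MinHash signature: the minimum seeded hash value per hash function."""
    def seeded_hash(seed, item):
        digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")
    return [min(seeded_hash(seed, s) for s in shingle_set) for seed in range(num_hashes)]


class DisjointSet:
    """Union-find with path compression."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_near_duplicates(notes, threshold=0.7, num_hashes=64, bands=16):
    """Return clusters (lists of note indices) of near-duplicate notes."""
    sets = [shingles(note) for note in notes]
    signatures = [minhash(s, num_hashes) for s in sets]
    rows = num_hashes // bands
    # LSH: notes sharing any identical band land in the same bucket.
    buckets = defaultdict(list)
    for idx, sig in enumerate(signatures):
        for band in range(bands):
            buckets[(band, tuple(sig[band * rows:(band + 1) * rows]))].append(idx)
    groups = DisjointSet(len(notes))
    for members in buckets.values():
        for i, j in combinations(members, 2):
            # Confirm each candidate pair with the exact Jaccard similarity.
            if jaccard(sets[i], sets[j]) >= threshold:
                groups.union(i, j)
    clusters = defaultdict(list)
    for idx in range(len(notes)):
        clusters[groups.find(idx)].append(idx)
    return [members for members in clusters.values() if len(members) > 1]


notes = [
    "patient presents with fever and cough for three days",
    "patient presents with fever and cough for three days",
    "discharge summary after uncomplicated knee surgery with recovery plan",
]
print(cluster_near_duplicates(notes))
```

Banding is what keeps the number of pairwise comparisons small: only notes that collide in at least one band are ever compared directly.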


Subject(s)
Data Collection; Electronic Health Records; Medical Informatics/methods; Algorithms; Cluster Analysis; Computers; Databases, Factual; Datasets as Topic; Humans; Machine Learning; Natural Language Processing; Obesity, Morbid/diagnosis; Obesity, Morbid/epidemiology; Pattern Recognition, Automated
10.
AMIA Jt Summits Transl Sci Proc ; 2016: 225-34, 2016.
Article in English | MEDLINE | ID: mdl-27570676

ABSTRACT

Recommendation of related articles is an important feature of PubMed. The PubMed Related Citations (PRC) algorithm is the engine that enables this feature, and it leverages information on 22 million citations. We analyzed the performance of the PRC algorithm on 4584 annotated articles from the 2005 Text REtrieval Conference (TREC) Genomics Track data. Our analysis indicated that the PRC highest weighted term was not always consistent with the critical term that was most directly related to the topic of the article. We implemented term expansion and found that it was a promising and easy-to-implement approach to improve the performance of the PRC algorithm for the TREC 2005 Genomics data and for the TREC 2014 Clinical Decision Support Track data. For term expansion, we trained a Skip-gram model using the Word2Vec package. This extended PRC algorithm resulted in higher average precision for a large subset of articles. A combination of both algorithms may lead to improved performance in related article recommendations.

12.
Acad Emerg Med ; 23(5): 628-36, 2016 05.
Article in English | MEDLINE | ID: mdl-26826020

ABSTRACT

OBJECTIVE: Delayed diagnosis of Kawasaki disease (KD) may lead to serious cardiac complications. We sought to create and test the performance of a natural language processing (NLP) tool, the KD-NLP, in the identification of emergency department (ED) patients for whom the diagnosis of KD should be considered. METHODS: We developed an NLP tool that recognizes the KD diagnostic criteria based on standard clinical terms and medical word usage using 22 pediatric ED notes augmented by Unified Medical Language System vocabulary. With high suspicion for KD defined as fever and three or more KD clinical signs, KD-NLP was applied to 253 ED notes from children ultimately diagnosed with either KD or another febrile illness. We evaluated KD-NLP performance against ED notes manually reviewed by clinicians and compared the results to a simple keyword search. RESULTS: KD-NLP identified high-suspicion patients with a sensitivity of 93.6% and specificity of 77.5% compared to notes manually reviewed by clinicians. The tool outperformed a simple keyword search (sensitivity = 41.0%; specificity = 76.3%). CONCLUSIONS: KD-NLP showed comparable performance to clinician manual chart review for identification of pediatric ED patients with a high suspicion for KD. This tool could be incorporated into the ED electronic health record system to alert providers to consider the diagnosis of KD. KD-NLP could serve as a model for decision support for other conditions in the ED.
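
The high-suspicion rule (fever plus three or more KD clinical signs) and the reported sensitivity and specificity can be illustrated with a few lines of code. The sketch below encodes the rule over already-extracted findings and scores binary predictions against a clinician reference. The sign labels are hypothetical placeholders, and the NLP step that extracts findings from ED notes is not shown, since the abstract gives no detail on it.

```python
def is_high_suspicion(has_fever, clinical_signs):
    """Rule from the abstract: fever and at least three distinct KD signs."""
    return bool(has_fever) and len(set(clinical_signs)) >= 3


def sensitivity_specificity(predicted, actual):
    """Compute (sensitivity, specificity) from parallel lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical sign labels, for illustration only.
print(is_high_suspicion(True, ["rash", "conjunctival injection", "oral changes"]))
print(sensitivity_specificity([True, True, False, False], [True, False, False, True]))
```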


Subject(s)
Emergency Service, Hospital; Mucocutaneous Lymph Node Syndrome/diagnosis; Natural Language Processing; Child; Data Mining/methods; Electronic Health Records; Humans; Mucocutaneous Lymph Node Syndrome/therapy; Sensitivity and Specificity
13.
BMC Bioinformatics ; 17 Suppl 1: 1, 2016 Jan 11.
Article in English | MEDLINE | ID: mdl-26817711

ABSTRACT

BACKGROUND: Numerous publicly available biomedical databases derive data by curating from literatures. The curated data can be useful as training examples for information extraction, but curated data usually lack the exact mentions and their locations in the text required for supervised machine learning. This paper describes a general approach to information extraction using curated data as training examples. The idea is to formulate the problem as cost-sensitive learning from noisy labels, where the cost is estimated by a committee of weak classifiers that consider both curated data and the text. RESULTS: We test the idea on two information extraction tasks of Genome-Wide Association Studies (GWAS). The first task is to extract target phenotypes (diseases or traits) of a study and the second is to extract ethnicity backgrounds of study subjects for different stages (initial or replication). Experimental results show that our approach can achieve 87% of Precision-at-2 (P@2) for disease/trait extraction, and 0.83 of F1-Score for stage-ethnicity extraction, both outperforming their cost-insensitive baseline counterparts. CONCLUSIONS: The results show that curated biomedical databases can potentially be reused as training examples to train information extractors without expert annotation or refinement, opening an unprecedented opportunity of using "big data" in biomedical text mining.


Subject(s)
Abstracting and Indexing/methods; Data Curation; Data Mining/methods; Databases, Factual; Disease/genetics; Genetic Predisposition to Disease; Genome-Wide Association Study; Humans; Risk Assessment
14.
AMIA Annu Symp Proc ; 2016: 1880-1889, 2016.
Article in English | MEDLINE | ID: mdl-28269947

ABSTRACT

Natural Language Processing (NLP) is essential for concept extraction from narrative text in electronic health records (EHR). To extract numerous and diverse concepts, such as data elements (i.e., important concepts related to a certain medical condition), a plausible solution is to combine various NLP tools into an ensemble to improve extraction performance. However, it is unclear to what extent ensembles of popular NLP tools improve the extraction of numerous and diverse concepts. Therefore, we built an NLP ensemble pipeline to synergize the strength of popular NLP tools using seven ensemble methods, and to quantify the improvement in performance achieved by ensembles in the extraction of data elements for three very different cohorts. Evaluation results show that the pipeline can improve the performance of NLP tools, but there is high variability depending on the cohort.


Subject(s)
Electronic Health Records; Information Storage and Retrieval/methods; Natural Language Processing; Data Collection; Humans
15.
PLoS One ; 10(8): e0136631, 2015.
Article in English | MEDLINE | ID: mdl-26317409

ABSTRACT

The Protein Data Bank (PDB) is the worldwide repository of 3D structures of proteins, nucleic acids and complex assemblies. The PDB's large corpus of data (> 100,000 structures) and related citations provide a well-organized and extensive test set for developing and understanding data citation and access metrics. In this paper, we present a systematic investigation of how authors cite PDB as a data repository. We describe a novel metric based on information cascade constructed by exploring the citation network to measure influence between competing works and apply that to analyze different data citation practices to PDB. Based on this new metric, we found that the original publication of RCSB PDB in the year 2000 continues to attract most citations though many follow-up updates were published. None of these follow-up publications by members of the wwPDB organization can compete with the original publication in terms of citations and influence. Meanwhile, authors increasingly choose to use URLs of PDB in the text instead of citing PDB papers, leading to disruption of the growth of the literature citations. A comparison of data usage statistics and paper citations shows that PDB Web access is highly correlated with URL mentions in the text. The results reveal the trend of how authors cite a biomedical data repository and may provide useful insight of how to measure the impact of a data repository.


Subject(s)
Databases, Protein; Humans
16.
Circ Cardiovasc Genet ; 7(4): 505-13, 2014 Aug.
Article in English | MEDLINE | ID: mdl-25023634

ABSTRACT

BACKGROUND: Metabolic syndrome (MetS) refers to the clustering of cardiometabolic risk factors, including dyslipidemia, central adiposity, hypertension, and hyperglycemia, in individuals. Identification of pleiotropic genetic factors associated with MetS traits may shed light on key pathways or mediators underlying MetS. METHODS AND RESULTS: Using the Metabochip array in 15 148 African Americans from the Population Architecture using Genomics and Epidemiology (PAGE) study, we identify susceptibility loci and investigate pleiotropy among genetic variants using a subset-based meta-analysis method, ASsociation-analysis-based-on-subSETs (ASSET). Unlike conventional models that lack power when associations for MetS components are null or have opposite effects, Association-analysis-based-on-subsets uses 1-sided tests to detect positive and negative associations for components separately and combines tests accounting for correlations among components. With Association-analysis-based-on-subsets, we identify 27 single nucleotide polymorphisms in 1 glucose and 4 lipids loci (TCF7L2, LPL, APOA5, CETP, and APOC1/APOE/TOMM40) significantly associated with MetS components overall, all P<2.5e-7, the Bonferroni adjusted P value. Three loci replicate in a Hispanic population, n=5172. A novel African American-specific variant, rs12721054/APOC1, and rs10096633/LPL are associated with ≥3 MetS components. We find additional evidence of pleiotropy for APOE, TOMM40, TCF7L2, and CETP variants, many with opposing effects (eg, the same rs7901695/TCF7L2 allele is associated with increased odds of high glucose and decreased odds of central adiposity). CONCLUSIONS: We highlight a method to increase power in large-scale genomic association analyses and report a novel variant associated with all MetS components in African Americans. 
We also identify pleiotropic associations that may be clinically useful in patient risk profiling and for informing translational research of potential gene targets and medications.
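
The abstract gives the Bonferroni-adjusted threshold (2.5e-7) but not the number of tests behind it. Assuming the conventional family-wise error rate of 0.05, which the abstract does not state, the threshold corresponds to roughly 200,000 tests:

```latex
\alpha_{\text{Bonferroni}} = \frac{\alpha}{m}
\quad\Longrightarrow\quad
m = \frac{\alpha}{\alpha_{\text{Bonferroni}}}
  = \frac{0.05}{2.5 \times 10^{-7}}
  = 2 \times 10^{5}
```

Here $\alpha$ is the assumed family-wise error rate and $m$ the implied number of independent tests; both are inferences from the reported threshold rather than figures taken from the study.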


Subject(s)
Black or African American/genetics; Genomics; Metabolic Syndrome/genetics; Aged; Alleles; Apolipoprotein A-V; Apolipoprotein C-I/genetics; Apolipoproteins A/genetics; Blood Glucose/analysis; Cholesterol Ester Transfer Proteins/genetics; Cholesterol, HDL/blood; Female; Genetic Loci; Genetic Pleiotropy; Genetic Predisposition to Disease; Genetic Variation; Genotype; Hispanic or Latino/genetics; Humans; Lipoprotein Lipase/genetics; Male; Metabolic Syndrome/epidemiology; Middle Aged; Odds Ratio; Phenotype; Polymorphism, Single Nucleotide; Transcription Factor 7-Like 2 Protein/genetics
17.
PLoS One ; 8(12): e81434, 2013.
Article in English | MEDLINE | ID: mdl-24353754

ABSTRACT

Schizosaccharomyces pombe shares many genes and proteins with humans and is a good model for chromosome behavior and DNA dynamics, which can be analyzed by visualizing the behavior of fluorescently tagged proteins in vivo. Performing a genome-wide screen for changes in such proteins requires developing methods that automate analysis of a large amount of images, the first step of which requires robust segmentation of the cell. We developed a segmentation system, PombeX, that can segment cells from transmitted illumination images with focus gradient and varying contrast. Corrections for focus gradient are applied to the image to aid in accurate detection of cell membrane and cytoplasm pixels, which is used to generate initial contours for cells. Gradient vector flow snake evolution is used to obtain the final cell contours. Finally, a machine learning-based validation of cell contours removes most incorrect or spurious contours. Quantitative evaluations show overall good segmentation performance on a large set of images, regardless of differences in image quality, lighting condition, focus condition and phenotypic profile. Comparisons with recent related methods for yeast cells show that PombeX outperforms current methods, both in terms of segmentation accuracy and computational speed.


Subject(s)
Fungal Proteins/metabolism; Gene Expression Regulation, Fungal/physiology; Image Processing, Computer-Assisted/methods; Schizosaccharomyces/cytology; Software; Artificial Intelligence; Cell Compartmentation; Fluorescence
18.
Mol Cell Proteomics ; 12(5): 1335-49, 2013 May.
Article in English | MEDLINE | ID: mdl-23397142

ABSTRACT

Deciphering the network of signaling pathways in cancer via protein-protein interactions (PPIs) at the cellular level is a promising approach but remains incomplete. We used an in situ proximity ligation assay to identify and quantify 67 endogenous PPIs among 21 interlinked pathways in two hepatocellular carcinoma (HCC) cells, Huh7 (minimally migratory cells) and Mahlavu (highly migratory cells). We then applied a differential network biology analysis and determined that the novel interaction, CRKL-FLT1, has a high centrality ranking, and the expression of this interaction is strongly correlated with the migratory ability of HCC and other cancer cell lines. Knockdown of CRKL and FLT1 in HCC cells leads to a decrease in cell migration via ERK signaling and the epithelial-mesenchymal transition process. Our immunohistochemical analysis shows high expression levels of the CRKL and CRKL-FLT1 pair that strongly correlate with reduced disease-free and overall survival in HCC patient samples, and a multivariate analysis further established CRKL and the CRKL-FLT1 as novel prognosis markers. This study demonstrated that functional exploration of a disease network with interlinked pathways via PPIs can be used to discover novel biomarkers.


Subject(s)
Adaptor Proteins, Signal Transducing/metabolism; Biomarkers, Tumor/metabolism; Carcinoma, Hepatocellular/metabolism; Liver Neoplasms/metabolism; Nuclear Proteins/metabolism; Protein Interaction Maps; Adult; Aged; Aged, 80 and over; Carcinoma, Hepatocellular/diagnosis; Carcinoma, Hepatocellular/mortality; Disease-Free Survival; HEK293 Cells; Hep G2 Cells; Humans; Kaplan-Meier Estimate; Liver Neoplasms/diagnosis; Liver Neoplasms/mortality; Middle Aged; Prognosis; Proportional Hazards Models; Retrospective Studies; Signal Transduction; Tissue Array Analysis; Vascular Endothelial Growth Factor Receptor-1/metabolism; Young Adult
19.
BMC Proc ; 6 Suppl 7: S3, 2012 Nov 13.
Article in English | MEDLINE | ID: mdl-23173775

ABSTRACT

BACKGROUND: Decades of genome-wide association studies (GWAS) have accumulated large volumes of genomic data that can potentially be reused to increase the statistical power of new studies, but different genotyping platforms with different marker sets have been used as biotechnology has evolved, preventing pooling and comparison of old and new data. For example, to pool data collected by 550K chips with newer data collected by 900K chips, we need to impute missing loci. Many imputation algorithms have been developed, but the posterior probabilities estimated by those algorithms are not a reliable measure of the quality of the imputation. Recently, many studies have used an imputation quality score (IQS) to measure the quality of imputation. Estimating the IQS requires knowing the true alleles. Only when the population and the imputation loci are identical can we reuse an estimated IQS when the true alleles are unknown. METHODS: Here, we present a regression model to estimate IQS that learns from imputation of loci with known alleles. We designed a small set of features, such as minor allele frequencies and distance to the nearest known cross-over hotspot, for the prediction of IQS. We evaluated our regression models by estimating IQS of imputations by BEAGLE for a set of GWAS data from the NCBI GEO database collected from samples from different ethnic populations. RESULTS: We construct a ν-SVR based approach as our regression model. Our evaluation shows that this regression model can achieve mean square errors of less than 0.02 and a correlation coefficient close to 0.75 in different imputation scenarios. We also show how the regression results can help remove false positives in association studies. CONCLUSION: Reliable estimation of IQS will facilitate integration and reuse of existing genomic data for meta-analysis and secondary analysis. Experiments show that it is possible to use a small number of features to regress the IQS by learning from different training examples of imputation and IQS pairs.
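
The abstract does not define the IQS. It is usually described as a chance-corrected concordance between true and imputed genotypes, in the spirit of Cohen's kappa, which explains why the true alleles are needed to compute it. The sketch below assumes that definition and computes the statistic from a contingency table of true versus imputed genotype counts. The function name and the toy table are illustrative, and best-guess genotypes are used where an actual implementation may work with posterior probabilities.

```python
def chance_corrected_concordance(table):
    """Kappa-style agreement from a square table of counts.

    table[i][j] is the number of samples whose true genotype is class i
    and whose imputed genotype is class j (an assumed layout).
    """
    size = len(table)
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(size)) / total
    row_totals = [sum(table[i]) for i in range(size)]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    # Agreement expected by chance from the marginal genotype frequencies.
    expected = sum(row_totals[i] * col_totals[i] for i in range(size)) / (total * total)
    if expected == 1.0:
        return float("nan")
    return (observed - expected) / (1.0 - expected)


# Toy two-class table: 80% raw concordance, 50% expected by chance.
print(chance_corrected_concordance([[40, 10], [10, 40]]))
```

Correcting for chance matters most at loci with a rare minor allele, where always imputing the common genotype gives high raw concordance but no real information.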

20.
Genomics ; 100(3): 141-8, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22735742

ABSTRACT

Recent genome-wide surveys on ncRNA have revealed that a substantial fraction of miRNA genes is likely to form clusters. However, the evolutionary and biological function implications of clustered miRNAs are still elusive. After identifying clustered miRNA genes under different maximum inter-miRNA distances (MIDs), this study intended to reveal evolution conservation patterns among these clustered miRNA genes in metazoan species using a computation algorithm. As examples, a total of 15-35% of known and predicted miRNA genes in nine selected species constitute clusters under the MIDs ranging from 1kb to 50kb. Intriguingly, 33 out of 37 metazoan miRNA clusters in 56 metazoan genomes are co-conserved with their up/down-stream adjacent protein-coding genes. Meanwhile, a co-expression pattern of miR-1 and miR-133a in the mir-133-1 cluster has been experimentally demonstrated. Therefore, the MetaMirClust database provides a useful bioinformatic resource for biologists to facilitate the advanced interrogations on the composition of miRNA clusters and their evolution patterns.
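
Clustering genes under a maximum inter-miRNA distance (MID) amounts to chaining neighbors along a chromosome whenever the gap between consecutive genes is within the MID. The sketch below shows this for integer start coordinates on a single chromosome and strand. It is an assumed simplification for illustration (the abstract does not describe the algorithm in detail), and the coordinates are made up.

```python
def cluster_by_distance(positions, max_distance):
    """Group sorted positions into clusters of neighbors within max_distance.

    positions: genomic start coordinates on one chromosome and strand.
    Returns only clusters with at least two members.
    """
    clusters = []
    for pos in sorted(positions):
        # Extend the current cluster if the gap to its last member is small enough.
        if clusters and pos - clusters[-1][-1] <= max_distance:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return [cluster for cluster in clusters if len(cluster) > 1]


coords = [100, 900, 5000, 60000, 61000]
print(cluster_by_distance(coords, 1000))
print(cluster_by_distance(coords, 50000))
```

Running the same coordinates under several MIDs shows how the fraction of clustered genes grows with the threshold, as in the 1kb to 50kb range mentioned above.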


Subject(s)
Data Mining/methods; MicroRNAs/analysis; Multigene Family; Software; Algorithms; Animals; Base Sequence; Computational Biology/methods; Conserved Sequence; Databases, Genetic; Evolution, Molecular; Genes, rRNA; Hep G2 Cells; Humans; MicroRNAs/genetics; Reverse Transcriptase Polymerase Chain Reaction; Ribosomes/genetics; Sequence Homology, Nucleic Acid; Transcriptome