1.
Bioinformatics ; 37(23): 4589-4590, 2021 12 07.
Article in English | MEDLINE | ID: mdl-34601554

ABSTRACT

SUMMARY: Cytogenetics data, or karyotypes, are among the most common clinically used forms of genetic data. Karyotypes are stored as standardized text strings using the International System for Human Cytogenomic Nomenclature (ISCN). Historically, these data have not been used in large-scale computational analyses due to limitations in the ISCN text format and structure. Recently developed computational tools such as CytoGPS have enabled large-scale computational analyses of karyotypes. To further enable such analyses, we have now developed RCytoGPS, an R package that takes JSON files generated from CytoGPS.org and converts them into objects in R. This conversion facilitates the analysis and visualizations of karyotype data. In effect this tool streamlines the process of performing large-scale karyotype analyses, thus advancing the field of computational cytogenetic pathology. AVAILABILITY AND IMPLEMENTATION: Freely available at https://CRAN.R-project.org/package=RCytoGPS. The code for the underlying CytoGPS software can be found at https://github.com/i2-wustl/CytoGPS.


Subject(s)
Reading ; Software ; Humans ; Karyotyping ; Karyotype
2.
Bioinformatics ; 37(17): 2780-2781, 2021 Sep 09.
Article in English | MEDLINE | ID: mdl-33515233

ABSTRACT

SUMMARY: Unsupervised machine learning provides tools for researchers to uncover latent patterns in large-scale data, based on calculated distances between observations. Methods to visualize high-dimensional data based on these distances can elucidate subtypes and interactions within multi-dimensional and high-throughput data. However, researchers can select from a vast number of distance metrics and visualizations, each with their own strengths and weaknesses. The Mercator R package facilitates selection of a biologically meaningful distance from 10 metrics, together appropriate for binary, categorical and continuous data, and visualization with 5 standard and high-dimensional graphics tools. Mercator provides a user-friendly pipeline for informaticians or biologists to perform unsupervised analyses, from exploratory pattern recognition to production of publication-quality graphics. AVAILABILITY AND IMPLEMENTATION: Mercator is freely available at the Comprehensive R Archive Network (https://cran.r-project.org/web/packages/Mercator/index.html).
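
As a rough illustration of what "distance between observations" means for binary data, the sketch below computes two familiar distances in plain Python. The abstract does not list Mercator's 10 metrics, so Jaccard and Hamming are used here only as well-known examples; the function names and the two sample vectors are hypothetical and are not taken from the Mercator package.

```python
from typing import Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jaccard_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """1 - |intersection| / |union| over positions set to 1.

    Positions where both vectors are 0 (joint absences) are ignored.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


# Two hypothetical samples scored on six binary features.
sample_a = [1, 1, 0, 0, 0, 0]
sample_b = [1, 0, 1, 0, 0, 0]

print(hamming_distance(sample_a, sample_b))
print(jaccard_distance(sample_a, sample_b))
```

The two distances disagree on sparse data because Hamming counts shared zeros as agreement while Jaccard ignores them, which is one reason the choice of metric matters for sparse binary features.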

3.
BMC Bioinformatics ; 22(1): 100, 2021 Mar 01.
Article in English | MEDLINE | ID: mdl-33648439

ABSTRACT

BACKGROUND: There have been many recent breakthroughs in processing and analyzing large-scale data sets in biomedical informatics. For example, the CytoGPS algorithm has enabled the use of text-based karyotypes by transforming them into a binary model. However, such advances are accompanied by new problems of data sparsity, heterogeneity, and noisiness that are magnified by the large-scale multidimensional nature of the data. To address these problems, we developed the Mercator R package, which processes and visualizes binary biomedical data. We use Mercator to address biomedical questions of cytogenetic patterns relating to lymphoid hematologic malignancies, which include a broad set of leukemias and lymphomas. Karyotype data are one of the most common forms of genetic data collected on lymphoid malignancies, because karyotyping is part of the standard of care in these cancers. RESULTS: In this paper we combine the analytic power of CytoGPS and Mercator to perform a large-scale multidimensional pattern recognition study on 22,741 karyotype samples in 47 different hematologic malignancies obtained from the public Mitelman database. CONCLUSION: Our findings indicate that Mercator was able to identify both known and novel cytogenetic patterns across different lymphoid malignancies, furthering our understanding of the genetics of these diseases.


Subject(s)
Hematologic Diseases ; Karyotyping ; Neoplasms ; Chromosome Aberrations ; Humans ; Karyotype
4.
J Biomed Inform ; 118: 103788, 2021 06.
Article in English | MEDLINE | ID: mdl-33862229

ABSTRACT

INTRODUCTION: Clustering analyses in clinical contexts hold promise to improve the understanding of patient phenotype and disease course in chronic and acute clinical medicine. However, work remains to ensure that solutions are rigorous, valid, and reproducible. In this paper, we evaluate best practices for dissimilarity matrix calculation and clustering on mixed-type, clinical data. METHODS: We simulate clinical data to represent problems in clinical trials, cohort studies, and EHR data, including single-type datasets (binary, continuous, categorical) and 4 data mixtures. We test 5 single distance metrics (Jaccard, Hamming, Gower, Manhattan, Euclidean) and 3 mixed distance metrics (DAISY, Supersom, and Mercator) with 3 clustering algorithms (hierarchical (HC), k-medoids, self-organizing maps (SOM)). We quantitatively and visually validate by Adjusted Rand Index (ARI) and silhouette width (SW). We applied our best methods to two real-world data sets: (1) 21 features collected on 247 patients with chronic lymphocytic leukemia, and (2) 40 features collected on 6000 patients admitted to an intensive care unit. RESULTS: HC outperformed k-medoids and SOM by ARI across data types. DAISY produced the highest mean ARI for mixed data types for all mixtures except unbalanced mixtures dominated by continuous data. Compared to other methods, DAISY with HC uncovered superior, separable clusters in both real-world data sets. DISCUSSION: Selecting an appropriate mixed-type metric allows the investigator to obtain optimal separation of patient clusters and get maximum use of their data. Superior metrics for mixed-type data handle multiple data types using multiple, type-focused distances. Better subclassification of disease opens avenues for targeted treatments, precision medicine, clinical decision support, and improved patient outcomes.
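
The Adjusted Rand Index (ARI) used for validation above compares a clustering against known labels while correcting for chance agreement. The following is a minimal pure-Python sketch of the standard pair-counting formula, not the authors' code; the function name and the toy label vectors are illustrative assumptions.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Pair-counting ARI between two labelings of the same objects."""
    if len(truth) != len(pred):
        raise ValueError("labelings must have the same length")
    n = len(truth)
    if n < 2:
        raise ValueError("need at least two objects")
    # Contingency table cells, row totals and column totals, each as "pairs within".
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one cluster
    return (sum_cells - expected) / (max_index - expected)


# Cluster names do not matter, only the grouping does.
same_grouping = adjusted_rand_index([0, 0, 1, 1], ["b", "b", "a", "a"])
crossed_grouping = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same_grouping, crossed_grouping)
```

An ARI of 1 means identical groupings, values near 0 mean agreement no better than chance, and negative values mean agreement worse than chance.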


Subject(s)
Leukemia, Lymphocytic, Chronic, B-Cell ; Algorithms ; Cluster Analysis ; Computer Simulation ; Humans
5.
Bioinformatics ; 35(17): 2924-2931, 2019 09 01.
Article in English | MEDLINE | ID: mdl-30689715

ABSTRACT

MOTIVATION: Clonal heterogeneity is common in many types of cancer, including chronic lymphocytic leukemia (CLL). Previous research suggests that the presence of multiple distinct cancer clones is associated with clinical outcome. Detection of clonal heterogeneity from high throughput data, such as sequencing or single nucleotide polymorphism (SNP) array data, is important for gaining a better understanding of cancer and may improve prediction of clinical outcome or response to treatment. Here, we present a new method, CloneSeeker, for inferring clonal heterogeneity from sequencing data, SNP array data, or both. RESULTS: We generated simulated SNP array and sequencing data and applied CloneSeeker along with two other methods. We demonstrate that CloneSeeker is more accurate than existing algorithms at determining the number of clones, distribution of cancer cells among clones, and mutation and/or copy numbers belonging to each clone. Next, we applied CloneSeeker to SNP array data from samples of 258 previously untreated CLL patients to gain a better understanding of the characteristics of CLL tumors and to elucidate the relationship between clonal heterogeneity and clinical outcome. We found that a significant majority of CLL patients appear to have multiple clones distinguished by copy number alterations alone. We also found that the presence of multiple clones corresponded with significantly worse survival among CLL patients. These findings may prove useful for improving the accuracy of prognosis and design of treatment strategies. AVAILABILITY AND IMPLEMENTATION: Code available on R-Forge: https://r-forge.r-project.org/projects/CloneSeeker/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Leukemia, Lymphocytic, Chronic, B-Cell ; Polymorphism, Single Nucleotide ; Whole Genome Sequencing ; Algorithms ; DNA Copy Number Variations ; Female ; High-Throughput Nucleotide Sequencing ; Humans ; Male
6.
Bioinformatics ; 35(24): 5365-5366, 2019 12 15.
Article in English | MEDLINE | ID: mdl-31263896

ABSTRACT

SUMMARY: Karyotype data are the most common form of genetic data that is regularly used clinically. They are collected as part of the standard of care in many diseases, particularly in pediatric and cancer medicine contexts. Karyotypes are represented in a unique text-based format, with a syntax defined by the International System for human Cytogenetic Nomenclature (ISCN). While human-readable, ISCN is not intrinsically machine-readable. This limitation has prevented the full use of complex karyotype data in discovery science use cases. To enhance the utility and value of karyotype data, we developed a tool named CytoGPS. CytoGPS first parses ISCN karyotypes into a machine-readable format. It then converts the ISCN karyotype into a binary Loss-Gain-Fusion (LGF) model, which represents all cytogenetic abnormalities as combinations of loss, gain, or fusion events, in a format that is analyzable using modern computational methods. Such data is then made available for comprehensive 'downstream' analyses that previously were not feasible. AVAILABILITY AND IMPLEMENTATION: Freely available at http://cytogps.org.


Subject(s)
Chromosome Aberrations ; Karyotype ; Humans ; Karyotyping ; Neoplasms ; Software
7.
Lancet Oncol ; 20(11): 1576-1586, 2019 11.
Article in English | MEDLINE | ID: mdl-31582354

ABSTRACT

BACKGROUND: Fludarabine, cyclophosphamide, and rituximab (FCR) has become a gold-standard chemoimmunotherapy regimen for patients with chronic lymphocytic leukaemia. However, the question remains of how to treat treatment-naive patients with IGHV-unmutated chronic lymphocytic leukaemia. We therefore aimed to develop and validate a gene expression signature to identify which of these patients are likely to achieve durable remissions with FCR chemoimmunotherapy. METHODS: We did a retrospective cohort study in two cohorts of treatment-naive patients (aged ≥18 years) with chronic lymphocytic leukaemia. The discovery and training cohort consisted of peripheral blood samples collected from patients treated at the University of Texas MD Anderson Cancer Center (Houston, TX, USA), who fulfilled the diagnostic criteria of the International Workshop on Chronic Lymphocytic Leukemia, had received at least three cycles of FCR chemoimmunotherapy, and had been treated between Oct 10, 2000, and Oct 26, 2006 (ie, the MDACC cohort). We did transcriptional profiling on samples obtained from the MDACC cohort to identify genes associated with time to progression. We did univariate Cox proportional hazards analyses and used significant genes to cluster IGHV-unmutated samples into two groups (intermediate prognosis and unfavourable prognosis). After using cross-validation to assess robustness, we applied the Lasso method to standardise the gene expression values to find a minimum gene signature. We validated this signature in an external cohort of treatment-naive patients with IGHV-unmutated chronic lymphocytic leukaemia enrolled on the CLL8 trial of the German Chronic Lymphocytic Leukaemia Study Group who were treated between July 21, 2003, and April 4, 2006 (ie, the CLL8 cohort). FINDINGS: The MDACC cohort consisted of 101 patients and the CLL8 cohort consisted of 109 patients. 
Using the MDACC cohort, we identified and developed a 17-gene expression signature that distinguished IGHV-unmutated patients who were likely to achieve a long-term remission following front-line FCR chemoimmunotherapy from those who might benefit from alternative front-line regimens (hazard ratio 3·83, 95% CI 1·94-7·59; p<0·0001). We validated this gene signature in the CLL8 cohort; patients with an unfavourable prognosis versus those with an intermediate prognosis had a cause-specific hazard ratio of 1·90 (95% CI 1·18-3·06; p=0·008). Median time to progression was 39 months (IQR 22-69) for those with an unfavourable prognosis compared with 59 months (28-84) for those with an intermediate prognosis. INTERPRETATION: We have developed a robust, reproducible 17-gene signature that identifies a subset of treatment-naive patients with IGHV-unmutated chronic lymphocytic leukaemia who might substantially benefit from treatment with FCR chemoimmunotherapy. We recommend testing the value of this gene signature in a prospective study that compares FCR treatment with newer alternative therapies as part of a randomised clinical trial. FUNDING: Chronic Lymphocytic Leukaemia Global Research Foundation and the National Institutes of Health/National Cancer Institute.


Subject(s)
Antineoplastic Agents, Immunological/administration & dosage ; Antineoplastic Combined Chemotherapy Protocols/administration & dosage ; Cyclophosphamide/administration & dosage ; Gene Expression Profiling ; Leukemia, Lymphocytic, Chronic, B-Cell/drug therapy ; Rituximab/administration & dosage ; Transcriptome ; Vidarabine/analogs & derivatives ; Aged ; Antineoplastic Agents, Immunological/adverse effects ; Antineoplastic Combined Chemotherapy Protocols/adverse effects ; Cyclophosphamide/adverse effects ; Disease Progression ; Female ; Germany ; Humans ; Leukemia, Lymphocytic, Chronic, B-Cell/genetics ; Leukemia, Lymphocytic, Chronic, B-Cell/immunology ; Leukemia, Lymphocytic, Chronic, B-Cell/pathology ; Male ; Middle Aged ; Predictive Value of Tests ; Remission Induction ; Risk Assessment ; Risk Factors ; Rituximab/adverse effects ; Texas ; Time Factors ; Treatment Outcome ; Vidarabine/administration & dosage ; Vidarabine/adverse effects
8.
BMC Bioinformatics ; 20(Suppl 24): 679, 2019 Dec 20.
Article in English | MEDLINE | ID: mdl-31861985

ABSTRACT

BACKGROUND: RNA sequencing technologies have allowed researchers to gain a better understanding of how the transcriptome affects disease. However, sequencing technologies often unintentionally introduce experimental error into RNA sequencing data. To counteract this, normalization methods are standardly applied with the intent of reducing the non-biologically derived variability inherent in transcriptomic measurements. However, the comparative efficacy of the various normalization techniques has not been tested in a standardized manner. Here we propose tests that evaluate numerous normalization techniques and applied them to a large-scale standard data set. These tests comprise a protocol that allows researchers to measure the amount of non-biological variability which is present in any data set after normalization has been performed, a crucial step to assessing the biological validity of data following normalization. RESULTS: In this study we present two tests to assess the validity of normalization methods applied to a large-scale data set collected for systematic evaluation purposes. We tested various RNASeq normalization procedures and concluded that transcripts per million (TPM) was the best performing normalization method based on its preservation of biological signal as compared to the other methods tested. CONCLUSION: Normalization is of vital importance to accurately interpret the results of genomic and transcriptomic experiments. More work, however, needs to be performed to optimize normalization methods for RNASeq data. The present effort helps pave the way for more systematic evaluations of normalization methods across different platforms. With our proposed schema researchers can evaluate their own or future normalization methods to further improve the field of RNASeq normalization.
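
For readers unfamiliar with transcripts per million (TPM), the usual definition divides each gene's read count by its length and then rescales so the values sum to one million within a sample. The sketch below illustrates that definition only; it is not the evaluation protocol from the paper, and the function name, counts and gene lengths are made up for the example.

```python
from typing import List, Sequence


def tpm(counts: Sequence[float], lengths_bp: Sequence[float]) -> List[float]:
    """Transcripts per million for one sample.

    counts: reads assigned to each gene; lengths_bp: gene lengths in base pairs.
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths must have the same length")
    # Step 1: length-normalised read rate for each gene.
    rates = [c / l for c, l in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Step 2: rescale so the sample sums to one million.
    return [r / total * 1e6 for r in rates]


# Hypothetical three-gene sample.
example_counts = [100, 200, 300]
example_lengths = [1000, 2000, 1000]
example_tpm = tpm(example_counts, example_lengths)
print(example_tpm)
```

Because every sample is rescaled to the same total, TPM values for a gene can be compared across samples as proportions, which is the property that distinguishes it from raw counts.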


Subject(s)
RNA/genetics ; Sequence Analysis, RNA/methods ; Genome ; Genomics ; Humans ; Transcriptome
9.
BMC Bioinformatics ; 19(1): 9, 2018 01 08.
Article in English | MEDLINE | ID: mdl-29310570

ABSTRACT

BACKGROUND: Cluster analysis is the most common unsupervised method for finding hidden groups in data. Clustering presents two main challenges: (1) finding the optimal number of clusters, and (2) removing "outliers" among the objects being clustered. Few clustering algorithms currently deal directly with the outlier problem. Furthermore, existing methods for identifying the number of clusters still have some drawbacks. Thus, there is a need for a better algorithm to tackle both challenges. RESULTS: We present a new approach, implemented in an R package called Thresher, to cluster objects in general datasets. Thresher combines ideas from principal component analysis, outlier filtering, and von Mises-Fisher mixture models in order to select the optimal number of clusters. We performed a large Monte Carlo simulation study to compare Thresher with other methods for detecting outliers and determining the number of clusters. We found that Thresher had good sensitivity and specificity for detecting and removing outliers. We also found that Thresher is the best method for estimating the optimal number of clusters when the number of objects being clustered is smaller than the number of variables used for clustering. Finally, we applied Thresher and eleven other methods to 25 sets of breast cancer data downloaded from the Gene Expression Omnibus; only Thresher consistently estimated the number of clusters to lie in the range of 4-7 that is consistent with the literature. CONCLUSIONS: Thresher is effective at automatically detecting and removing outliers. By thus cleaning the data, it produces better estimates of the optimal number of clusters when there are more variables than objects. When we applied Thresher to a variety of breast cancer datasets, it produced estimates that were both self-consistent and consistent with the literature. We expect Thresher to be useful for studying a wide variety of biological datasets.


Subject(s)
Cluster Analysis ; Algorithms ; Breast Neoplasms/metabolism ; Breast Neoplasms/pathology ; Female ; Humans ; Monte Carlo Method ; Principal Component Analysis
10.
BMC Genomics ; 19(1): 738, 2018 Oct 11.
Article in English | MEDLINE | ID: mdl-30305013

ABSTRACT

BACKGROUND: Transcription factors are essential regulators of gene expression and play critical roles in development, differentiation, and in many cancers. To carry out their regulatory programs, they must cooperate in networks and bind simultaneously to sites in promoter or enhancer regions of genes. We hypothesize that the mRNA co-expression patterns of transcription factors can be used both to learn how they cooperate in networks and to distinguish between cancer types. RESULTS: We recently developed a new algorithm, Thresher, that combines principal component analysis, outlier filtering, and von Mises-Fisher mixture models to cluster genes (in this case, transcription factors) based on expression, determining the optimal number of clusters in the process. We applied Thresher to the RNA-Seq expression data of 486 transcription factors from more than 10,000 samples of 33 kinds of cancer studied in The Cancer Genome Atlas (TCGA). We found that 30 clusters of transcription factors from a 29-dimensional principal component space were able to distinguish between most cancer types, and could separate tumor samples from normal controls. Moreover, each cluster of transcription factors could be either (i) linked to a tissue-specific expression pattern or (ii) associated with a fundamental biological process such as cell cycle, angiogenesis, apoptosis, or cytoskeleton. Clusters of the second type were more likely also to be associated with embryonically lethal mouse phenotypes. CONCLUSIONS: Using our approach, we have shown that the mRNA expression patterns of transcription factors contain most of the information needed to distinguish different cancer types. The Thresher method is capable of discovering biologically interpretable clusters of genes. It can potentially be applied to other gene sets, such as signaling pathways, to decompose them into simpler, yet biologically meaningful, components.


Subject(s)
Computational Biology ; Neoplasms/classification ; Neoplasms/metabolism ; Transcription Factors/metabolism ; Cluster Analysis ; Gene Expression Profiling ; Neoplasms/genetics ; Principal Component Analysis
11.
J Surg Oncol ; 118(3): 501-509, 2018 Sep.
Article in English | MEDLINE | ID: mdl-30132912

ABSTRACT

BACKGROUND AND OBJECTIVES: MicroRNAs (miRs) are noncoding RNAs that regulate protein translation and melanoma progression. Changes in plasma miR expression following surgical resection of metastatic melanoma are under-investigated. We hypothesize differences in miR expression exist following complete surgical resection of metastatic melanoma. METHODS: Blood collection pre- and post-surgical resection was performed in six individuals with solitary melanoma metastases. miR expression in extracted RNA was quantified using the NanoString nCounter Digital Analyzer. RESULTS: Pre- and post-surgical plasma samples contained 216 miRs with expression above baseline. Comparison of postsurgical to preresection samples revealed differential expression of 25 miRs: miR-let-7a, miR-let-7g, miR-15a, miR-16, miR-22, miR-30b, miR-126, miR-140, miR-145, miR-148a, miR-150-5p, miR-191, miR-378i, miR-449c, miR-494, miR-513b, miR-548aa, miR-571, miR-587, miR-891b, miR-1260a, miR-1268a, miR-1976, miR-4268, miR-4454 (P < 0.05). Utilizing P < 0.0046 as a cutoff to control for one false positive among the 216 miRs revealed that postsurgical melanoma plasma samples had upregulation of miR-1260a (P = 0.0007) and downregulation of miR-150-5p (P = 0.0026) relative to pre-surgical samples. CONCLUSIONS: Differential expression of miR-150-5p and miR-1260a is present in plasma following surgical resection of metastatic melanoma in this small sample (n = 6) of melanoma patients. Therefore, further investigation of these plasma miRs as noninvasive biomarkers for melanoma is warranted.
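
The 0.0046 cutoff appears to follow from allowing an expected one false positive across 216 tests, since under the null hypothesis the expected number of false positives is the number of tests times the per-test threshold. The short sketch below reproduces that arithmetic as an interpretation of the abstract, not as the authors' stated procedure; the variable names are illustrative, and the P values are the two reported above.

```python
# Number of miRs tested and the number of false positives tolerated on average.
n_tests = 216
expected_false_positives = 1

# Under the null, E[false positives] = n_tests * cutoff, so solve for the cutoff.
cutoff = expected_false_positives / n_tests

# P values reported in the abstract for the two highlighted miRs.
reported = {"miR-1260a": 0.0007, "miR-150-5p": 0.0026}
passing = sorted(name for name, p in reported.items() if p < cutoff)

print(round(cutoff, 4), passing)
```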


Subject(s)
Gene Expression Regulation, Neoplastic ; Melanoma/genetics ; MicroRNAs/genetics ; Neoplasm Recurrence, Local/genetics ; Aged ; Biomarkers, Tumor ; Female ; Follow-Up Studies ; Gene Expression Profiling ; Humans ; Lymphatic Metastasis ; Male ; Melanoma/secondary ; Melanoma/surgery ; Middle Aged ; Neoplasm Recurrence, Local/pathology ; Neoplasm Recurrence, Local/surgery ; Prognosis ; Survival Rate
12.
J Neurovirol ; 23(5): 671-678, 2017 10.
Article in English | MEDLINE | ID: mdl-28695489

ABSTRACT

The relationship between human cytomegalovirus (HCMV) and glioblastoma (GBM) is an ongoing debate with extensive evidence supporting or refuting its existence through molecular assays, pre-clinical studies, and clinical trials. We focus primarily on the crux of the debate, detection of HCMV in GBM samples using molecular assays. We propose that these differences in detection could be affected by cellular heterogeneity. To take this into account, we align the single-cell RNA sequencing (scRNA-seq) reads from five GBM tumors and two cell lines to HCMV and analyze the alignments for evidence of (i) complete viral transcripts and (ii) low-abundance viral reads. We found that neither tumor nor cell line samples showed conclusive evidence of full HCMV viral transcripts. We also identified low-abundance reads aligned across all tumors, with two tumors having higher alignment rates than the rest of the tumor samples. This work is meant to rigorously test for HCMV RNA expression at a single cell level in GBM samples and examine the possible utility of single cell data in tumor virology.


Subject(s)
Brain Neoplasms/virology ; Cytomegalovirus Infections/complications ; Glioblastoma/virology ; Cell Line, Tumor ; Cytomegalovirus ; Humans ; RNA, Viral/analysis
13.
BMC Genomics ; 17 Suppl 7: 513, 2016 08 22.
Article in English | MEDLINE | ID: mdl-27556157

ABSTRACT

BACKGROUND: Somatic mutations can be used as potential biomarkers for subtyping and predicting outcomes for cancer patients. However, cancer patients often carry many somatic mutations, which do not always concentrate on specific genomic loci, suggesting that the mutations may affect common pathways or gene interaction networks instead of common genes. The challenge is thus to identify the functional relationships among the mutations using multi-modal data. We developed a novel approach for integrating patient somatic mutation, transcriptome and clinical data to mine underlying functional gene groups that can be used to stratify cancer patients into groups with different clinical outcomes. Specifically, we use distance correlation metric to mine the correlations between expression profiles of mutated genes from different patients. RESULTS: With this approach, we were able to cluster patients based on the functional relationships between the affected genes using their expression profiles, and to visualize the results using multi-dimensional scaling. Interestingly, we identified a stable subgroup of breast cancer patients that are highly enriched with ER-negative and triple-negative subtypes, and the somatic mutation genes they harbor were capable of acting as potential biomarkers to predict patient survival in several different breast cancer datasets, especially in ER-negative cohorts which has lacked reliable biomarkers. CONCLUSIONS: Our method provides a novel and promising approach for integrating genotyping and gene expression data in patient stratification in complex diseases.
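
Distance correlation, the metric named above, detects nonlinear as well as linear dependence between two profiles. The sketch below implements the standard sample statistic for two one-dimensional vectors in plain Python; it is an illustration of the metric, not the authors' pipeline, and the function names and the toy vectors are assumptions.

```python
from math import sqrt
from typing import List, Sequence


def _centered_distances(v: Sequence[float]) -> List[List[float]]:
    """Pairwise absolute distances, double-centered by row, column and grand mean."""
    n = len(v)
    d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    # d is symmetric, so column means equal row means.
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample distance correlation in [0, 1] between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    dvar_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    denom = sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0  # one of the vectors is constant
    return sqrt(max(dcov2, 0.0) / denom)


# A linear relation and a symmetric parabola (whose Pearson correlation is zero).
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
linear = distance_correlation(xs, [2 * v + 1 for v in xs])
parabola = distance_correlation(xs, [v * v for v in xs])
print(linear, parabola)
```

The parabola case is the reason to prefer this metric over Pearson correlation when relationships between expression profiles may be nonlinear: Pearson reports no association there, while distance correlation stays clearly above zero.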


Subject(s)
Breast Neoplasms/genetics ; Computational Biology/methods ; Genomics/methods ; Transcriptome/genetics ; Algorithms ; Breast Neoplasms/pathology ; Databases, Genetic ; Female ; Gene Expression Profiling/methods ; Gene Expression Regulation, Neoplastic ; Genotype ; Humans ; Models, Theoretical ; Mutation/genetics
14.
Cancer Immunol Immunother ; 64(2): 149-59, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25305035

ABSTRACT

Elevated levels of myeloid-derived suppressor cells (MDSCs) induced by tumor-derived factors are associated with inhibition of immune responses in patients with gastrointestinal malignancies. We hypothesized that pro-MDSC cytokines and levels of MDSC in the peripheral blood would be elevated in pancreatic adenocarcinoma patients with progressive disease. Peripheral blood mononuclear cells (PBMCs) were isolated from 16 pancreatic cancer patients undergoing chemotherapy and phenotyped for MDSC using a five antigen panel (CD33, HLA-DR, CD11b, CD14, CD15). Patients with stable disease had significantly lower MDSC levels in the peripheral blood than those with progressive disease (1.41 ± 1.12 vs. 5.14 ± 4.58 %, p = 0.013, Wilcoxon test). A cutoff of 2.5 % MDSC identified patients with progressive disease. Patients with ECOG performance status ≥2 had a weaker association with increased levels of MDSC. Plasma was obtained from 15 chemonaive patients, 13 patients undergoing chemotherapy and 9 normal donors. Increases in the levels of pro-MDSC cytokines were observed for pancreatic cancer patients versus controls, and the pro-MDSC cytokine IL-6 was increased in those patients undergoing chemotherapy. This study suggests that MDSC in peripheral blood may be a predictive biomarker of chemotherapy failure in pancreatic cancer patients.


Subject(s)
Adenocarcinoma/immunology ; Adenocarcinoma/pathology ; Myeloid Cells/immunology ; Pancreatic Neoplasms/immunology ; Pancreatic Neoplasms/pathology ; Adenocarcinoma/drug therapy ; Adenocarcinoma/metabolism ; Aged ; Aged, 80 and over ; Antigens, Surface/metabolism ; Cell Count ; Chemotactic Factors/blood ; Chemotactic Factors/metabolism ; Cytokines/blood ; Cytokines/metabolism ; Disease Progression ; Female ; HLA-DR Antigens/immunology ; HLA-DR Antigens/metabolism ; Humans ; Immunophenotyping ; Male ; Middle Aged ; Myeloid Cells/metabolism ; Neoplasm Staging ; Pancreatic Neoplasms/drug therapy ; Pancreatic Neoplasms/metabolism ; Signal Transduction
16.
PLoS One ; 19(6): e0300358, 2024.
Article in English | MEDLINE | ID: mdl-38848330

ABSTRACT

Clustering is an important task in biomedical science, and it is widely believed that different data sets are best clustered using different algorithms. When choosing between clustering algorithms on the same data set, researchers typically rely on global measures of quality, such as the mean silhouette width, and overlook the fine details of clustering. However, the silhouette width actually computes scores that describe how well each individual element is clustered. Inspired by this observation, we developed a novel clustering method, called SillyPutty. Unlike existing methods, SillyPutty uses the silhouette width for individual elements as a tool to optimize the mean silhouette width. This shift in perspective allows for a more granular evaluation of clustering quality, potentially addressing limitations in current methodologies. To test the SillyPutty algorithm, we first simulated a series of data sets using the Umpire R package and then used real-world data from The Cancer Genome Atlas. Using these data sets, we compared SillyPutty to several existing algorithms using multiple metrics (Silhouette Width, Adjusted Rand Index, Entropy, Normalized Within-group Sum of Square errors, and Perfect Classification Count). Our findings revealed that SillyPutty is a valid standalone clustering method, comparable in accuracy to the best existing methods. We also found that the combination of hierarchical clustering followed by SillyPutty has the best overall performance in terms of both accuracy and speed. Availability: The SillyPutty R package can be downloaded from the Comprehensive R Archive Network (CRAN).
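
To make the idea of per-element silhouette widths concrete, the sketch below computes the width of every element from a distance matrix and then performs one greedy reassignment of the worst-scoring element to its nearest other cluster. This is a simplified illustration of the general idea described in the abstract, assumed for the example; it is not the SillyPutty package's implementation, and all function names and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def silhouette_profile(
    dist: Sequence[Sequence[float]], labels: Sequence[int]
) -> List[Tuple[float, int]]:
    """Return (silhouette width, nearest other cluster) for every element."""
    clusters: Dict[int, List[int]] = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    profile = []
    for i, own_label in enumerate(labels):
        own = [j for j in clusters[own_label] if j != i]
        # b: smallest mean distance to any other cluster, and which cluster that is.
        b, nearest = min(
            (sum(dist[i][j] for j in members) / len(members), lab)
            for lab, members in clusters.items()
            if lab != own_label
        )
        if not own:  # singleton cluster: width is defined as 0
            profile.append((0.0, nearest))
            continue
        a = sum(dist[i][j] for j in own) / len(own)  # mean distance within own cluster
        denom = max(a, b)
        profile.append(((b - a) / denom if denom > 0 else 0.0, nearest))
    return profile


def mean_width(dist: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    profile = silhouette_profile(dist, labels)
    return sum(width for width, _ in profile) / len(profile)


def greedy_step(dist: Sequence[Sequence[float]], labels: Sequence[int]) -> List[int]:
    """Move the element with the most negative width to its nearest other cluster."""
    profile = silhouette_profile(dist, labels)
    worst = min(range(len(labels)), key=lambda i: profile[i][0])
    new_labels = list(labels)
    if profile[worst][0] < 0:
        new_labels[worst] = profile[worst][1]
    return new_labels


# Five points on a line; the third point starts in the wrong cluster.
points = [0.0, 0.1, 0.2, 5.0, 5.1]
dist = [[abs(p - q) for q in points] for p in points]
labels = [0, 0, 1, 1, 1]
improved = greedy_step(dist, labels)
print(improved, mean_width(dist, labels), mean_width(dist, improved))
```

A negative width flags an element that sits closer, on average, to another cluster than to its own, which is why individual widths can guide reassignment where the mean alone cannot.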


Subject(s)
Algorithms ; Cluster Analysis ; Humans ; Neoplasms/pathology ; Software
17.
bioRxiv ; 2024 Apr 06.
Article in English | MEDLINE | ID: mdl-37808763

ABSTRACT

Objective: Accurately identifying clinical phenotypes from Electronic Health Records (EHRs) provides additional insights into patients' health, especially when such information is unavailable in structured data. This study evaluates the application of OpenAI's Generative Pre-trained Transformer (GPT)-4 model to identify clinical phenotypes from EHR text in non-small cell lung cancer (NSCLC) patients. The goal was to identify disease stages, treatments and progression utilizing GPT-4, and compare its performance against GPT-3.5-turbo, Flan-T5-xl, Flan-T5-xxl, and two rule-based and machine learning-based methods, namely, scispaCy and medspaCy. Materials and Methods: Phenotypes such as initial cancer stage, initial treatment, evidence of cancer recurrence, and affected organs during recurrence were identified from 13,646 records for 63 NSCLC patients from Washington University in St. Louis, Missouri. The performance of the GPT-4 model is evaluated against GPT-3.5-turbo, Flan-T5-xxl, Flan-T5-xl, medspaCy and scispaCy by comparing precision, recall, and micro-F1 scores. Results: GPT-4 achieved higher F1 score, precision, and recall compared to Flan-T5-xl, Flan-T5-xxl, medspaCy and scispaCy's models. GPT-3.5-turbo performed similarly to that of GPT-4. GPT and Flan-T5 models were not constrained by explicit rule requirements for contextual pattern recognition. SpaCy models relied on predefined patterns, leading to their suboptimal performance. Discussion and Conclusion: GPT-4 improves clinical phenotype identification due to its robust pre-training and remarkable pattern recognition capability on the embedded tokens. It demonstrates data-driven effectiveness even with limited context in the input. While rule-based models remain useful for some tasks, GPT models offer improved contextual understanding of the text, and robust clinical phenotype extraction.
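
Micro-averaged precision, recall and F1, the scores used for the comparison above, pool true positives, false positives and false negatives over all records before computing the ratios. The sketch below shows that calculation for set-valued labels; the function name and the phenotype labels are hypothetical and are not drawn from the study's data.

```python
from typing import Sequence, Set, Tuple


def micro_prf(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over per-record label sets."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same records")
    tp = sum(len(g & p) for g, p in zip(gold, pred))  # labels found and correct
    fp = sum(len(p - g) for g, p in zip(gold, pred))  # labels found but wrong
    fn = sum(len(g - p) for g, p in zip(gold, pred))  # labels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold-standard and extracted phenotypes for three patients.
gold_labels = [
    {"stage:III", "treatment:chemo"},
    {"stage:I"},
    {"recurrence:yes", "organ:brain"},
]
predicted_labels = [
    {"stage:III"},
    {"stage:I", "treatment:surgery"},
    {"recurrence:yes", "organ:brain"},
]
precision, recall, f1 = micro_prf(gold_labels, predicted_labels)
print(precision, recall, f1)
```

Micro-averaging weights every extracted label equally, so frequent phenotypes dominate the score; a macro average would instead weight each phenotype class equally.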

18.
Am J Bot ; 100(1): 194-202, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23281391

ABSTRACT

PREMISE: Plant organs use gravity as a guide to direct their growth. Although gravitropism has been studied since the time of Darwin, the mechanisms of signal transduction that connect biophysical stimulus perception to the biochemical events of the response are still not understood. METHODS: A quantitative proteomics approach was used to identify key proteins during the early events of gravitropism. Plants were subjected to a gravity persistent signal (GPS) treatment, and proteins were extracted from the inflorescence stem at early time points after stimulation. Proteins were labeled with isobaric tags for relative and absolute quantification (iTRAQ) reagents. Proteins were identified and quantified in a single step using tandem mass spectrometry (MS/MS). For two of the proteins identified, mutants with T-DNA inserts in the corresponding genes were evaluated for gravitropic phenotypes. KEY RESULTS: A total of 82 proteins showed significant differential quantification between treatment and controls. Proteins were categorized into functional groups based on gene ontology terms and filtered using groups thought to be involved in the signaling events of gravitropism. For two of the proteins selected, GSTF9 and HSP81-2, knockout mutations resulted in defects in root skewing, waving, and curvature as well as in the GPS response of inflorescence stems. CONCLUSION: By combining a proteomics approach with the GPS response, 82 novel proteins were identified as involved in the early events of gravitropic signal transduction. Significant changes in protein abundance occur as early as 2 and 4 min after gravistimulation. The approach was validated through the analysis of mutants exhibiting altered gravitropic responses.


Subject(s)
Arabidopsis Proteins/metabolism , Arabidopsis/physiology , Gravitropism/physiology , Proteomics/methods , Signal Transduction , Arabidopsis/genetics , DNA Mutational Analysis , Kinetics , Molecular Sequence Annotation , Mutation/genetics , Phenotype , Plant Roots/physiology , Reproducibility of Results , Reverse Transcriptase Polymerase Chain Reaction
19.
bioRxiv ; 2023 Apr 21.
Article in English | MEDLINE | ID: mdl-37131792

ABSTRACT

Gene regulatory networks play a critical role in understanding cell states, gene expression, and biological processes. Here, we investigated the utility of transcription factors (TFs) and microRNAs (miRNAs) in creating a low-dimensional representation of cell states and predicting gene expression across 31 cancer types. We identified 28 clusters of miRNAs and 28 clusters of TFs, demonstrating that they can differentiate tissue of origin. Using a simple SVM classifier, we achieved an average accuracy of 92.8% in tissue classification. We also predicted the entire transcriptome using Tissue-Agnostic and Tissue-Aware models, with average R² values of 0.45 and 0.70, respectively. Our Tissue-Aware model, using 56 selected features, showed predictive power comparable to that of the widely used L1000 genes. However, the model's transportability was impacted by covariate shift, particularly inconsistent microRNA expression across datasets.

20.
bioRxiv ; 2023 Nov 11.
Article in English | MEDLINE | ID: mdl-37986817

ABSTRACT

Unsupervised clustering is an important task in biomedical science. We developed a new unsupervised clustering method, called SillyPutty. As test data, we generated a series of datasets using the Umpire R package. Using these datasets, we compared SillyPutty to several existing algorithms using multiple metrics (Silhouette Width, Adjusted Rand Index, Entropy, Normalized Within-group Sum of Square errors, and Perfect Classification Count). Our findings revealed that SillyPutty is a valid standalone clustering method, comparable in accuracy to the best existing methods. We also found that the combination of hierarchical clustering followed by SillyPutty has the best overall performance in terms of both accuracy and speed.
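
Of the metrics listed in this abstract, the Adjusted Rand Index is the one that compares a clustering against known group labels. The following is a minimal sketch of the standard pair-counting formula, written in plain Python for illustration; it is not the SillyPutty or Umpire code, and the function name `adjusted_rand_index` is an assumption of this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items.

    Counts pairs of items placed together in both labelings, then
    corrects for the agreement expected by chance.
    """
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the labelings trivially agree

    # Pairs co-clustered in both labelings (contingency-table cells).
    together_both = sum(
        comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values()
    )
    # Pairs co-clustered within each labeling separately.
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (together_both - expected) / (maximum - expected)


# Same partition with swapped cluster names: perfect agreement.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because the index depends only on which items are grouped together, it is unaffected by how the clusters happen to be numbered, which makes it suitable for comparing algorithms on simulated data where the true grouping is known.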
