Results 1 - 20 of 193
1.
Cancers (Basel) ; 16(15)2024 Jul 26.
Article in English | MEDLINE | ID: mdl-39123390

ABSTRACT

Patients are complex and heterogeneous; clinical data sets are complicated by noise, missing data, and the presence of mixed-type data. Using such data sets requires understanding the high-dimensional "space of patients", composed of all measurements that define all relevant phenotypes. The current state-of-the-art merely defines spatial groupings of patients using cluster analyses. Our goal is to apply topological data analysis (TDA), a new unsupervised technique, to obtain a more complete understanding of patient space. We applied TDA to a space of 266 previously untreated patients with Chronic Lymphocytic Leukemia (CLL), using the "daisy" metric to compute distances between clinical records. We found clear evidence for both loops and voids in the CLL data. To interpret these structures, we developed novel computational and graphical methods. The most persistent loop and the most persistent void can be explained using three dichotomized, prognostically important factors in CLL: IGHV somatic mutation status, beta-2 microglobulin, and Rai stage. In conclusion, patient space turns out to be richer and more complex than current models suggest. TDA could become a powerful tool in a researcher's arsenal for interpreting high-dimensional data by providing novel insights into biological processes and improving our understanding of clinical and biological data sets.
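The distance computation mentioned in the abstract above can be illustrated with a small sketch. It assumes that the "daisy" metric refers to Gower's general dissimilarity for mixed-type data, as computed by the `daisy` function in the R `cluster` package. The function name `gower_distance`, the feature layout, and the example records are hypothetical and are not taken from the study.

```python
from typing import Optional, Sequence, Union

Value = Optional[Union[float, str]]  # None marks a missing measurement


def gower_distance(a: Sequence[Value], b: Sequence[Value],
                   ranges: Sequence[Optional[float]]) -> float:
    """Gower-style dissimilarity between two mixed-type records.

    ranges[i] is the observed range of numeric feature i,
    or None when feature i is categorical.
    Features missing in either record are skipped.
    """
    total = 0.0
    used = 0
    for x, y, r in zip(a, b, ranges):
        if x is None or y is None:
            continue  # ignore features that are missing in either record
        if r is None:
            # categorical feature: simple mismatch (0 or 1)
            total += 0.0 if x == y else 1.0
        else:
            # numeric feature: absolute difference scaled by the range
            total += abs(x - y) / r if r > 0 else 0.0
        used += 1
    if used == 0:
        raise ValueError("records share no non-missing features")
    return total / used


# Hypothetical records: (IGHV status, beta-2 microglobulin, Rai stage group)
ranges = [None, 8.0, None]
p1 = ("mutated", 2.0, "low")
p2 = ("unmutated", 4.0, "low")
p3 = ("mutated", None, "high")

print(gower_distance(p1, p2, ranges))
print(gower_distance(p1, p3, ranges))
```

A full pairwise matrix of such distances is the kind of input a persistent-homology tool would take; the topological analysis itself is beyond this sketch.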

2.
PLoS One ; 19(6): e0300358, 2024.
Article in English | MEDLINE | ID: mdl-38848330

ABSTRACT

Clustering is an important task in biomedical science, and it is widely believed that different data sets are best clustered using different algorithms. When choosing between clustering algorithms on the same data set, researchers typically rely on global measures of quality, such as the mean silhouette width, and overlook the fine details of clustering. However, the silhouette width actually computes scores that describe how well each individual element is clustered. Inspired by this observation, we developed a novel clustering method, called SillyPutty. Unlike existing methods, SillyPutty uses the silhouette width for individual elements as a tool to optimize the mean silhouette width. This shift in perspective allows for a more granular evaluation of clustering quality, potentially addressing limitations in current methodologies. To test the SillyPutty algorithm, we first simulated a series of data sets using the Umpire R package and then used real-world data from The Cancer Genome Atlas. Using these data sets, we compared SillyPutty to several existing algorithms using multiple metrics (Silhouette Width, Adjusted Rand Index, Entropy, Normalized Within-group Sum of Square errors, and Perfect Classification Count). Our findings revealed that SillyPutty is a valid standalone clustering method, comparable in accuracy to the best existing methods. We also found that the combination of hierarchical clustering followed by SillyPutty has the best overall performance in terms of both accuracy and speed. Availability: The SillyPutty R package can be downloaded from the Comprehensive R Archive Network (CRAN).
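The idea of using per-element silhouette widths to improve the mean silhouette width can be sketched as follows. This is an illustration of the general idea only, not the SillyPutty package's implementation: the abstract does not give the update rule, so the sketch assumes a greedy step that moves the element with the most negative silhouette width to its nearest other cluster. All function names here are hypothetical.

```python
def silhouette_widths(dist, labels):
    """Return (widths, nearest): the silhouette width of every element and
    the label of the closest cluster other than its own."""
    n = len(labels)
    clusters = sorted(set(labels))
    widths, nearest = [], []
    for i in range(n):
        # mean distance from element i to each cluster (excluding i itself)
        mean_to = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                mean_to[c] = sum(dist[i][j] for j in members) / len(members)
        own = labels[i]
        others = {c: m for c, m in mean_to.items() if c != own}
        if own not in mean_to or not others:
            # singleton cluster or only one cluster: width defined as 0
            widths.append(0.0)
            nearest.append(own)
            continue
        best_other = min(others, key=others.get)
        a, b = mean_to[own], others[best_other]
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
        nearest.append(best_other)
    return widths, nearest


def refine(dist, labels, max_steps=100):
    """Assumed greedy rule: repeatedly move the worst-clustered element
    (most negative silhouette width) to its nearest other cluster."""
    current = list(labels)
    for _ in range(max_steps):
        widths, nearest = silhouette_widths(dist, current)
        worst = min(range(len(current)), key=widths.__getitem__)
        if widths[worst] >= 0:
            break  # no element is closer to another cluster than to its own
        current[worst] = nearest[worst]
    return current


# Toy data: six points on a line; the third point starts in the wrong cluster.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in points] for p in points]
start = [0, 0, 1, 1, 1, 1]
print(refine(dist, start))
```

Starting from a hierarchical clustering rather than an arbitrary labeling would correspond to the combined approach that the abstract reports as performing best.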


Subject(s)
Algorithms; Cluster Analysis; Humans; Neoplasms/pathology; Software
3.
Ann Surg ; 2024 May 21.
Article in English | MEDLINE | ID: mdl-38771951

ABSTRACT

OBJECTIVE: We aimed to assess the levels of MDM2-DNA within extracellular vesicles (EVs) isolated from the serum of retroperitoneal liposarcoma (RLS) patients versus healthy donors, as well as within the same patients at the time of surgery versus post-operative surveillance visits, in order to determine whether EV-MDM2 may serve as a possible first-ever biomarker of liposarcoma recurrence. BACKGROUND: A hallmark of well-differentiated and de-differentiated (WD/DD) retroperitoneal liposarcoma is elevated MDM2 due to genome amplification, with recurrence rates of >50% even after complete resection. Imaging technologies frequently cannot resolve recurrent WD/DD-RLS versus postoperative scarring. Early detection of recurrent lesions, for which biomarkers are lacking, would guide surveillance and treatment decisions. METHODS: WD/DD-RLS serum samples were collected both at the time of surgery and during follow-up visits from 42 patients, along with sera from healthy donors (n=14). EVs were isolated, DNA was purified, and MDM2-DNA levels were determined through q-PCR analysis. Non-parametric tests were employed to compare EV-MDM2 DNA levels from patients versus the control group, as well as at the time of surgery versus post-surgery conditions. RESULTS: EV-MDM2 levels were significantly higher in WD/DD-RLS than in controls (P=0.00085). Moreover, EV-MDM2 levels were remarkably decreased in WD/DD-RLS patients after resection (P=0.00036), reaching values comparable to the control group (P=0.124). During post-operative surveillance, significant increases of EV-MDM2 were observed in some patients, correlating with CT scan evidence of recurrent or persistent post-resection disease. CONCLUSIONS: Serum EV-MDM2 may serve as a potential biomarker of early recurrent or post-operatively persistent WD/DD-RLS, a disease currently lacking such determinants.

4.
bioRxiv ; 2023 Nov 11.
Article in English | MEDLINE | ID: mdl-37986817

ABSTRACT

Unsupervised clustering is an important task in biomedical science. We developed a new clustering method, called SillyPutty, for unsupervised clustering. As test data, we generated a series of datasets using the Umpire R package. Using these datasets, we compared SillyPutty to several existing algorithms using multiple metrics (Silhouette Width, Adjusted Rand Index, Entropy, Normalized Within-group Sum of Square errors, and Perfect Classification Count). Our findings revealed that SillyPutty is a valid standalone clustering method, comparable in accuracy to the best existing methods. We also found that the combination of hierarchical clustering followed by SillyPutty has the best overall performance in terms of both accuracy and speed.

5.
bioRxiv ; 2023 Apr 21.
Article in English | MEDLINE | ID: mdl-37131792

ABSTRACT

Gene regulatory networks play a critical role in understanding cell states, gene expression, and biological processes. Here, we investigated the utility of transcription factors (TFs) and microRNAs (miRNAs) in creating a low-dimensional representation of cell states and predicting gene expression across 31 cancer types. We identified 28 clusters of miRNAs and 28 clusters of TFs, demonstrating that they can differentiate tissue of origin. Using a simple SVM classifier, we achieved an average accuracy of 92.8% in tissue classification. We also predicted the entire transcriptome using Tissue-Agnostic and Tissue-Aware models, with average R² values of 0.45 and 0.70, respectively. Our Tissue-Aware model, using 56 selected features, showed comparable predictive power to the widely used L1000 genes. However, the model's transportability was impacted by covariate shift, particularly inconsistent microRNA expression across datasets.

6.
Blood ; 142(1): 44-61, 2023 07 06.
Article in English | MEDLINE | ID: mdl-37023372

ABSTRACT

In chronic lymphocytic leukemia (CLL), epigenetic alterations are considered to centrally shape the transcriptional signatures that drive disease evolution and underlie its biological and clinical subsets. Characterizations of epigenetic regulators, particularly histone-modifying enzymes, are very rudimentary in CLL. In efforts to establish effectors of the CLL-associated oncogene T-cell leukemia 1A (TCL1A), we identified here the lysine-specific histone demethylase KDM1A to interact with the TCL1A protein in B cells in conjunction with an increased catalytic activity of KDM1A. We demonstrate that KDM1A is upregulated in malignant B cells. Elevated KDM1A and associated gene expression signatures correlated with aggressive disease features and adverse clinical outcomes in a large prospective CLL trial cohort. Genetic Kdm1a knockdown in Eµ-TCL1A mice reduced leukemic burden and prolonged animal survival, accompanied by upregulated p53 and proapoptotic pathways. Genetic KDM1A depletion also affected milieu components (T, stromal, and monocytic cells), resulting in significant reductions in their capacity to support CLL-cell survival and proliferation. Integrated analyses of differential global transcriptomes (RNA sequencing) and H3K4me3 marks (chromatin immunoprecipitation sequencing) in Eµ-TCL1A vs iKdm1aKD;Eµ-TCL1A mice (confirmed in human CLL) implicate KDM1A as an oncogenic transcriptional repressor in CLL which alters histone methylation patterns with pronounced effects on defined cell death and motility pathways. Finally, pharmacologic KDM1A inhibition altered H3K4/9 target methylation and revealed marked anti-B-cell leukemic synergisms. Overall, we established the pathogenic role and effector networks of KDM1A in CLL via tumor-cell intrinsic mechanisms and its impacts in cells of the microenvironment. Our data also provide rationales to further investigate therapeutic KDM1A targeting in CLL.


Subject(s)
Leukemia, Lymphocytic, Chronic, B-Cell; Humans; Mice; Animals; Leukemia, Lymphocytic, Chronic, B-Cell/drug therapy; Histones/metabolism; Lysine; Prospective Studies; Histone Demethylases/genetics; Histone Demethylases/metabolism; Tumor Microenvironment
7.
Cancer Discov ; 13(4): 910-927, 2023 04 03.
Article in English | MEDLINE | ID: mdl-36715691

ABSTRACT

The human papillomavirus (HPV) genome is integrated into host DNA in most HPV-positive cancers, but the consequences for chromosomal integrity are unknown. Continuous long-read sequencing of oropharyngeal cancers and cancer cell lines identified a previously undescribed form of structural variation, "heterocateny," characterized by diverse, interrelated, and repetitive patterns of concatemerized virus and host DNA segments within a cancer. Unique breakpoints shared across structural variants facilitated stepwise reconstruction of their evolution from a common molecular ancestor. This analysis revealed that virus and virus-host concatemers are unstable and, upon insertion into and excision from chromosomes, facilitate capture, amplification, and recombination of host DNA and chromosomal rearrangements. Evidence of heterocateny was detected in extrachromosomal and intrachromosomal DNA. These findings indicate that heterocateny is driven by the dynamic, aberrant replication and recombination of an oncogenic DNA virus, thereby extending known consequences of HPV integration to include promotion of intratumoral heterogeneity and clonal evolution. SIGNIFICANCE: Long-read sequencing of HPV-positive cancers revealed "heterocateny," a previously unreported form of genomic structural variation characterized by heterogeneous, interrelated, and repetitive genomic rearrangements within a tumor. Heterocateny is driven by unstable concatemerized HPV genomes, which facilitate capture, rearrangement, and amplification of host DNA, and promotes intratumoral heterogeneity and clonal evolution. See related commentary by McBride and White, p. 814. This article is highlighted in the In This Issue feature, p. 799.


Subject(s)
Oropharyngeal Neoplasms; Papillomavirus Infections; Humans; Human Papillomavirus Viruses; Gene Rearrangement; Clonal Evolution/genetics; Virus Integration/genetics; Papillomaviridae/genetics
8.
Chem Biodivers ; 19(11): e202200657, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36216587

ABSTRACT

We present a novel model of time-series analysis to learn from electronic health record (EHR) data when infection occurred in the intensive care unit (ICU) by translating methods from proteomics and Bayesian statistics. Using 48,536 patients hospitalized in an ICU, we describe each hospital course as an 'alphabet' of 23 physician actions ('events') in temporal order. We analyze these as k-mers of length 3-12 events and apply a Bayesian model of (cumulative) relative risk (RR). The log2-transformed RR (median=0.248, mean=0.226) supported the conclusion that the events selected were individually associated with increased risk of infection. Selecting from all possible cutoffs of maximum gain (MG), MG>0.0244 predicts administration of antibiotics with PPV 82.0 %, NPV 44.4 %, and AUC 0.706. Our approach holds value for retrospective analysis of other clinical syndromes for which time-of-onset is critical to analysis but poorly marked in EHRs, including delirium and decompensation.
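The k-mer representation described in the abstract above can be sketched in a few lines. The example treats a hospital course as an ordered list of event codes and counts every contiguous run of 3 to 12 events. The relative-risk helper is a plain frequency ratio, swapped in for illustration; it is not the Bayesian cumulative model used in the study. The function names, event codes, and counts are hypothetical.

```python
import math
from collections import Counter


def event_kmers(events, k_min=3, k_max=12):
    """Count every contiguous k-mer (k_min <= k <= k_max) in one
    temporally ordered sequence of event codes."""
    counts = Counter()
    for k in range(k_min, k_max + 1):
        for start in range(len(events) - k + 1):
            counts[tuple(events[start:start + k])] += 1
    return counts


def log2_relative_risk(exposed_cases, exposed_total,
                       unexposed_cases, unexposed_total):
    """log2 of a simple (non-Bayesian) relative risk: the outcome rate among
    courses containing a k-mer divided by the rate among courses without it."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return math.log2(risk_exposed / risk_unexposed)


# Hypothetical hospital course written with single-letter event codes.
course = list("ABCD")
print(event_kmers(course))

# Hypothetical counts: 30 of 100 courses with the k-mer had the outcome,
# versus 15 of 100 courses without it.
print(log2_relative_risk(30, 100, 15, 100))
```

A positive log2 value means the k-mer is associated with higher risk, which matches the direction of the summary statistics reported in the abstract.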


Subject(s)
Electronic Health Records; Intensive Care Units; Humans; Retrospective Studies; Bayes Theorem
9.
Bioinformatics ; 38(23): 5245-5252, 2022 11 30.
Article in English | MEDLINE | ID: mdl-36250792

ABSTRACT

MOTIVATION: Clustered regularly interspaced short palindromic repeats (CRISPR)-based genetic perturbation screening is a powerful tool to probe gene function. However, experimental noise, especially for lowly expressed genes, needs to be accounted for to maintain proper control of the false positive rate. METHODS: We develop a statistical method, named CRISPR screen with Expression Data Analysis (CEDA), to integrate gene expression profiles and CRISPR screen data for identifying essential genes. CEDA stratifies genes based on expression level and adopts a three-component mixture model for the log-fold change of single-guide RNAs (sgRNAs). An empirical Bayesian prior and the expectation-maximization algorithm are used for parameter estimation and false discovery rate inference. RESULTS: Taking advantage of gene expression data, CEDA identifies essential genes with higher expression. Compared to existing methods, CEDA shows comparable reliability but higher sensitivity in detecting essential genes with moderate sgRNA fold change. Therefore, using the same CRISPR data, CEDA generates an additional hit gene list. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Clustered Regularly Interspaced Short Palindromic Repeats; Genes, Essential; Bayes Theorem; CRISPR-Cas Systems; Gene Expression; Reproducibility of Results; RNA, Small Untranslated/genetics
10.
Learn Health Syst ; 6(4): e10336, 2022 Oct.
Article in English | MEDLINE | ID: mdl-36263259

ABSTRACT

Introduction: Applied health informatics infrastructure is a requirement for learning health systems and it is imperative that we train a workforce that can support this infrastructure. Our department offers courses in several interdisciplinary programs with topics ranging from bioinformatics to population health informatics. Due to changes in the field and our faculty members, we sought to assess our courses relevant to applied health informatics. Methods: In this paper, we discuss the three-phase evaluation of our program and include the survey we developed to identify the skills and knowledge base of our faculty. Results: We show how this assessment allowed us to identify gaps and develop strategies for program expansion. Conclusions: A focus on workforce development can help to guide and focus curricular review in an interdisciplinary graduate program.

11.
Comput Syst Oncol ; 2(2)2022 Jun.
Article in English | MEDLINE | ID: mdl-35966389

ABSTRACT

Cancer progression, including the development of intratumor heterogeneity, is inherently a spatial process. Mathematical models of tumor evolution may be a useful starting point for understanding the patterns of heterogeneity that can emerge in the presence of spatial growth. A commonly studied spatial growth model assumes that tumor cells occupy sites on a lattice and replicate into neighboring sites. Our R package SITH provides a convenient interface for exploring this model. Our efficient simulation algorithm allows for users to generate 3D tumors with millions of cells in under a minute. For visualizing the distribution of mutations throughout the tumor, SITH provides interactive graphics and summary plots. Additionally, SITH can produce synthetic bulk and single-cell DNA-seq datasets by sampling from the simulated tumor. A streamlined API makes SITH a useful tool for investigating the relationship between spatial growth and intratumor heterogeneity. SITH is a part of CRAN and can be installed by running install.packages("SITH") from the R console. See https://CRAN.R-project.org/package=SITH for the user manual and package vignette.
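The lattice growth model described in the abstract above can be sketched as a toy simulation. This Python illustration is not SITH's algorithm or API (SITH is an R package). It assumes a 3D lattice with six face-adjacent neighbors, replication into empty sites only, and a fixed per-division probability of a new mutation. The function name `grow_tumor` and all parameter values are hypothetical.

```python
import random

# Assumption: six face-adjacent neighbors on a cubic lattice.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
             (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def grow_tumor(n_cells, mutation_rate=0.1, seed=0):
    """Grow a tumor on a 3D lattice until it holds n_cells cells.

    Returns a dict mapping each occupied site (x, y, z) to the frozenset
    of mutation ids carried by the cell at that site.
    """
    rng = random.Random(seed)
    origin = (0, 0, 0)
    tumor = {origin: frozenset()}
    cells = [origin]
    next_mutation = 0
    while len(tumor) < n_cells:
        x, y, z = rng.choice(cells)          # pick a cell to divide
        dx, dy, dz = rng.choice(NEIGHBORS)   # pick a neighboring site
        site = (x + dx, y + dy, z + dz)
        if site in tumor:
            continue  # site occupied: this division attempt fails
        genotype = tumor[(x, y, z)]
        if rng.random() < mutation_rate:
            # daughter cell acquires one new, unique mutation
            genotype = genotype | {next_mutation}
            next_mutation += 1
        tumor[site] = genotype
        cells.append(site)
    return tumor


tumor = grow_tumor(200, seed=1)
clones = {genotype for genotype in tumor.values()}
print(len(tumor), "cells,", len(clones), "distinct genotypes")
```

Because daughters inherit the parent's mutations, spatially nearby cells tend to share genotypes, which is the kind of spatial heterogeneity pattern the abstract refers to. Sampling subsets of sites from the returned dictionary would mimic bulk or single-cell sequencing of the simulated tumor.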

12.
Genome Res ; 32(1): 55-70, 2022 01.
Article in English | MEDLINE | ID: mdl-34903527

ABSTRACT

Human papillomavirus (HPV) causes 5% of all cancers and frequently integrates into host chromosomes. The HPV oncoproteins E6 and E7 are necessary but insufficient for cancer formation, indicating that additional secondary genetic events are required. Here, we investigate potential oncogenic impacts of virus integration. Analysis of 105 HPV-positive oropharyngeal cancers by whole-genome sequencing detects virus integration in 77%, revealing five statistically significant sites of recurrent integration near genes that regulate epithelial stem cell maintenance (i.e., SOX2, TP63, FGFR, MYC) and immune evasion (i.e., CD274). Genomic copy number hyperamplification is enriched 16-fold near HPV integrants, and the extent of focal host genomic instability increases with their local density. The frequency of genes expressed at extreme outlier levels is increased 86-fold within ±150 kb of integrants. Across 95% of tumors with integration, host gene transcription is disrupted via intragenic integrants, chimeric transcription, outlier expression, gene breaking, and/or de novo expression of noncoding or imprinted genes. We conclude that virus integration can contribute to carcinogenesis in a large majority of HPV-positive oropharyngeal cancers by inducing extensive disruption of host genome structure and gene expression.


Subject(s)
Alphapapillomavirus; Oncogene Proteins, Viral; Oropharyngeal Neoplasms; Alphapapillomavirus/metabolism; Carcinogenesis; Humans; Oncogene Proteins, Viral/genetics; Oropharyngeal Neoplasms/genetics; Papillomaviridae/genetics; Papillomaviridae/metabolism; Papillomavirus E7 Proteins/genetics; Papillomavirus E7 Proteins/metabolism; Virus Integration/genetics
13.
Bioinformatics ; 37(23): 4589-4590, 2021 12 07.
Article in English | MEDLINE | ID: mdl-34601554

ABSTRACT

SUMMARY: Cytogenetics data, or karyotypes, are among the most common clinically used forms of genetic data. Karyotypes are stored as standardized text strings using the International System for Human Cytogenomic Nomenclature (ISCN). Historically, these data have not been used in large-scale computational analyses due to limitations in the ISCN text format and structure. Recently developed computational tools such as CytoGPS have enabled large-scale computational analyses of karyotypes. To further enable such analyses, we have now developed RCytoGPS, an R package that takes JSON files generated from CytoGPS.org and converts them into objects in R. This conversion facilitates the analysis and visualizations of karyotype data. In effect this tool streamlines the process of performing large-scale karyotype analyses, thus advancing the field of computational cytogenetic pathology. AVAILABILITY AND IMPLEMENTATION: Freely available at https://CRAN.R-project.org/package=RCytoGPS. The code for the underlying CytoGPS software can be found at https://github.com/i2-wustl/CytoGPS.


Subject(s)
Reading; Software; Humans; Karyotyping; Karyotype
14.
J Biomed Inform ; 118: 103788, 2021 06.
Article in English | MEDLINE | ID: mdl-33862229

ABSTRACT

INTRODUCTION: Clustering analyses in clinical contexts hold promise to improve the understanding of patient phenotype and disease course in chronic and acute clinical medicine. However, work remains to ensure that solutions are rigorous, valid, and reproducible. In this paper, we evaluate best practices for dissimilarity matrix calculation and clustering on mixed-type, clinical data. METHODS: We simulate clinical data to represent problems in clinical trials, cohort studies, and EHR data, including single-type datasets (binary, continuous, categorical) and 4 data mixtures. We test 5 single distance metrics (Jaccard, Hamming, Gower, Manhattan, Euclidean) and 3 mixed distance metrics (DAISY, Supersom, and Mercator) with 3 clustering algorithms (hierarchical (HC), k-medoids, self-organizing maps (SOM)). We quantitatively and visually validate by Adjusted Rand Index (ARI) and silhouette width (SW). We applied our best methods to two real-world data sets: (1) 21 features collected on 247 patients with chronic lymphocytic leukemia, and (2) 40 features collected on 6000 patients admitted to an intensive care unit. RESULTS: HC outperformed k-medoids and SOM by ARI across data types. DAISY produced the highest mean ARI for mixed data types for all mixtures except unbalanced mixtures dominated by continuous data. Compared to other methods, DAISY with HC uncovered superior, separable clusters in both real-world data sets. DISCUSSION: Selecting an appropriate mixed-type metric allows the investigator to obtain optimal separation of patient clusters and get maximum use of their data. Superior metrics for mixed-type data handle multiple data types using multiple, type-focused distances. Better subclassification of disease opens avenues for targeted treatments, precision medicine, clinical decision support, and improved patient outcomes.
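The Adjusted Rand Index (ARI) used for validation in the abstract above compares a clustering against known labels while correcting for chance agreement. The following is a minimal sketch of the standard pair-counting formula; the function name and the toy label vectors are illustrative and do not come from the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two partitions of the same elements."""
    n = len(labels_a)
    if n != len(labels_b):
        raise ValueError("label vectors must have the same length")
    if n < 2:
        raise ValueError("need at least two elements")
    # pairs of elements placed together in both partitions
    together_both = sum(comb(c, 2)
                        for c in Counter(zip(labels_a, labels_b)).values())
    # pairs placed together in each partition separately
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (together_both - expected) / (maximum - expected)


truth = [0, 0, 1, 1]
print(adjusted_rand_index(truth, [1, 1, 0, 0]))  # same grouping, renamed
print(adjusted_rand_index(truth, [0, 1, 0, 1]))  # grouping cuts across truth
```

An ARI of 1 means the two partitions agree exactly regardless of label names, values near 0 indicate chance-level agreement, and negative values indicate less agreement than expected by chance.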


Subject(s)
Leukemia, Lymphocytic, Chronic, B-Cell; Algorithms; Cluster Analysis; Computer Simulation; Humans
15.
BMC Med Inform Decis Mak ; 21(1): 97, 2021 03 09.
Article in English | MEDLINE | ID: mdl-33750375

ABSTRACT

BACKGROUND: In the intensive care unit (ICU), delirium is a common, acute, confusional state associated with high risk for short- and long-term morbidity and mortality. Machine learning (ML) has promise to address research priorities and improve delirium outcomes. However, due to clinical and billing conventions, delirium is often inconsistently or incompletely labeled in electronic health record (EHR) datasets. Here, we identify clinical actions abstracted from clinical guidelines in EHR data that indicate risk of delirium among ICU patients. We develop a novel prediction model to label patients with delirium based on a large data set and assess model performance. METHODS: EHR data on 48,451 admissions from 2001 to 2012, available through the Medical Information Mart for Intensive Care-III database (MIMIC-III), were used to identify features to develop our prediction models. Five binary ML classification models (Logistic Regression; Classification and Regression Trees; Random Forests; Naïve Bayes; and Support Vector Machines) were fit and ranked by Area Under the Curve (AUC) scores. We compared our best model with two models previously proposed in the literature for goodness of fit, precision, and through biological validation. RESULTS: Our best performing model with threshold reclassification for predicting delirium was based on a multiple logistic regression using the 31 clinical actions (AUC 0.83). Our model outperformed other proposed models by biological validation on clinically meaningful, delirium-associated outcomes. CONCLUSIONS: Hurdles in identifying accurate labels in large-scale datasets limit clinical applications of ML in delirium. We developed a novel labeling model for delirium in the ICU using a large, public data set. By using guideline-directed clinical actions independent from risk factors, treatments, and outcomes as model predictors, our classifier could be used as a delirium label for future clinically targeted models.
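The AUC used to rank the classifiers in the abstract above has a simple pairwise interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. A minimal sketch of that definition follows; the function name, scores, and labels are made up for illustration and are unrelated to the MIMIC-III data.

```python
def auc(scores, labels):
    """Area under the ROC curve from the pairwise definition.

    labels are 1 for positive cases and 0 for negative cases.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted risks and true labels for four admissions.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
print(auc(scores, labels))
```

The double loop is quadratic in the number of cases, which is fine for illustration; a rank-based computation would be preferable at the scale of tens of thousands of admissions.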


Subject(s)
Delirium; Intensive Care Units; Bayes Theorem; Delirium/diagnosis; Electronic Health Records; Humans; Machine Learning
16.
BMC Bioinformatics ; 22(1): 100, 2021 Mar 01.
Article in English | MEDLINE | ID: mdl-33648439

ABSTRACT

BACKGROUND: There have been many recent breakthroughs in processing and analyzing large-scale data sets in biomedical informatics. For example, the CytoGPS algorithm has enabled the use of text-based karyotypes by transforming them into a binary model. However, such advances are accompanied by new problems of data sparsity, heterogeneity, and noisiness that are magnified by the large-scale multidimensional nature of the data. To address these problems, we developed the Mercator R package, which processes and visualizes binary biomedical data. We use Mercator to address biomedical questions of cytogenetic patterns relating to lymphoid hematologic malignancies, which include a broad set of leukemias and lymphomas. Karyotype data are one of the most common forms of genetic data collected on lymphoid malignancies, because karyotyping is part of the standard of care in these cancers. RESULTS: In this paper we combine the analytic power of CytoGPS and Mercator to perform a large-scale multidimensional pattern recognition study on 22,741 karyotype samples in 47 different hematologic malignancies obtained from the public Mitelman database. CONCLUSION: Our findings indicate that Mercator was able to identify both known and novel cytogenetic patterns across different lymphoid malignancies, furthering our understanding of the genetics of these diseases.


Subject(s)
Hematologic Diseases; Karyotyping; Neoplasms; Chromosome Aberrations; Humans; Karyotype
17.
Genome Res ; 31(5): 747-761, 2021 05.
Article in English | MEDLINE | ID: mdl-33707228

ABSTRACT

Acute myeloid leukemia (AML) is a molecularly complex disease characterized by heterogeneous tumor genetic profiles and involving numerous pathogenic mechanisms and pathways. Integration of molecular data types across multiple patient cohorts may advance current genetic approaches for improved subclassification and understanding of the biology of the disease. Here, we analyzed genome-wide DNA methylation in 649 AML patients using Illumina arrays and identified a configuration of 13 subtypes (termed "epitypes") using unbiased clustering. Integration of genetic data revealed that most epitypes were associated with a certain recurrent mutation (or combination) in a majority of patients, yet other epitypes were largely independent. Epitypes showed developmental blockage at discrete stages of myeloid differentiation, revealing epitypes that retain arrested hematopoietic stem-cell-like phenotypes. Detailed analyses of DNA methylation patterns identified unique patterns of aberrant hyper- and hypomethylation among epitypes, with variable involvement of transcription factors influencing promoter, enhancer, and repressed regions. Patients in epitypes with stem-cell-like methylation features showed inferior overall survival along with up-regulated stem cell gene expression signatures. We further identified a DNA methylation signature involving STAT motifs associated with FLT3-ITD mutations. Finally, DNA methylation signatures were stable at relapse for the large majority of patients, and rare epitype switching accompanied loss of the dominant epitype mutations and reversion to stem-cell-like methylation patterns. These results show that DNA methylation-based classification integrates important molecular features of AML to reveal the diverse pathogenic and biological aspects of the disease.


Subject(s)
DNA Methylation; Leukemia, Myeloid, Acute; Humans; Leukemia, Myeloid, Acute/metabolism; Mutation; Promoter Regions, Genetic
18.
Bioinformatics ; 37(17): 2780-2781, 2021 Sep 09.
Article in English | MEDLINE | ID: mdl-33515233

ABSTRACT

SUMMARY: Unsupervised machine learning provides tools for researchers to uncover latent patterns in large-scale data, based on calculated distances between observations. Methods to visualize high-dimensional data based on these distances can elucidate subtypes and interactions within multi-dimensional and high-throughput data. However, researchers can select from a vast number of distance metrics and visualizations, each with their own strengths and weaknesses. The Mercator R package facilitates selection of a biologically meaningful distance from 10 metrics, together appropriate for binary, categorical and continuous data, and visualization with 5 standard and high-dimensional graphics tools. Mercator provides a user-friendly pipeline for informaticians or biologists to perform unsupervised analyses, from exploratory pattern recognition to production of publication-quality graphics. AVAILABILITY AND IMPLEMENTATION: Mercator is freely available at the Comprehensive R Archive Network (https://cran.r-project.org/web/packages/Mercator/index.html).

19.
Sci Rep ; 10(1): 18316, 2020 10 27.
Article in English | MEDLINE | ID: mdl-33110146

ABSTRACT

The Akt family comprises three unique homologous proteins with isoform-specific effects, but isoform-specific in vivo data are limited in follicular thyroid cancer (FTC), a PI3 kinase-driven tumor. Prior studies demonstrated that PI3K/Akt signaling is important in thyroid hormone receptor βPV/PV knock-in (PV) mice that develop metastatic thyroid cancer that most closely resembles FTC. To determine the roles of Akt isoforms in this model, we crossed Akt1-/-, Akt2-/-, and Akt3-/- mice with PV mice. Over 12 months, thyroid size was reduced for the Akt null crosses (p < 0.001). Thyroid cancer development and local invasion were delayed only in the PVPV-Akt1 knockout (KO) mice, in association with increased apoptosis and no change in proliferation. Primary-cultured PVPV-Akt1KO thyrocytes uniquely displayed reduced cell motility. In contrast, loss of any Akt isoform reduced lung metastasis, while vascular invasion was reduced with Akt1 or Akt3 loss. Microarray of thyroid RNA displayed incomplete overlap between the Akt KO models. The most upregulated gene was the dendritic cell (DC) marker CD209a, only in PVPV-Akt1KO thyroids. Immunohistochemistry demonstrated an increase in CD209a-expressing cells in the PVPV-Akt1KO thyroids. In summary, Akt isoforms exhibit common and differential functions that regulate local and metastatic progression in this model of thyroid cancer.


Subject(s)
Proto-Oncogene Proteins c-akt/metabolism; Thyroid Neoplasms/etiology; Animals; Disease Models, Animal; Disease Progression; Fluorescent Antibody Technique; Gene Expression Regulation, Neoplastic; Mice; Mice, Knockout; Oligonucleotide Array Sequence Analysis; Protein Isoforms; Receptors, Thyroid Hormone/metabolism; Thyroid Gland/metabolism; Thyroid Neoplasms/metabolism; Thyroid Neoplasms/pathology
20.
Cancer Genet ; 248-249: 34-38, 2020 10.
Article in English | MEDLINE | ID: mdl-33059160

ABSTRACT

Karyotyping, the practice of visually examining and recording chromosomal abnormalities, is commonly used to diagnose diseases of genetic origin, including cancers. Karyotypes are recorded as text written in the International System for Human Cytogenetic Nomenclature (ISCN). Downstream analysis of karyotypes is conducted manually, due to the visual nature of analysis and the linguistic structure of the ISCN. The ISCN has not been computer-readable and, as such, prevents the full potential of these genomic data from being realized. In response, we developed CytoGPS, a platform to analyze large volumes of cytogenetic data using a Loss-Gain-Fusion model that converts the human-readable ISCN karyotypes into a machine-readable binary format. As proof of principle, we applied CytoGPS to cytogenetic data from the Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer, a National Cancer Institute hosted database of over 69,000 karyotypes of human cancers. Using the Jaccard coefficient to determine similarity between karyotypes structured as binary vectors, we were able to identify novel patterns from 4,968 Mitelman CML karyotypes, such as the co-occurrence of trisomy 19 and 21. The CytoGPS platform unlocks the potential for large-scale, comparative analysis of cytogenetic data. This methodological platform is freely available at CytoGPS.org.
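The Jaccard coefficient mentioned in the abstract above measures how much two binary vectors overlap: the number of positions set in both, divided by the number set in at least one. A minimal sketch follows. The short vectors are hypothetical stand-ins for the much longer loss/gain/fusion encodings that CytoGPS produces, and returning 1.0 when both vectors are all zeros is a convention chosen here, not something stated in the source.

```python
def jaccard_similarity(u, v):
    """Jaccard coefficient between two equal-length binary vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    both = sum(1 for x, y in zip(u, v) if x and y)     # set in both
    either = sum(1 for x, y in zip(u, v) if x or y)    # set in at least one
    if either == 0:
        return 1.0  # assumed convention: two empty vectors count as identical
    return both / either


# Hypothetical binary encodings of two karyotypes (1 = abnormality present).
karyotype_a = [1, 1, 0, 0, 1]
karyotype_b = [1, 0, 0, 1, 1]
print(jaccard_similarity(karyotype_a, karyotype_b))
```

Positions where both vectors are 0 do not affect the result, which suits sparse data such as karyotypes, where most possible abnormalities are absent from any given sample.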


Subject(s)
Algorithms; Chromosome Aberrations; Chromosomes, Human; Databases, Factual; Karyotyping/methods; Leukemia, Myelogenous, Chronic, BCR-ABL Positive/genetics; Leukemia, Myelogenous, Chronic, BCR-ABL Positive/pathology; Cytogenetic Analysis; Humans; Prognosis