Results 1 - 20 of 65
1.
PLoS Comput Biol ; 19(9): e1011511, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37769024

ABSTRACT

Computer programming is a fundamental tool for life scientists, allowing them to carry out essential research tasks. However, despite various educational efforts, learning to write code can be a challenging endeavor for students and researchers in life-sciences disciplines. Recent advances in artificial intelligence have made it possible to translate human-language prompts to functional code, raising questions about whether these technologies can aid (or replace) life scientists' efforts to write code. Using 184 programming exercises from an introductory-bioinformatics course, we evaluated the extent to which one such tool, OpenAI's ChatGPT, could successfully complete programming tasks. ChatGPT solved 139 (75.5%) of the exercises on its first attempt. For the remaining exercises, we provided natural-language feedback to the model, prompting it to try different approaches. Within 7 or fewer attempts, ChatGPT solved 179 (97.3%) of the exercises. These findings have implications for life-sciences education and research. Instructors may need to adapt their pedagogical approaches and assessment techniques to account for these new capabilities that are available to the general public. For some programming tasks, researchers may be able to work in collaboration with machine-learning models to produce functional code.
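
The attempt-and-feedback procedure described in the abstract above can be sketched roughly as follows. This is an illustrative outline only, not the authors' evaluation harness: `generate` and `check` are hypothetical callables standing in for the model and for the exercise's tests, and the limit of 7 attempts is taken from the abstract. The small `percent` helper simply reproduces the reported percentages from the reported counts.

```python
from typing import Callable, Optional, Tuple


def solve_with_feedback(
    generate: Callable[[str, Optional[str]], str],
    check: Callable[[str], Optional[str]],
    prompt: str,
    max_attempts: int = 7,
) -> Tuple[Optional[str], int]:
    """Request code, test it, and feed failures back until it passes.

    `generate(prompt, feedback)` (hypothetical) returns candidate code.
    `check(code)` (hypothetical) returns None on success, or a
    natural-language description of what went wrong.
    Returns (passing code or None, number of attempts used).
    """
    feedback = None
    for attempt in range(1, max_attempts + 1):
        code = generate(prompt, feedback)
        feedback = check(code)
        if feedback is None:
            return code, attempt
    return None, max_attempts


def percent(solved: int, total: int) -> float:
    """Share of exercises solved, as a percentage rounded to one decimal."""
    return round(100 * solved / total, 1)


# Counts reported in the abstract: 139 and 179 of 184 exercises.
first_attempt_rate = percent(139, 184)
final_rate = percent(179, 184)
```

Returning the attempt count alongside the code makes it straightforward to tabulate how many exercises were solved at each attempt.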

2.
PLoS Comput Biol ; 18(3): e1009926, 2022 03.
Article in English | MEDLINE | ID: mdl-35275931

ABSTRACT

By classifying patients into subgroups, clinicians can provide more effective care than using a uniform approach for all patients. Such subgroups might include patients with a particular disease subtype, patients with a good (or poor) prognosis, or patients most (or least) likely to respond to a particular therapy. Transcriptomic measurements reflect the downstream effects of genomic and epigenomic variations. However, high-throughput technologies generate thousands of measurements per patient, and complex dependencies exist among genes, so it may be infeasible to classify patients using traditional statistical models. Machine-learning classification algorithms can help with this problem. However, hundreds of classification algorithms exist, and most support diverse hyperparameters, so it is difficult for researchers to know which are optimal for gene-expression biomarkers. We performed a benchmark comparison, applying 52 classification algorithms to 50 gene-expression datasets (143 class variables). We evaluated algorithms that represent diverse machine-learning methodologies and have been implemented in general-purpose, open-source, machine-learning libraries. When available, we combined clinical predictors with gene-expression data. Additionally, we evaluated the effects of performing hyperparameter optimization and feature selection using nested cross validation. Kernel- and ensemble-based algorithms consistently outperformed other types of classification algorithms; however, even the top-performing algorithms performed poorly in some cases. Hyperparameter optimization and feature selection typically improved predictive performance, and univariate feature-selection algorithms typically outperformed more sophisticated methods. Together, our findings illustrate that algorithm performance varies considerably when other factors are held constant and thus that algorithm selection is a critical step in biomarker studies.


Subject(s)
Algorithms , Machine Learning , Genomics , Humans , Models, Statistical
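
As a rough illustration of the nested cross validation mentioned in the abstract above, the sketch below tunes a single hyperparameter in an inner loop and estimates performance in an outer loop. It is a toy under stated assumptions, not the benchmark's pipeline: the classifier is a hand-written one-dimensional k-nearest-neighbors rule, the only hyperparameter is `k`, feature selection is omitted, and the data are made up.

```python
from statistics import mean


def k_folds(n, k):
    """Split positions 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]


def knn_predict(train_x, train_y, x, k):
    """Majority vote (labels 0/1) among the k training points nearest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return int(2 * votes > k)


def evaluate(fit_idx, eval_idx, xs, ys, k):
    """Accuracy on eval_idx of a k-NN model built from fit_idx."""
    fit_x = [xs[i] for i in fit_idx]
    fit_y = [ys[i] for i in fit_idx]
    return mean([int(knn_predict(fit_x, fit_y, xs[i], k) == ys[i]) for i in eval_idx])


def nested_cv(xs, ys, k_values, outer_k=3, inner_k=2):
    """Outer loop estimates performance; inner loop chooses k.

    The held-out outer fold is never used when choosing k.
    """
    n = len(xs)
    outer_scores = []
    for test_idx in k_folds(n, outer_k):
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]

        def inner_score(k):
            scores = []
            for fold in k_folds(len(train_idx), inner_k):
                val_idx = [train_idx[j] for j in fold]
                val_set = set(val_idx)
                fit_idx = [i for i in train_idx if i not in val_set]
                scores.append(evaluate(fit_idx, val_idx, xs, ys, k))
            return mean(scores)

        best_k = max(k_values, key=inner_score)
        outer_scores.append(evaluate(train_idx, test_idx, xs, ys, best_k))
    return mean(outer_scores)


# Made-up, cleanly separated expression values for two classes.
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6]
ys = [0] * 6 + [1] * 6
estimated_accuracy = nested_cv(xs, ys, k_values=[1, 3])
```

Because hyperparameters are chosen using only the outer training portion, the outer-fold accuracy is not inflated by the tuning step.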
3.
BMC Bioinformatics ; 22(1): 559, 2021 Nov 22.
Article in English | MEDLINE | ID: mdl-34809557

ABSTRACT

BACKGROUND: When analyzing DNA sequence data of an individual, knowing which nucleotide was inherited from each parent can be beneficial when trying to identify certain types of DNA variants. Mendelian inheritance logic can be used to accurately phase (haplotype) the majority (67-83%) of an individual's heterozygous nucleotide positions when genotypes are available for both parents (trio). However, when all members of a trio are heterozygous at a position, Mendelian inheritance logic cannot be used to phase. For such positions, a computational phasing algorithm can be used. Existing phasing algorithms use a haplotype reference panel, sequencing reads, and/or parental genotypes to phase an individual; however, they are limited in that they can only phase certain types of variants, require a specific genotype build, require large amounts of storage capacity, and/or require long run times. We created trioPhaser to address these challenges. RESULTS: trioPhaser uses gVCF files from an individual and their parents as initial input, and then outputs a phased VCF file. Input trio data are first phased using Mendelian inheritance logic. Then, the positions that cannot be phased using inheritance information alone are phased by the SHAPEIT4 phasing algorithm. Using whole-genome sequencing data of 52 trios, we show that trioPhaser, on average, increases the total number of phased positions by 21.0% and 10.5%, respectively, when compared to the number of positions that SHAPEIT4 or Mendelian inheritance logic can phase when either is used alone. In addition, we show that the accuracy of the phased calls output by trioPhaser is similar to that of linked-read and read-backed phasing. CONCLUSION: trioPhaser is a containerized software tool that uses both Mendelian inheritance logic and SHAPEIT4 to phase trios when gVCF files are available. By implementing both phasing methods, more variant positions are phased compared to what either method is able to phase alone.


Subject(s)
Genome , Polymorphism, Single Nucleotide , Algorithms , Genomics , Haplotypes , High-Throughput Nucleotide Sequencing , Logic , Sequence Analysis, DNA
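
The Mendelian inheritance logic described in the trioPhaser abstract above can be illustrated with a small sketch for a single position. This is a simplified assumption-laden example, not trioPhaser's code: genotypes are represented as unordered pairs of allele strings, and the function name is hypothetical. It shows why a position cannot be phased when all three trio members are heterozygous, which is the case the abstract hands off to SHAPEIT4.

```python
from typing import Optional, Tuple

Genotype = Tuple[str, str]


def phase_by_inheritance(
    child: Genotype, mother: Genotype, father: Genotype
) -> Optional[Tuple[str, str]]:
    """Return (maternal_allele, paternal_allele) for the child.

    Returns None when the assignment is ambiguous (for example, all
    three genotypes are heterozygous) or inconsistent with the parents.
    """
    a, b = child
    if a == b:
        # Homozygous child: both haplotypes carry the same allele.
        return (a, b) if a in mother and b in father else None
    # Try both ways of assigning the child's alleles to the parents.
    candidates = [
        (mat, pat)
        for mat, pat in ((a, b), (b, a))
        if mat in mother and pat in father
    ]
    return candidates[0] if len(candidates) == 1 else None


# Mother can only have given "A", so "G" must be paternal.
phased = phase_by_inheritance(("A", "G"), ("A", "A"), ("A", "G"))
# All three heterozygous: either assignment is possible.
unphased = phase_by_inheritance(("A", "G"), ("A", "G"), ("A", "G"))
```

Positions for which this returns None are the ones that would need a statistical phasing method.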
4.
Mar Drugs ; 19(1)2021 Jan 18.
Article in English | MEDLINE | ID: mdl-33477536

ABSTRACT

Patients diagnosed with basal-like breast cancer suffer from poor prognosis and limited treatment options. There is an urgent need to identify new targets that can benefit patients with basal-like and claudin-low (BL-CL) breast cancers. We screened fractions from our Marine Invertebrate Compound Library (MICL) to identify compounds that specifically target BL-CL breast cancers. We identified a previously unreported trisulfated sterol, i.e., topsentinol L trisulfate (TLT), which exhibited increased efficacy against BL-CL breast cancers relative to luminal/HER2+ breast cancer. Biochemical investigation of the effects of TLT on BL-CL cell lines revealed its ability to inhibit activation of AMP-activated protein kinase (AMPK) and checkpoint kinase 1 (CHK1) and to promote activation of p38. The importance of targeting AMPK and CHK1 in BL-CL cell lines was validated by treating a panel of breast cancer cell lines with known small molecule inhibitors of AMPK (dorsomorphin) and CHK1 (Ly2603618) and recording the increased effectiveness against BL-CL breast cancers as compared with luminal/HER2+ breast cancer. Finally, we generated a drug response gene-expression signature and projected it against a human tumor panel of 12 different cancer types to identify other cancer types sensitive to the compound. The TLT sensitivity gene-expression signature identified breast and bladder cancer as the most sensitive to TLT, while glioblastoma multiforme was the least sensitive.


Subject(s)
Antineoplastic Agents/pharmacology , Breast Neoplasms/drug therapy , Sterols/pharmacology , AMP-Activated Protein Kinases/drug effects , AMP-Activated Protein Kinases/metabolism , Antineoplastic Agents/chemistry , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Cell Line, Tumor , Checkpoint Kinase 1/drug effects , Checkpoint Kinase 1/metabolism , Claudins/metabolism , Female , Gene Expression Regulation, Neoplastic , Humans , Sterols/chemistry , p38 Mitogen-Activated Protein Kinases/drug effects , p38 Mitogen-Activated Protein Kinases/metabolism
5.
Cancer Cell Int ; 20: 375, 2020.
Article in English | MEDLINE | ID: mdl-32782434

ABSTRACT

BACKGROUND: The aim of this study is to determine whether Hypoxanthine Guanine Phosphoribosyltransferase (HPRT) could be used as a biomarker for the diagnosis and treatment of B cell malignancies. With 4.3% of all new cancers diagnosed as Non-Hodgkin lymphoma, finding new biomarkers for the treatment of B cell cancers is an ongoing pursuit. HPRT is a nucleotide salvage pathway enzyme responsible for the synthesis of guanine and inosine throughout the cell cycle. METHODS: Raji cells were used for this analysis due to their high HPRT internal expression. Internal expression was evaluated utilizing western blotting and RNA sequencing. Surface localization was analyzed using flow cytometry, confocal microscopy, and membrane biotinylation. To determine the source of HPRT surface expression, a CRISPR knockdown of HPRT was generated and confirmed using western blotting. To determine clinical significance, patient blood samples were collected and analyzed for HPRT surface localization. RESULTS: We found surface localization of HPRT on both Raji cancer cells and in 77% of the malignant ALL samples analyzed and observed no significant expression in healthy cells. Surface expression was confirmed in Raji cells with confocal microscopy, where a direct overlap between HPRT specific antibodies and a membrane-specific dye was observed. HPRT was also detected in biotinylated membranes of Raji cells. Upon HPRT knockdown in Raji cells, we found a significant reduction in surface expression, which shows that the HPRT found on the surface originates from the cells themselves. Finally, we found that cells that had elevated levels of HPRT had a direct correlation to XRCC2, BRCA1, PIK3CA, MSH2, MSH6, WDYHV1, AK7, and BLMH expression and an inverse correlation to PRKD2, PTGS2, TCF7L2, CDH1, IL6R, MC1R, AMPD1, TLR6, and BAK1 expression. Of the 17 genes with significant correlation, 9 are involved in cellular proliferation and DNA synthesis, regulation, and repair. 
CONCLUSIONS: As a surface biomarker that is found on malignant cells and not on healthy cells, HPRT could be used as a surface antigen for targeted immunotherapy. In addition, the gene correlations show that HPRT may have an additional role in regulation of cancer proliferation that has not been previously discovered.

6.
Cancer Cell Int ; 19: 19, 2019.
Article in English | MEDLINE | ID: mdl-30679932

ABSTRACT

BACKGROUND: The incidence of endometrial cancer is rising both in the United States and worldwide. As endometrial cancer becomes more prominent, the need to develop and characterize biomarkers for early-stage diagnosis and the treatment of endometrial cancer has become an important priority. Several biomarkers currently used to diagnose endometrial cancer are directly related to obesity. Although epigenetic and mutational biomarkers have been identified and have resulted in treatment options for patients with specific aberrations, many tumors do not harbor those specific aberrations. A promising alternative is to determine biomarkers based on differential gene expression, which can be used to estimate prognosis. METHODS: We evaluated 589 patients to determine differential expression between normal and malignant patient samples. We then supplemented these evaluations with immunohistochemistry staining of endometrial tumors and normal tissues. Additionally, we used the Library of Integrated Network-based Cellular Signatures to evaluate the effects of 1826 chemotherapy drugs on 26 cell lines to determine the effects of each drug on HPRT1 and AURKA expression. RESULTS: Expression of HPRT1, JAG2, AURKA, and PGK1 was elevated when compared to normal samples, and HPRT1 and PGK1 showed a stepwise elevation in expression that was significantly related to cancer grade. To determine the prognostic potential of these genes, we evaluated patient outcome and found that levels of both HPRT1 and AURKA were significantly correlated with overall patient survival. When evaluating drugs that had the most significant effect on lowering the expression of HPRT1 and AURKA, we found that Topo I and MEK inhibitors were most effective at reducing HPRT1 expression. Meanwhile, drugs that were effective at reducing AURKA expression were more diverse (MEK, Topo I, MELK, HDAC, etc.). The effects of these drugs on the expression of HPRT1 and AURKA provide insight into their role within cellular maintenance. CONCLUSIONS: Collectively, these data show that JAG2, AURKA, PGK1, and HPRT1 have the potential to be used independently as diagnostic, prognostic, or treatment biomarkers in endometrial cancer. Expression levels of these genes may provide physicians with insight into tumor aggressiveness and chemotherapy drugs that are well suited to individual patients.

8.
Biol Res ; 52(1): 13, 2019 Mar 21.
Article in English | MEDLINE | ID: mdl-30894224

ABSTRACT

BACKGROUND: Ovarian cancer is a significant cancer-related cause of death in women worldwide. The most used chemotherapeutic regimen is based on carboplatin (CBDCA). However, CBDCA resistance is the main obstacle to a better prognosis. An in vitro drug-resistant cell model would help in the understanding of molecular mechanisms underlying this drug-resistance phenomenon. The aim of this study was to characterize cellular and molecular changes of induced CBDCA-resistant ovarian cancer cell line A2780. METHODS: The cell selection strategy used in this study was a dose-per-pulse method using a concentration of 100 µM for 2 h. Once 20 cycles of exposure to the drug were completed, the cell cultures showed a resistant phenotype. Then, the ovarian cancer cell line A2780 was grown with 100 µM of CBDCA (CBDCA-resistant cells) or without CBDCA (parental cells). Afterward, a drug sensitivity assay, morphological analyses, cell death assays, and an RNA-seq analysis were performed in CBDCA-resistant A2780 cells. RESULTS: Microscopy on both parental and CBDCA-resistant A2780 cells showed similar characteristics in morphology and F-actin distribution within cells. In cell-death assays, parental A2780 cells showed a significant increase in phosphatidylserine translocation and caspase-3/7 cleavage compared to CBDCA-resistant A2780 cells (P < 0.05 and P < 0.005, respectively). Cell viability in parental A2780 cells was significantly decreased compared to CBDCA-resistant A2780 cells (P < 0.0005). The RNA-seq analysis showed 156 differentially expressed genes (DEGs) associated mainly with molecular functions. CONCLUSION: CBDCA-resistant A2780 ovarian cancer cells are a reliable model of CBDCA resistance that shows several DEGs involved in molecular functions such as transmembrane activity, protein binding to cell surface receptor and catalytic activity. Also, we found that the Wnt/β-catenin and integrin signaling pathways are the main metabolic pathways dysregulated in CBDCA-resistant A2780 cells.


Subject(s)
Antineoplastic Agents/pharmacology , Carboplatin/pharmacology , Drug Resistance, Neoplasm/genetics , Gene Expression Regulation, Neoplastic/drug effects , Ovarian Neoplasms/genetics , Transcriptome/drug effects , Cell Death/drug effects , Cell Death/genetics , Cell Line, Tumor , Female , Humans , Ovarian Neoplasms/drug therapy , Ovarian Neoplasms/pathology , Phenotype , Sequence Analysis, RNA , Signal Transduction , Transcriptome/genetics
9.
Bioinformatics ; 33(10): 1514-1520, 2017 May 15.
Article in English | MEDLINE | ID: mdl-28093409

ABSTRACT

MOTIVATION: Using mass spectrometry to measure the concentration and turnover of the individual proteins in a proteome enables the calculation of individual synthesis and degradation rates for each protein. Software to analyze concentration is readily available, but software to analyze turnover is lacking. Data analysis workflows typically don't access the full breadth of information about instrument precision and accuracy that is present in each peptide isotopic envelope measurement. This method utilizes both isotope distribution and changes in neutromer spacing, which benefits the analysis of both concentration and turnover. RESULTS: We have developed a data analysis tool, DeuteRater, to measure protein turnover from metabolic D2O labeling. DeuteRater uses theoretical predictions for label-dependent change in isotope abundance and inter-peak (neutromer) spacing within the isotope envelope to calculate protein turnover rate. We have also used these metrics to evaluate the accuracy and precision of peptide measurements and thereby determined the optimal data acquisition parameters of different instruments, as well as the effect of data processing steps. We show that these combined measurements can be used to remove noise and increase confidence in the protein turnover measurement for each protein. AVAILABILITY AND IMPLEMENTATION: Source code and ReadMe for Python 2 and 3 versions of DeuteRater are available at https://github.com/JC-Price/DeuteRater. Data are at https://chorusproject.org/pages/index.html, project number 1147. Critical intermediate calculation files are provided as Tables S3 and S4. Software has only been tested on Windows machines. CONTACT: jcprice@chem.byu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Regulation , Mass Spectrometry/methods , Peptides/analysis , Proteome/genetics , Proteomics/methods , Software , Animals , Isotopes , Kinetics , Mice , Peptides/genetics , Peptides/metabolism , Proteome/metabolism
10.
Cancer Cell Int ; 18: 135, 2018.
Article in English | MEDLINE | ID: mdl-30214377

ABSTRACT

BACKGROUND: Lung, breast, and colorectal malignancies are the leading causes of cancer-related deaths in the world, causing over 2.8 million cancer-related deaths yearly. Despite efforts to improve prevention methods, early detection, and treatments, survival rates for advanced stage lung, breast, and colon cancer remain low, indicating a critical need to identify cancer-specific biomarkers for early detection and treatment. Thymidine kinase 1 (TK1) is a nucleotide salvage pathway enzyme involved in cellular proliferation and considered an important tumor proliferation biomarker in the serum. In this study, we further characterized TK1's potential as a tumor biomarker and immunotherapeutic target, as well as its clinical relevance. METHODS: We assessed TK1 surface localization by flow cytometry and confocal microscopy in lung (NCI-H460, A549), breast (MDA-MB-231, MCF7), and colorectal (HT-29, SW620) cancer cell lines. We also isolated cell surface proteins from HT-29 cells and performed a western blot confirming the presence of TK1 on cell membrane protein fractions. To evaluate TK1's clinical relevance, we compared TK1 expression levels in normal and malignant tissue through flow cytometry and immunohistochemistry. We also analyzed RNA-Seq data from The Cancer Genome Atlas (TCGA) to assess differential expression of the TK1 gene in lung, breast, and colorectal cancer patients. RESULTS: We found significant expression of TK1 on the surface of NCI-H460, A549, MDA-MB-231, MCF7, and HT-29 cell lines and a strong association between TK1's localization with the membrane through confocal microscopy and Western blot. We found negligible TK1 surface expression in normal healthy tissue and significantly higher TK1 expression in malignant tissues. Patient data from TCGA revealed that the TK1 gene expression is upregulated in cancer patients compared to normal healthy patients. CONCLUSIONS: Our results show that TK1 localizes on the surface of lung, breast, and colorectal cell lines and is upregulated in malignant tissues and patients compared to healthy tissues and patients. We conclude that TK1 is a potential clinical biomarker for the treatment of lung, breast, and colorectal cancer.

11.
Mol Syst Biol ; 12(3): 860, 2016 Mar 10.
Article in English | MEDLINE | ID: mdl-26969729

ABSTRACT

The signaling events that drive familial breast cancer (FBC) risk remain poorly understood. While the majority of genomic studies have focused on genetic risk variants, known risk variants account for at most 30% of FBC cases. Considering that multiple genes may influence FBC risk, we hypothesized that a pathway-based strategy examining different data types from multiple tissues could elucidate the biological basis for FBC. In this study, we performed integrated analyses of gene expression and exome-sequencing data from peripheral blood mononuclear cells and showed that cell adhesion pathways are significantly and consistently dysregulated in women who develop FBC. The dysregulation of cell adhesion pathways in high-risk women was also identified by pathway-based profiling applied to normal breast tissue data from two independent cohorts. The results of our genomic analyses were validated in normal primary mammary epithelial cells from high-risk and control women, using cell-based functional assays, drug-response assays, fluorescence microscopy, and Western blotting assays. Both genomic and cell-based experiments indicate that cell-cell and cell-extracellular matrix adhesion processes seem to be disrupted in non-malignant cells of women at high risk for FBC and suggest a potential role for these processes in FBC development.


Subject(s)
Breast Neoplasms/metabolism , Genetic Predisposition to Disease , Signal Transduction , Aged , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Cell Adhesion , Cohort Studies , Female , Gene Expression Profiling , Genetic Variation , Humans , Leukocytes, Mononuclear/metabolism , Middle Aged
12.
J Biol Chem ; 290(20): 12487-96, 2015 May 15.
Article in English | MEDLINE | ID: mdl-25770209

ABSTRACT

The phospho-binding protein 14-3-3ζ acts as a signaling hub controlling a network of interacting partners and oncogenic pathways. We show here that lysines within the 14-3-3ζ binding pocket and protein-protein interface can be modified by acetylation. The positive charge on two of these lysines, Lys(49) and Lys(120), is critical for coordinating 14-3-3ζ-phosphoprotein interactions. Through screening, we identified HDAC6 as the Lys(49)/Lys(120) deacetylase. Inhibition of HDAC6 blocks 14-3-3ζ interactions with two well described interacting partners, Bad and AS160, which triggers their dephosphorylation at Ser(112) and Thr(642), respectively. Expression of an acetylation-refractory K49R/K120R mutant of 14-3-3ζ rescues both the HDAC6 inhibitor-induced loss of interaction and Ser(112)/Thr(642) phosphorylation. Furthermore, expression of the K49R/K120R mutant of 14-3-3ζ inhibits the cytotoxicity of HDAC6 inhibition. These data demonstrate a novel role for HDAC6 in controlling 14-3-3ζ binding activity.


Subject(s)
14-3-3 Proteins/metabolism , Histone Deacetylases/metabolism , 14-3-3 Proteins/genetics , Acetylation , Amino Acid Substitution , Binding Sites , Cell Survival/genetics , GTPase-Activating Proteins/genetics , GTPase-Activating Proteins/metabolism , HEK293 Cells , Histone Deacetylase 6 , Histone Deacetylases/genetics , Humans , Lysine/genetics , Lysine/metabolism , Mutation, Missense , bcl-Associated Death Protein/genetics , bcl-Associated Death Protein/metabolism
13.
Bioinformatics ; 31(22): 3666-72, 2015 Nov 15.
Article in English | MEDLINE | ID: mdl-26209429

ABSTRACT

MOTIVATION: The Cancer Genome Atlas (TCGA) RNA-Sequencing data are used widely for research. TCGA provides 'Level 3' data, which have been processed using a pipeline specific to that resource. However, we have found using experimentally derived data that this pipeline produces gene-expression values that vary considerably across biological replicates. In addition, some RNA-Sequencing analysis tools require integer-based read counts, which are not provided with the Level 3 data. As an alternative, we have reprocessed the data for 9264 tumor and 741 normal samples across 24 cancer types using the Rsubread package. We have also collated corresponding clinical data for these samples. We provide these data as a community resource. RESULTS: We compared TCGA samples processed using either pipeline and found that the Rsubread pipeline produced fewer zero-expression genes and more consistent expression levels across replicate samples than the TCGA pipeline. Additionally, we used a genomic-signature approach to estimate HER2 (ERBB2) activation status for 662 breast-tumor samples and found that the Rsubread data resulted in stronger predictions of HER2 pathway activity. Finally, we used data from both pipelines to classify 575 lung cancer samples based on histological type. This analysis identified various non-coding RNA that may influence lung-cancer histology. AVAILABILITY AND IMPLEMENTATION: The RNA-Sequencing and clinical data can be downloaded from Gene Expression Omnibus (accession number GSE62944). Scripts and code that were used to process and analyze the data are available from https://github.com/srp33/TCGA_RNASeq_Clinical. CONTACT: stephen_piccolo@byu.edu or andreab@genetics.utah.edu SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.


Subject(s)
Breast Neoplasms/genetics , Genome, Human , Sequence Analysis, RNA/methods , Statistics as Topic , Breast Neoplasms/classification , Female , Gene Expression Regulation, Neoplastic , Humans , ROC Curve , Reproducibility of Results
14.
Bioinformatics ; 31(11): 1745-53, 2015 Jun 01.
Article in English | MEDLINE | ID: mdl-25617415

ABSTRACT

MOTIVATION: Although gene-expression signature-based biomarkers are often developed for clinical diagnosis, many promising signatures fail to replicate during validation. One major challenge is that biological samples used to generate and validate the signature are often from heterogeneous biological contexts: controlled or in vitro samples may be used to generate the signature, but patient samples may be used for validation. In addition, systematic technical biases from multiple genome-profiling platforms often mask true biological variation. Addressing such challenges will enable us to better elucidate disease mechanisms and provide improved guidance for personalized therapeutics. RESULTS: Here, we present a pathway profiling toolkit, Adaptive Signature Selection and InteGratioN (ASSIGN), which enables robust and context-specific pathway analyses by efficiently capturing pathway activity in heterogeneous sets of samples and across profiling technologies. The ASSIGN framework is based on a flexible Bayesian factor analysis approach that allows for simultaneous profiling of multiple correlated pathways and for the adaptation of pathway signatures to specific diseases. We demonstrate the robustness and versatility of ASSIGN in estimating pathway activity in simulated data, cell lines with perturbed pathways, and primary tissue samples, including The Cancer Genome Atlas breast carcinoma samples and liver samples exposed to genotoxic carcinogens. AVAILABILITY AND IMPLEMENTATION: Software for our approach is available for download at: http://www.bioconductor.org/packages/release/bioc/html/ASSIGN.html and https://github.com/wevanjohnson/ASSIGN.


Subject(s)
Gene Expression Profiling/methods , Software , Animals , Bayes Theorem , Breast Neoplasms/genetics , Female , Genomics/methods , Humans , Rats , Signal Transduction/genetics
15.
Proc Natl Acad Sci U S A ; 110(44): 17778-83, 2013 Oct 29.
Article in English | MEDLINE | ID: mdl-24128763

ABSTRACT

Over the past two decades, many biotechnology platforms have been developed for high-throughput gene expression profiling. However, because each platform is subject to technology-specific biases and produces distinct raw-data distributions, researchers have experienced difficulty in integrating data across platforms. Data integration is crucial to data-generating consortiums, researchers transitioning to newer profiling technologies, and individuals seeking to aggregate data across experiments. We address this need with our Universal exPression Code (UPC) approach, which corrects for platform-specific background noise using models that account for the genomic base composition and length of target regions; this approach also uses a mixture model to estimate whether a gene is active in a particular profiling sample. The latter produces standardized UPC values on a zero-to-one scale, so that they can be interpreted consistently, irrespective of profiling technology, thus enabling downstream analysis pipelines to be developed in a platform-agnostic manner. The UPC method can be applied to one- and two-channel expression microarrays and to next-generation sequencing data (RNA sequencing). Furthermore, UPCs are derived using information from within a given sample only; no ancillary samples are required at processing time. Thus, UPCs are suitable for personalized-medicine workflows where samples must be processed individually rather than in batches. In a variety of analyses and comparisons, UPCs perform comparably to other methods designed specifically for microarrays or RNA sequencing in most settings. Software for calculating UPCs is freely available at www.bioconductor.org/packages/release/bioc/html/SCAN.UPC.html.


Subject(s)
Algorithms , DNA Barcoding, Taxonomic/methods , Gene Expression Profiling/methods , Genes/genetics , Models, Genetic , Software , Transcriptional Activation/physiology , Base Composition
16.
Mol Cancer Res ; 22(2): 137-151, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-37847650

ABSTRACT

Beyond the most common oncogenes activated by mutation (mut-drivers), there likely exists a variety of low-frequency mut-drivers, each of which is a possible frontier for targeted therapy. To identify new and understudied mut-drivers, we developed a machine learning (ML) model that integrates curated clinical cancer data and posttranslational modification (PTM) proteomics databases. We applied the approach to 62,746 patient cancers spanning 84 cancer types and predicted 3,964 oncogenic mutations across 1,148 genes, many of which disrupt PTMs of known and unknown function. The list of putative mut-drivers includes established drivers and others with poorly understood roles in cancer. This ML model is available as a web application. As a case study, we focused the approach on nonreceptor tyrosine kinases (NRTK) and found a recurrent mutation in activated CDC42 kinase-1 (ACK1) that disrupts the Mig6 homology region (MHR) and ubiquitin-association (UBA) domains on the ACK1 C-terminus. By studying these domains in cultured cells, we found that disruption of the MHR domain helps activate the kinase while disruption of the UBA increases kinase stability by blocking its lysosomal degradation. This ACK1 mutation is analogous to lymphoma-associated mutations in its sister kinase, TNK1, which also disrupt a C-terminal inhibitory motif and UBA domain. This study establishes a mut-driver discovery tool for the research community and identifies a mechanism of ACK1 hyperactivation shared among ACK family kinases. IMPLICATIONS: This research identifies a potentially targetable activating mutation in ACK1 and other possible oncogenic mutations, including PTM-disrupting mutations, for further study.


Subject(s)
Neoplasms , Proteomics , Humans , Protein Processing, Post-Translational , Neoplasms/genetics , Ubiquitin/metabolism , Cells, Cultured , Fetal Proteins/metabolism , Protein-Tyrosine Kinases/metabolism
17.
Genomics ; 100(6): 337-44, 2012 Dec.
Article in English | MEDLINE | ID: mdl-22959562

ABSTRACT

Gene-expression microarrays allow researchers to characterize biological phenomena in a high-throughput fashion but are subject to technological biases and inevitable variabilities that arise during sample collection and processing. Normalization techniques aim to correct such biases. Most existing methods require multiple samples to be processed in aggregate; consequently, each sample's output is influenced by other samples processed jointly. However, in personalized-medicine workflows, samples may arrive serially, so renormalizing all samples upon each new arrival would be impractical. We have developed Single Channel Array Normalization (SCAN), a single-sample technique that models the effects of probe-nucleotide composition on fluorescence intensity and corrects for such effects, dramatically increasing the signal-to-noise ratio within individual samples while decreasing variation across samples. In various benchmark comparisons, we show that SCAN performs as well as or better than competing methods yet has no dependence on external reference samples and can be applied to any single-channel microarray platform.


Subject(s)
Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Precision Medicine/methods , Analysis of Variance , Fluorescence , High-Throughput Screening Assays/methods , Humans , Sample Size , Selection Bias , Signal-To-Noise Ratio , Workflow
18.
J Integr Bioinform ; 2023 Dec 05.
Article in English | MEDLINE | ID: mdl-38047898

ABSTRACT

TidyGEO is a Web-based tool for downloading, tidying, and reformatting data series from Gene Expression Omnibus (GEO). As a freely accessible repository with data from over 6 million biological samples across more than 4000 organisms, GEO provides diverse opportunities for secondary research. Although scientists may find assay data relevant to a given research question, most analyses require sample-level annotations. In GEO, such annotations are stored alongside assay data in delimited, text-based files. However, the structure and semantics of the annotations vary widely from one series to another, and many annotations are not useful for analysis purposes. Thus, every GEO series must be tidied before it is analyzed. Manual approaches may be used, but these are error prone and take time away from other research tasks. Custom computer scripts can be written, but many scientists lack the computational expertise to create such scripts. To address these challenges, we created TidyGEO, which supports essential data-cleaning tasks for sample-level annotations, such as selecting informative columns, renaming columns, splitting or merging columns, standardizing data values, and filtering samples. Additionally, users can integrate annotations with assay data, restructure assay data, and generate code that enables others to reproduce these steps.
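
The data-cleaning tasks named in this abstract (selecting informative columns, renaming columns, splitting columns, standardizing values, and filtering samples) can be illustrated with a minimal sketch. This is not TidyGEO's code or interface; the sample rows, the column names, and the helper functions `tidy_annotations` and `filter_samples` are hypothetical, and the sketch assumes annotations arrive as "key: value" strings, as is common in GEO characteristics fields.

```python
# Hypothetical sample-level annotations (invented for illustration only).
raw_rows = [
    {"geo_accession": "GSM0001", "characteristics_ch1": "tissue: Breast",
     "characteristics_ch1.1": "age: 45", "contact_name": "Lab A"},
    {"geo_accession": "GSM0002", "characteristics_ch1": "tissue: breast ",
     "characteristics_ch1.1": "age: NA", "contact_name": "Lab A"},
    {"geo_accession": "GSM0003", "characteristics_ch1": "tissue: LUNG",
     "characteristics_ch1.1": "age: 61", "contact_name": "Lab A"},
]


def tidy_annotations(rows):
    """Return one tidy record per sample from 'key: value' annotation columns."""
    tidied = []
    for row in rows:
        # Rename the accession column to a friendlier name.
        record = {"sample_id": row["geo_accession"]}
        for column, value in row.items():
            # Select only the informative columns; drop the rest.
            if not column.startswith("characteristics"):
                continue
            # Split "key: value" into a column name and a value.
            key, sep, val = value.partition(":")
            if not sep:
                continue
            # Standardize: trim whitespace and lowercase both parts.
            record[key.strip().lower()] = val.strip().lower()
        # Convert age to an integer; treat non-numeric entries as missing.
        if "age" in record:
            record["age"] = int(record["age"]) if record["age"].isdigit() else None
        tidied.append(record)
    return tidied


def filter_samples(records, **criteria):
    """Keep records whose fields equal every given criterion."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]


tidy = tidy_annotations(raw_rows)
breast_samples = filter_samples(tidy, tissue="breast")
for record in breast_samples:
    print(record)
```

Keeping each step as a small, explicit transformation mirrors the abstract's point about reproducibility: the same sequence can be rerun by others on the same input.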

19.
Biochem Biophys Res Commun ; 422(3): 436-41, 2012 Jun 08.
Article in English | MEDLINE | ID: mdl-22580005

ABSTRACT

Menthol, a naturally occurring monoterpene, is used in foods, cosmetic products, and topical therapeutic preparations. It also exerts cytotoxic activity against several cancer cell types, including prostate cancer cells. However, little is known about the mechanism of menthol action against prostate cancer cells. In this study, we investigated the effect of menthol on the gene expression profile of PC-3 prostate cancer cells using DNA microarray analyses. Gene set enrichment analysis revealed that menthol primarily affects the expression of cell cycle-related genes. Subsequent experimental analyses validated that menthol induces G2/M arrest. In particular, menthol markedly down-regulated polo-like kinase 1 (PLK1), a key regulator of G2/M phase progression, and inhibited its downstream signaling. Our computational analyses and experimental validation provide a basis for future investigations aimed at elucidating the action of menthol on cell cycle control in prostate cancer cells.


Subject(s)
Antineoplastic Agents/pharmacology , Cell Cycle Checkpoints/drug effects , Cell Division/drug effects , G2 Phase/drug effects , Gene Expression/drug effects , Menthol/pharmacology , Prostatic Neoplasms/genetics , Cell Cycle Checkpoints/genetics , Cell Cycle Proteins/antagonists & inhibitors , Cell Cycle Proteins/genetics , Cell Division/genetics , Cell Line, Tumor , Down-Regulation , G2 Phase/genetics , Gene Expression Profiling , Humans , Male , Protein Serine-Threonine Kinases/antagonists & inhibitors , Protein Serine-Threonine Kinases/genetics , Proto-Oncogene Proteins/antagonists & inhibitors , Proto-Oncogene Proteins/genetics , Polo-Like Kinase 1