ABSTRACT
Computer programming is a fundamental tool for life scientists, allowing them to carry out essential research tasks. However, despite various educational efforts, learning to write code can be a challenging endeavor for students and researchers in life-sciences disciplines. Recent advances in artificial intelligence have made it possible to translate human-language prompts to functional code, raising questions about whether these technologies can aid (or replace) life scientists' efforts to write code. Using 184 programming exercises from an introductory-bioinformatics course, we evaluated the extent to which one such tool, OpenAI's ChatGPT, could successfully complete programming tasks. ChatGPT solved 139 (75.5%) of the exercises on its first attempt. For the remaining exercises, we provided natural-language feedback to the model, prompting it to try different approaches. Within 7 or fewer attempts, ChatGPT solved 179 (97.3%) of the exercises. These findings have implications for life-sciences education and research. Instructors may need to adapt their pedagogical approaches and assessment techniques to account for these new capabilities that are available to the general public. For some programming tasks, researchers may be able to work in collaboration with machine-learning models to produce functional code.
ABSTRACT
By classifying patients into subgroups, clinicians can provide more effective care than using a uniform approach for all patients. Such subgroups might include patients with a particular disease subtype, patients with a good (or poor) prognosis, or patients most (or least) likely to respond to a particular therapy. Transcriptomic measurements reflect the downstream effects of genomic and epigenomic variations. However, high-throughput technologies generate thousands of measurements per patient, and complex dependencies exist among genes, so it may be infeasible to classify patients using traditional statistical models. Machine-learning classification algorithms can help with this problem. However, hundreds of classification algorithms exist, and most support diverse hyperparameters, so it is difficult for researchers to know which are optimal for gene-expression biomarkers. We performed a benchmark comparison, applying 52 classification algorithms to 50 gene-expression datasets (143 class variables). We evaluated algorithms that represent diverse machine-learning methodologies and have been implemented in general-purpose, open-source, machine-learning libraries. When available, we combined clinical predictors with gene-expression data. Additionally, we evaluated the effects of performing hyperparameter optimization and feature selection using nested cross validation. Kernel- and ensemble-based algorithms consistently outperformed other types of classification algorithms; however, even the top-performing algorithms performed poorly in some cases. Hyperparameter optimization and feature selection typically improved predictive performance, and univariate feature-selection algorithms typically outperformed more sophisticated methods. Together, our findings illustrate that algorithm performance varies considerably when other factors are held constant and thus that algorithm selection is a critical step in biomarker studies.
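Illustrative sketch (not taken from the article): the nested cross validation mentioned in this abstract can be outlined as below. The k-nearest-neighbor classifier on one-dimensional values, the round-robin fold assignment, and all function names are assumptions chosen to keep the example self-contained; the study itself benchmarked 52 algorithms from existing libraries. The point is only that the hyperparameter is chosen in an inner loop that never sees the outer test fold.

```python
from collections import Counter

def k_folds(indices, k):
    """Split a list of indices into k folds (round-robin assignment)."""
    return [indices[i::k] for i in range(k)]

def knn_predict(train, x, k):
    """Majority label among the k nearest 1-D training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k):
    """Fraction of test points whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

def nested_cv(data, k_values, outer_k=5, inner_k=3):
    """Mean outer-fold accuracy; k is tuned on inner folds only."""
    outer = k_folds(list(range(len(data))), outer_k)
    scores = []
    for i, test_idx in enumerate(outer):
        train_idx = [j for f, fold in enumerate(outer) if f != i for j in fold]
        inner = k_folds(train_idx, inner_k)

        def inner_score(k):
            # Average validation accuracy over the inner folds.
            total = 0.0
            for m, val_idx in enumerate(inner):
                fit_idx = [j for f, fold in enumerate(inner) if f != m for j in fold]
                total += accuracy([data[j] for j in fit_idx],
                                  [data[j] for j in val_idx], k)
            return total / inner_k

        best_k = max(k_values, key=inner_score)  # hyperparameter selection
        scores.append(accuracy([data[j] for j in train_idx],
                               [data[j] for j in test_idx], best_k))
    return sum(scores) / len(scores)

# Toy data: two well-separated classes (hypothetical values).
toy = [(float(x), 0) for x in range(10)] + [(float(x) + 100, 1) for x in range(10)]
cv_score = nested_cv(toy, k_values=[1, 3])
```

Feature selection would sit in the same inner loop as the hyperparameter search, so that neither step is informed by the samples used for the final performance estimate.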
Subject(s)
Algorithms; Machine Learning; Genomics; Humans; Models, Statistical
ABSTRACT
BACKGROUND: When analyzing DNA sequence data of an individual, knowing which nucleotide was inherited from each parent can be beneficial when trying to identify certain types of DNA variants. Mendelian inheritance logic can be used to accurately phase (haplotype) the majority (67-83%) of an individual's heterozygous nucleotide positions when genotypes are available for both parents (trio). However, when all members of a trio are heterozygous at a position, Mendelian inheritance logic cannot be used to phase. For such positions, a computational phasing algorithm can be used. Existing phasing algorithms use a haplotype reference panel, sequencing reads, and/or parental genotypes to phase an individual; however, they are limited in that they can only phase certain types of variants, require a specific genotype build, require large amounts of storage capacity, and/or require long run times. We created trioPhaser to address these challenges. RESULTS: trioPhaser uses gVCF files from an individual and their parents as initial input, and then outputs a phased VCF file. Input trio data are first phased using Mendelian inheritance logic. Then, the positions that cannot be phased using inheritance information alone are phased by the SHAPEIT4 phasing algorithm. Using whole-genome sequencing data of 52 trios, we show that trioPhaser, on average, increases the total number of phased positions by 21.0% and 10.5%, respectively, when compared to the number of positions that SHAPEIT4 or Mendelian inheritance logic can phase when either is used alone. In addition, we show that the accuracy of the phased calls output by trioPhaser is similar to that of linked-read and read-backed phasing. CONCLUSION: trioPhaser is a containerized software tool that uses both Mendelian inheritance logic and SHAPEIT4 to phase trios when gVCF files are available. By combining both phasing methods, trioPhaser phases more variant positions than either method can phase alone.
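Illustrative sketch (not trioPhaser's actual code): the Mendelian inheritance logic described in this abstract can be expressed for a single position as follows. The function name, the tuple representation of genotypes, and the return convention are assumptions made for the example. It shows why a position is unphaseable by inheritance alone when all three trio members are heterozygous: both parental assignments remain possible.

```python
def phase_by_inheritance(child, mother, father):
    """Return (maternal_allele, paternal_allele) for the child, or None.

    Each genotype is an unordered pair of alleles, e.g. ("A", "G").
    None means the position cannot be phased from the trio genotypes
    alone (ambiguous or inconsistent with Mendelian inheritance).
    """
    a, b = child
    if a == b:
        return (a, b)  # homozygous child: nothing to phase

    # Try both ways of assigning the child's alleles to the parents.
    options = set()
    for from_mother, from_father in ((a, b), (b, a)):
        if from_mother in mother and from_father in father:
            options.add((from_mother, from_father))

    if len(options) == 1:
        return options.pop()
    return None  # zero options: Mendelian error; two options: ambiguous

# Mother is homozygous A, so the child's G must come from the father.
phased = phase_by_inheritance(("A", "G"), ("A", "A"), ("A", "G"))

# All three heterozygous: both assignments fit, so no phase is returned.
ambiguous = phase_by_inheritance(("A", "G"), ("A", "G"), ("A", "G"))
```

Positions for which such a function returns None are the ones the abstract says are passed on to a statistical phasing algorithm.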
Subject(s)
Genome; Polymorphism, Single Nucleotide; Algorithms; Genomics; Haplotypes; High-Throughput Nucleotide Sequencing; Logic; Sequence Analysis, DNA
ABSTRACT
Patients diagnosed with basal-like breast cancer suffer from poor prognosis and limited treatment options. There is an urgent need to identify new targets that can benefit patients with basal-like and claudin-low (BL-CL) breast cancers. We screened fractions from our Marine Invertebrate Compound Library (MICL) to identify compounds that specifically target BL-CL breast cancers. We identified a previously unreported trisulfated sterol, i.e., topsentinol L trisulfate (TLT), which exhibited increased efficacy against BL-CL breast cancers relative to luminal/HER2+ breast cancer. Biochemical investigation of the effects of TLT on BL-CL cell lines revealed its ability to inhibit activation of AMP-activated protein kinase (AMPK) and checkpoint kinase 1 (CHK1) and to promote activation of p38. The importance of targeting AMPK and CHK1 in BL-CL cell lines was validated by treating a panel of breast cancer cell lines with known small molecule inhibitors of AMPK (dorsomorphin) and CHK1 (Ly2603618) and recording the increased effectiveness against BL-CL breast cancers as compared with luminal/HER2+ breast cancer. Finally, we generated a drug response gene-expression signature and projected it against a human tumor panel of 12 different cancer types to identify other cancer types sensitive to the compound. The TLT sensitivity gene-expression signature identified breast and bladder cancer as the most sensitive to TLT, while glioblastoma multiforme was the least sensitive.
Subject(s)
Antineoplastic Agents/pharmacology; Breast Neoplasms/drug therapy; Sterols/pharmacology; AMP-Activated Protein Kinases/drug effects; AMP-Activated Protein Kinases/metabolism; Antineoplastic Agents/chemistry; Breast Neoplasms/genetics; Breast Neoplasms/pathology; Cell Line, Tumor; Checkpoint Kinase 1/drug effects; Checkpoint Kinase 1/metabolism; Claudins/metabolism; Female; Gene Expression Regulation, Neoplastic; Humans; Sterols/chemistry; p38 Mitogen-Activated Protein Kinases/drug effects; p38 Mitogen-Activated Protein Kinases/metabolism
ABSTRACT
BACKGROUND: The aim of this study is to determine whether Hypoxanthine Guanine Phosphoribosyltransferase (HPRT) could be used as a biomarker for the diagnosis and treatment of B cell malignancies. With 4.3% of all new cancers diagnosed as Non-Hodgkin lymphoma, finding new biomarkers for the treatment of B cell cancers is an ongoing pursuit. HPRT is a nucleotide salvage pathway enzyme responsible for the synthesis of guanine and inosine throughout the cell cycle. METHODS: Raji cells were used for this analysis due to their high HPRT internal expression. Internal expression was evaluated utilizing western blotting and RNA sequencing. Surface localization was analyzed using flow cytometry, confocal microscopy, and membrane biotinylation. To determine the source of HPRT surface expression, a CRISPR knockdown of HPRT was generated and confirmed using western blotting. To determine clinical significance, patient blood samples were collected and analyzed for HPRT surface localization. RESULTS: We found surface localization of HPRT on both Raji cancer cells and in 77% of the malignant ALL samples analyzed and observed no significant expression in healthy cells. Surface expression was confirmed in Raji cells with confocal microscopy, where a direct overlap between HPRT specific antibodies and a membrane-specific dye was observed. HPRT was also detected in biotinylated membranes of Raji cells. Upon HPRT knockdown in Raji cells, we found a significant reduction in surface expression, which shows that the HPRT found on the surface originates from the cells themselves. Finally, we found that cells that had elevated levels of HPRT had a direct correlation to XRCC2, BRCA1, PIK3CA, MSH2, MSH6, WDYHV1, AK7, and BLMH expression and an inverse correlation to PRKD2, PTGS2, TCF7L2, CDH1, IL6R, MC1R, AMPD1, TLR6, and BAK1 expression. Of the 17 genes with significant correlation, 9 are involved in cellular proliferation and DNA synthesis, regulation, and repair. 
CONCLUSIONS: As a surface biomarker that is found on malignant cells and not on healthy cells, HPRT could be used as a surface antigen for targeted immunotherapy. In addition, the gene correlations show that HPRT may have an additional role in regulation of cancer proliferation that has not been previously discovered.
ABSTRACT
BACKGROUND: The incidence of endometrial cancer is rising both in the United States and worldwide. As endometrial cancer becomes more prominent, the need to develop and characterize biomarkers for early stage diagnosis and the treatment of endometrial cancer has become an important priority. Several biomarkers currently used to diagnose endometrial cancer are directly related to obesity. Although epigenetic and mutational biomarkers have been identified and have resulted in treatment options for patients with specific aberrations, many tumors do not harbor those specific aberrations. A promising alternative is to determine biomarkers based on differential gene expression, which can be used to estimate prognosis. METHODS: We evaluated 589 patients to determine differential expression between normal and malignant patient samples. We then supplemented these evaluations with immunohistochemistry staining of endometrial tumors and normal tissues. Additionally, we used the Library of Integrated Network-based Cellular Signatures to evaluate the effects of 1826 chemotherapy drugs on 26 cell lines to determine the effects of each drug on HPRT1 and AURKA expression. RESULTS: Expression of HPRT1, JAG2, AURKA, and PGK1 was elevated when compared to normal samples, and HPRT1 and PGK1 showed a stepwise elevation in expression that was significantly related to cancer grade. To determine the prognostic potential of these genes, we evaluated patient outcome and found that levels of both HPRT1 and AURKA were significantly correlated with overall patient survival. When evaluating drugs that had the most significant effect on lowering the expression of HPRT1 and AURKA, we found that Topo I and MEK inhibitors were most effective at reducing HPRT1 expression. Meanwhile, drugs that were effective at reducing AURKA expression were more diverse (MEK, Topo I, MELK, HDAC, etc.). The effects of these drugs on the expression of HPRT1 and AURKA provide insight into their role within cellular maintenance. CONCLUSIONS: Collectively, these data show that JAG2, AURKA, PGK1, and HPRT1 have the potential to be used independently as diagnostic, prognostic, or treatment biomarkers in endometrial cancer. Expression levels of these genes may provide physicians with insight into tumor aggressiveness and chemotherapy drugs that are well suited to individual patients.
ABSTRACT
[This corrects the article DOI: 10.1186/s12935-018-0633-9.].
ABSTRACT
BACKGROUND: Ovarian cancer is a significant cancer-related cause of death in women worldwide. The most used chemotherapeutic regimen is based on carboplatin (CBDCA). However, CBDCA resistance is the main obstacle to a better prognosis. An in vitro drug-resistant cell model would help in the understanding of molecular mechanisms underlying this drug-resistance phenomenon. The aim of this study was to characterize cellular and molecular changes of induced CBDCA-resistant ovarian cancer cell line A2780. METHODS: The cell selection strategy used in this study was a dose-per-pulse method using a concentration of 100 µM for 2 h. Once 20 cycles of exposure to the drug were completed, the cell cultures showed a resistant phenotype. Then, the ovarian cancer cell line A2780 was grown with 100 µM of CBDCA (CBDCA-resistant cells) or without CBDCA (parental cells). Afterward, a drug sensitivity assay, morphological analyses, cell death assays and an RNA-seq analysis were performed in CBDCA-resistant A2780 cells. RESULTS: Microscopy on both parental and CBDCA-resistant A2780 cells showed similar characteristics in morphology and F-actin distribution within cells. In cell-death assays, parental A2780 cells showed a significant increase in phosphatidylserine translocation and caspase-3/7 cleavage compared to CBDCA-resistant A2780 cells (P < 0.05 and P < 0.005, respectively). Cell viability in parental A2780 cells was significantly decreased compared to CBDCA-resistant A2780 cells (P < 0.0005). The RNA-seq analysis showed 156 differentially expressed genes (DEGs) associated mainly with molecular functions. CONCLUSION: CBDCA-resistant A2780 ovarian cancer cells are a reliable model of CBDCA resistance that shows several DEGs involved in molecular functions such as transmembrane activity, protein binding to cell surface receptor and catalytic activity. Also, we found that the Wnt/β-catenin and integrin signaling pathways are the main metabolic pathways dysregulated in CBDCA-resistant A2780 cells.
Subject(s)
Antineoplastic Agents/pharmacology; Carboplatin/pharmacology; Drug Resistance, Neoplasm/genetics; Gene Expression Regulation, Neoplastic/drug effects; Ovarian Neoplasms/genetics; Transcriptome/drug effects; Cell Death/drug effects; Cell Death/genetics; Cell Line, Tumor; Female; Humans; Ovarian Neoplasms/drug therapy; Ovarian Neoplasms/pathology; Phenotype; Sequence Analysis, RNA; Signal Transduction; Transcriptome/genetics
ABSTRACT
MOTIVATION: Using mass spectrometry to measure the concentration and turnover of the individual proteins in a proteome enables the calculation of individual synthesis and degradation rates for each protein. Software to analyze concentration is readily available, but software to analyze turnover is lacking. Data analysis workflows typically don't access the full breadth of information about instrument precision and accuracy that is present in each peptide isotopic envelope measurement. This method utilizes both isotope distribution and changes in neutromer spacing, which benefits the analysis of both concentration and turnover. RESULTS: We have developed a data analysis tool, DeuteRater, to measure protein turnover from metabolic D2O labeling. DeuteRater uses theoretical predictions for label-dependent change in isotope abundance and inter-peak (neutromer) spacing within the isotope envelope to calculate protein turnover rate. We have also used these metrics to evaluate the accuracy and precision of peptide measurements and thereby determined the optimal data acquisition parameters of different instruments, as well as the effect of data processing steps. We show that these combined measurements can be used to remove noise and increase confidence in the protein turnover measurement for each protein. AVAILABILITY AND IMPLEMENTATION: Source code and ReadMe for Python 2 and 3 versions of DeuteRater are available at https://github.com/JC-Price/DeuteRater. Data are available at https://chorusproject.org/pages/index.html, project number 1147. Critical intermediate calculation files are provided as Tables S3 and S4. Software has only been tested on Windows machines. CONTACT: jcprice@chem.byu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Gene Expression Regulation; Mass Spectrometry/methods; Peptides/analysis; Proteome/genetics; Proteomics/methods; Software; Animals; Isotopes; Kinetics; Mice; Peptides/genetics; Peptides/metabolism; Proteome/metabolism
ABSTRACT
BACKGROUND: Lung, breast, and colorectal malignancies are the leading causes of cancer-related death worldwide, causing over 2.8 million deaths yearly. Despite efforts to improve prevention methods, early detection, and treatments, survival rates for advanced stage lung, breast, and colon cancer remain low, indicating a critical need to identify cancer-specific biomarkers for early detection and treatment. Thymidine kinase 1 (TK1) is a nucleotide salvage pathway enzyme involved in cellular proliferation and considered an important tumor proliferation biomarker in the serum. In this study, we further characterized TK1's potential as a tumor biomarker and immunotherapeutic target, as well as its clinical relevance. METHODS: We assessed TK1 surface localization by flow cytometry and confocal microscopy in lung (NCI-H460, A549), breast (MDA-MB-231, MCF7), and colorectal (HT-29, SW620) cancer cell lines. We also isolated cell surface proteins from HT-29 cells and performed a western blot confirming the presence of TK1 on cell membrane protein fractions. To evaluate TK1's clinical relevance, we compared TK1 expression levels in normal and malignant tissue through flow cytometry and immunohistochemistry. We also analyzed RNA-Seq data from The Cancer Genome Atlas (TCGA) to assess differential expression of the TK1 gene in lung, breast, and colorectal cancer patients. RESULTS: We found significant expression of TK1 on the surface of NCI-H460, A549, MDA-MB-231, MCF7, and HT-29 cell lines and a strong association of TK1 with the membrane through confocal microscopy and western blot. We found negligible TK1 surface expression in normal healthy tissue and significantly higher TK1 expression in malignant tissues. Patient data from TCGA revealed that TK1 gene expression is upregulated in cancer patients compared to normal healthy patients.
CONCLUSIONS: Our results show that TK1 localizes on the surface of lung, breast, and colorectal cell lines and is upregulated in malignant tissues and patients compared to healthy tissues and patients. We conclude that TK1 is a potential clinical biomarker for the treatment of lung, breast, and colorectal cancer.
ABSTRACT
The signaling events that drive familial breast cancer (FBC) risk remain poorly understood. While the majority of genomic studies have focused on genetic risk variants, known risk variants account for at most 30% of FBC cases. Considering that multiple genes may influence FBC risk, we hypothesized that a pathway-based strategy examining different data types from multiple tissues could elucidate the biological basis for FBC. In this study, we performed integrated analyses of gene expression and exome-sequencing data from peripheral blood mononuclear cells and showed that cell adhesion pathways are significantly and consistently dysregulated in women who develop FBC. The dysregulation of cell adhesion pathways in high-risk women was also identified by pathway-based profiling applied to normal breast tissue data from two independent cohorts. The results of our genomic analyses were validated in normal primary mammary epithelial cells from high-risk and control women, using cell-based functional assays, drug-response assays, fluorescence microscopy, and Western blotting assays. Both genomic and cell-based experiments indicate that cell-cell and cell-extracellular matrix adhesion processes seem to be disrupted in non-malignant cells of women at high risk for FBC and suggest a potential role for these processes in FBC development.
Subject(s)
Breast Neoplasms/metabolism; Genetic Predisposition to Disease; Signal Transduction; Aged; Breast Neoplasms/genetics; Breast Neoplasms/pathology; Cell Adhesion; Cohort Studies; Female; Gene Expression Profiling; Genetic Variation; Humans; Leukocytes, Mononuclear/metabolism; Middle Aged
ABSTRACT
The phospho-binding protein 14-3-3ζ acts as a signaling hub controlling a network of interacting partners and oncogenic pathways. We show here that lysines within the 14-3-3ζ binding pocket and protein-protein interface can be modified by acetylation. The positive charge on two of these lysines, Lys(49) and Lys(120), is critical for coordinating 14-3-3ζ-phosphoprotein interactions. Through screening, we identified HDAC6 as the Lys(49)/Lys(120) deacetylase. Inhibition of HDAC6 blocks 14-3-3ζ interactions with two well described interacting partners, Bad and AS160, which triggers their dephosphorylation at Ser(112) and Thr(642), respectively. Expression of an acetylation-refractory K49R/K120R mutant of 14-3-3ζ rescues both the HDAC6 inhibitor-induced loss of interaction and Ser(112)/Thr(642) phosphorylation. Furthermore, expression of the K49R/K120R mutant of 14-3-3ζ inhibits the cytotoxicity of HDAC6 inhibition. These data demonstrate a novel role for HDAC6 in controlling 14-3-3ζ binding activity.
Subject(s)
14-3-3 Proteins/metabolism; Histone Deacetylases/metabolism; 14-3-3 Proteins/genetics; Acetylation; Amino Acid Substitution; Binding Sites; Cell Survival/genetics; GTPase-Activating Proteins/genetics; GTPase-Activating Proteins/metabolism; HEK293 Cells; Histone Deacetylase 6; Histone Deacetylases/genetics; Humans; Lysine/genetics; Lysine/metabolism; Mutation, Missense; bcl-Associated Death Protein/genetics; bcl-Associated Death Protein/metabolism
ABSTRACT
MOTIVATION: The Cancer Genome Atlas (TCGA) RNA-Sequencing data are used widely for research. TCGA provides 'Level 3' data, which have been processed using a pipeline specific to that resource. However, we have found using experimentally derived data that this pipeline produces gene-expression values that vary considerably across biological replicates. In addition, some RNA-Sequencing analysis tools require integer-based read counts, which are not provided with the Level 3 data. As an alternative, we have reprocessed the data for 9264 tumor and 741 normal samples across 24 cancer types using the Rsubread package. We have also collated corresponding clinical data for these samples. We provide these data as a community resource. RESULTS: We compared TCGA samples processed using either pipeline and found that the Rsubread pipeline produced fewer zero-expression genes and more consistent expression levels across replicate samples than the TCGA pipeline. Additionally, we used a genomic-signature approach to estimate HER2 (ERBB2) activation status for 662 breast-tumor samples and found that the Rsubread data resulted in stronger predictions of HER2 pathway activity. Finally, we used data from both pipelines to classify 575 lung cancer samples based on histological type. This analysis identified various non-coding RNA that may influence lung-cancer histology. AVAILABILITY AND IMPLEMENTATION: The RNA-Sequencing and clinical data can be downloaded from Gene Expression Omnibus (accession number GSE62944). Scripts and code that were used to process and analyze the data are available from https://github.com/srp33/TCGA_RNASeq_Clinical. CONTACT: stephen_piccolo@byu.edu or andreab@genetics.utah.edu SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.
Subject(s)
Breast Neoplasms/genetics; Genome, Human; Sequence Analysis, RNA/methods; Statistics as Topic; Breast Neoplasms/classification; Female; Gene Expression Regulation, Neoplastic; Humans; ROC Curve; Reproducibility of Results
ABSTRACT
MOTIVATION: Although gene-expression signature-based biomarkers are often developed for clinical diagnosis, many promising signatures fail to replicate during validation. One major challenge is that biological samples used to generate and validate the signature are often from heterogeneous biological contexts: controlled or in vitro samples may be used to generate the signature, but patient samples may be used for validation. In addition, systematic technical biases from multiple genome-profiling platforms often mask true biological variation. Addressing such challenges will enable us to better elucidate disease mechanisms and provide improved guidance for personalized therapeutics. RESULTS: Here, we present a pathway profiling toolkit, Adaptive Signature Selection and InteGratioN (ASSIGN), which enables robust and context-specific pathway analyses by efficiently capturing pathway activity in heterogeneous sets of samples and across profiling technologies. The ASSIGN framework is based on a flexible Bayesian factor analysis approach that allows for simultaneous profiling of multiple correlated pathways and for the adaptation of pathway signatures to specific diseases. We demonstrate the robustness and versatility of ASSIGN in estimating pathway activity in simulated data, in cell lines with perturbed pathways, and in primary tissue samples, including The Cancer Genome Atlas breast carcinoma samples and liver samples exposed to genotoxic carcinogens. AVAILABILITY AND IMPLEMENTATION: Software for our approach is available for download at: http://www.bioconductor.org/packages/release/bioc/html/ASSIGN.html and https://github.com/wevanjohnson/ASSIGN.
Subject(s)
Gene Expression Profiling/methods; Software; Animals; Bayes Theorem; Breast Neoplasms/genetics; Female; Genomics/methods; Humans; Rats; Signal Transduction/genetics
ABSTRACT
Over the past two decades, many biotechnology platforms have been developed for high-throughput gene expression profiling. However, because each platform is subject to technology-specific biases and produces distinct raw-data distributions, researchers have experienced difficulty in integrating data across platforms. Data integration is crucial to data-generating consortiums, researchers transitioning to newer profiling technologies, and individuals seeking to aggregate data across experiments. We address this need with our Universal exPression Code (UPC) approach, which corrects for platform-specific background noise using models that account for the genomic base composition and length of target regions; this approach also uses a mixture model to estimate whether a gene is active in a particular profiling sample. The latter produces standardized UPC values on a zero-to-one scale, so that they can be interpreted consistently, irrespective of profiling technology, thus enabling downstream analysis pipelines to be developed in a platform-agnostic manner. The UPC method can be applied to one- and two-channel expression microarrays and to next-generation sequencing data (RNA sequencing). Furthermore, UPCs are derived using information from within a given sample only; no ancillary samples are required at processing time. Thus, UPCs are suitable for personalized-medicine workflows where samples must be processed individually rather than in batches. In a variety of analyses and comparisons, UPCs perform comparably to other methods designed specifically for microarrays or RNA sequencing in most settings. Software for calculating UPCs is freely available at www.bioconductor.org/packages/release/bioc/html/SCAN.UPC.html.
Subject(s)
Algorithms; DNA Barcoding, Taxonomic/methods; Gene Expression Profiling/methods; Genes/genetics; Models, Genetic; Software; Transcriptional Activation/physiology; Base Composition
ABSTRACT
To help maximize the impact of scientific journal articles, authors must ensure that article figures are accessible to people with color-vision deficiencies (CVDs), which affect up to 8% of males and 0.5% of females. We evaluated images published in biology- and medicine-oriented research articles between 2012 and 2022. Most included at least one color contrast that could be problematic for people with deuteranopia ('deuteranopes'), the most common form of CVD. However, spatial distances and within-image labels frequently mitigated potential problems. Initially, we reviewed 4964 images from eLife, comparing each against a simulated version that approximated how it might appear to deuteranopes. We identified 636 (12.8%) images that we determined would be difficult for deuteranopes to interpret. Our findings suggest that the frequency of this problem has decreased over time and that articles from cell-oriented disciplines were most often problematic. We used machine learning to automate the identification of problematic images. For a hold-out test set from eLife (n=879), a convolutional neural network classified the images with an area under the precision-recall curve of 0.75. The same network classified images from PubMed Central (n=1191) with an area under the precision-recall curve of 0.39. We created a Web application (https://bioapps.byu.edu/colorblind_image_tester); users can upload images, view simulated versions, and obtain predictions. Our findings shed new light on the frequency and nature of scientific images that may be problematic for deuteranopes and motivate additional efforts to increase accessibility.
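Illustrative sketch (not the authors' evaluation code): the area under the precision-recall curve reported in this abstract is commonly estimated by average precision, which averages the precision at each rank where a true positive is retrieved. The function name and the toy labels and scores below are assumptions for the example, and tied scores are not given any special handling.

```python
def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve.

    labels: 1 for a positive example (e.g. a problematic image), else 0.
    scores: classifier scores; higher means "more likely positive".
    """
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_positives = sum(labels)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_positives

# Hypothetical scores: positives ranked 1st and 3rd out of four images.
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])
```

Unlike the area under the ROC curve, this metric depends on how common the positive class is, so values from test sets with different proportions of problematic images are not directly comparable.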
Subject(s)
Color Vision Defects; Humans; Machine Learning; Female; Male
ABSTRACT
Beyond the most common oncogenes activated by mutation (mut-drivers), there likely exists a variety of low-frequency mut-drivers, each of which is a possible frontier for targeted therapy. To identify new and understudied mut-drivers, we developed a machine learning (ML) model that integrates curated clinical cancer data and posttranslational modification (PTM) proteomics databases. We applied the approach to 62,746 patient cancers spanning 84 cancer types and predicted 3,964 oncogenic mutations across 1,148 genes, many of which disrupt PTMs of known and unknown function. The list of putative mut-drivers includes established drivers and others with poorly understood roles in cancer. This ML model is available as a web application. As a case study, we focused the approach on nonreceptor tyrosine kinases (NRTK) and found a recurrent mutation in activated CDC42 kinase-1 (ACK1) that disrupts the Mig6 homology region (MHR) and ubiquitin-association (UBA) domains on the ACK1 C-terminus. By studying these domains in cultured cells, we found that disruption of the MHR domain helps activate the kinase while disruption of the UBA increases kinase stability by blocking its lysosomal degradation. This ACK1 mutation is analogous to lymphoma-associated mutations in its sister kinase, TNK1, which also disrupt a C-terminal inhibitory motif and UBA domain. This study establishes a mut-driver discovery tool for the research community and identifies a mechanism of ACK1 hyperactivation shared among ACK family kinases. IMPLICATIONS: This research identifies a potentially targetable activating mutation in ACK1 and other possible oncogenic mutations, including PTM-disrupting mutations, for further study.
Subject(s)
Neoplasms , Proteomics , Humans , Protein Processing, Post-Translational , Neoplasms/genetics , Ubiquitin/metabolism , Cells, Cultured , Fetal Proteins/metabolism , Protein-Tyrosine Kinases/metabolism
ABSTRACT
BACKGROUND: Seasonal influenza vaccination rates are very low among teenagers. OBJECTIVES: We used publicly available data from the NIS-Teen annual national immunization survey to explore factors that influence the likelihood of a teen receiving their seasonal flu shot. METHODS: Traditional stepwise multivariable regression was used in tandem with machine learning to identify factors predictive of teen vaccine uptake. RESULTS AND CONCLUSIONS: Age was the strongest predictor, with older teens being much less likely to be vaccinated than younger teens (97.48% compared to 41.71%, p < 0.0001). Provider participation in government programs such as Vaccines for Children and the state vaccine registry positively impacted vaccine uptake (p < 0.0001). Identifying as non-Hispanic Black was a small, negative predictor of teen vaccine uptake (78.18% unvaccinated compared to 73.78% of White teens, p < 0.0001). The state quartile for COVID-19 vaccine uptake also strongly predicted flu vaccine uptake, with teens in states in the upper quartile of COVID-19 vaccine uptake being significantly more likely to also be vaccinated for influenza (76.96%, 74.94%, 74.55%, and 72.97%, p < 0.0001). Other significant factors were the number of providers, education of the mother, poverty status, and having a mixed provider facility type. Additionally, the multivariable regression analysis revealed little difference in the predictive factors of vaccine uptake between pre- and post-pandemic datasets.
ABSTRACT
BACKGROUND/OBJECTIVES: Systemic lupus erythematosus (lupus) and B-cell lymphoma (lymphoma) co-occur at higher-than-expected rates and primarily depend on B cells for their pathology. These observations implicate shared inflammation-related B-cell molecular mechanisms as a potential cause of co-occurrence. METHODS: We therefore implemented a novel Immune Imbalance Transcriptomics (IIT) algorithm and applied it to lupus, lymphoma, and healthy B-cell RNA-sequencing (RNA-seq) data to find shared and contrasting mechanisms that are potential therapeutic targets. RESULTS: We observed 7143 significantly dysregulated genes in both lupus and lymphoma. Of those genes, we found 5137 to have a significant immune imbalance, defined as a significant dysregulation by both diseases, as analyzed by IIT. Gene Ontology (GO) term and pathway enrichment of the IIT genes yielded the immune-related terms "Neutrophil Degranulation" and "Adaptive Immune System", which validates that the IIT algorithm isolates biologically relevant genes in immunity and inflammation. We found that 344 IIT gene products are known targets for established and/or repurposed drugs. Among our results, we found 48 known and 296 novel lupus targets, along with 151 known and 193 novel lymphoma targets. Known disease drug targets in our IIT results further validate that IIT isolates genes with disease-relevant mechanisms. CONCLUSIONS: We anticipate the IIT algorithm, together with the shared and contrasting gene mechanisms uncovered here, will contribute to the development of immune-related therapeutic options for lupus and lymphoma patients.
Subject(s)
Algorithms , Lupus Erythematosus, Systemic , Lymphoma, B-Cell , Transcriptome , Humans , Lupus Erythematosus, Systemic/genetics , Lupus Erythematosus, Systemic/drug therapy , Lupus Erythematosus, Systemic/immunology , Transcriptome/genetics , Lymphoma, B-Cell/genetics , Lymphoma, B-Cell/immunology , Lymphoma, B-Cell/drug therapy , B-Lymphocytes/immunology , B-Lymphocytes/metabolism , Gene Expression Profiling/methods
ABSTRACT
Gene-expression microarrays allow researchers to characterize biological phenomena in a high-throughput fashion but are subject to technological biases and inevitable variabilities that arise during sample collection and processing. Normalization techniques aim to correct such biases. Most existing methods require multiple samples to be processed in aggregate; consequently, each sample's output is influenced by other samples processed jointly. However, in personalized-medicine workflows, samples may arrive serially, so renormalizing all samples upon each new arrival would be impractical. We have developed Single Channel Array Normalization (SCAN), a single-sample technique that models the effects of probe-nucleotide composition on fluorescence intensity and corrects for such effects, dramatically increasing the signal-to-noise ratio within individual samples while decreasing variation across samples. In various benchmark comparisons, we show that SCAN performs as well as or better than competing methods yet has no dependence on external reference samples and can be applied to any single-channel microarray platform.
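The single-sample idea in the abstract above can be illustrated with a deliberately simplified sketch. This is not the published SCAN model, whose details the abstract does not give. It assumes an ordinary least-squares fit of log2 intensity on per-probe nucleotide counts, uses only the probes of one sample, and runs on synthetic data. The function names, the probe length of 25, and the simulated effect sizes are all hypothetical.

```python
import numpy as np


def composition_features(probes):
    """Design matrix: intercept plus counts of A, C and G for each probe.

    T is left out because, for fixed-length probes, the four counts sum to
    the probe length and would be collinear with the intercept.
    """
    counts = np.array([[seq.count(b) for b in "ACG"] for seq in probes], dtype=float)
    return np.column_stack([np.ones(len(probes)), counts])


def normalize_single_sample(probes, intensities):
    """Remove the fitted composition effect from one sample's log2 intensities.

    Only this sample's probes are used, so no reference samples are needed.
    Returns residuals scaled to unit standard deviation.
    """
    X = composition_features(probes)
    y = np.log2(np.asarray(intensities, dtype=float))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
    residuals = y - X @ coef
    return residuals / residuals.std(ddof=1)


# Synthetic example (assumed data): intensity rises with G+C content.
rng = np.random.default_rng(0)
probes = ["".join(rng.choice(list("ACGT"), size=25)) for _ in range(500)]
gc = np.array([p.count("G") + p.count("C") for p in probes])
intensities = 2.0 ** (6.0 + 0.15 * gc + rng.normal(0.0, 0.3, size=500))

normalized = normalize_single_sample(probes, intensities)
```

Because the fit uses only the probes within one array, a newly arriving sample can be processed without revisiting earlier ones, which is the property the abstract emphasizes for serial, personalized-medicine workflows.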