Results 1 - 20 of 67
1.
PLoS Comput Biol ; 19(9): e1011511, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37769024

ABSTRACT

Computer programming is a fundamental tool for life scientists, allowing them to carry out essential research tasks. However, despite various educational efforts, learning to write code can be a challenging endeavor for students and researchers in life-sciences disciplines. Recent advances in artificial intelligence have made it possible to translate human-language prompts to functional code, raising questions about whether these technologies can aid (or replace) life scientists' efforts to write code. Using 184 programming exercises from an introductory-bioinformatics course, we evaluated the extent to which one such tool, OpenAI's ChatGPT, could successfully complete programming tasks. ChatGPT solved 139 (75.5%) of the exercises on its first attempt. For the remaining exercises, we provided natural-language feedback to the model, prompting it to try different approaches. Within 7 or fewer attempts, ChatGPT solved 179 (97.3%) of the exercises. These findings have implications for life-sciences education and research. Instructors may need to adapt their pedagogical approaches and assessment techniques to account for these new capabilities that are available to the general public. For some programming tasks, researchers may be able to work in collaboration with machine-learning models to produce functional code.

2.
PLoS Comput Biol ; 18(3): e1009926, 2022 03.
Article in English | MEDLINE | ID: mdl-35275931

ABSTRACT

By classifying patients into subgroups, clinicians can provide more effective care than using a uniform approach for all patients. Such subgroups might include patients with a particular disease subtype, patients with a good (or poor) prognosis, or patients most (or least) likely to respond to a particular therapy. Transcriptomic measurements reflect the downstream effects of genomic and epigenomic variations. However, high-throughput technologies generate thousands of measurements per patient, and complex dependencies exist among genes, so it may be infeasible to classify patients using traditional statistical models. Machine-learning classification algorithms can help with this problem. However, hundreds of classification algorithms exist, and most support diverse hyperparameters, so it is difficult for researchers to know which are optimal for gene-expression biomarkers. We performed a benchmark comparison, applying 52 classification algorithms to 50 gene-expression datasets (143 class variables). We evaluated algorithms that represent diverse machine-learning methodologies and have been implemented in general-purpose, open-source, machine-learning libraries. When available, we combined clinical predictors with gene-expression data. Additionally, we evaluated the effects of performing hyperparameter optimization and feature selection using nested cross validation. Kernel- and ensemble-based algorithms consistently outperformed other types of classification algorithms; however, even the top-performing algorithms performed poorly in some cases. Hyperparameter optimization and feature selection typically improved predictive performance, and univariate feature-selection algorithms typically outperformed more sophisticated methods. Together, our findings illustrate that algorithm performance varies considerably when other factors are held constant and thus that algorithm selection is a critical step in biomarker studies.


Subject(s)
Algorithms , Machine Learning , Genomics , Humans , Models, Statistical
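
The nested cross-validation design mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' benchmark code: the one-dimensional k-nearest-neighbor classifier, the toy data, and all function names are assumptions chosen so that the example runs with the Python standard library alone. The point it shows is that the hyperparameter (here `k`) is chosen in an inner loop that only ever sees the outer-training portion, so the outer-fold accuracy is not biased by the tuning step.

```python
import random
import statistics

def knn_predict(train, x, k):
    """Majority vote among the k nearest 1-D neighbors of x."""
    neighbors = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = [label for _, label in neighbors]
    return max(set(votes), key=votes.count)

def accuracy(train, test, k):
    """Fraction of test samples whose label is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

def folds(data, n_folds):
    """Yield (train, test) splits; fold i takes every n_folds-th item."""
    for i in range(n_folds):
        test = data[i::n_folds]
        train = [d for j, d in enumerate(data) if j % n_folds != i]
        yield train, test

def nested_cv(data, k_grid=(1, 3, 5), outer=5, inner=3):
    """Return the mean outer-fold accuracy and the k chosen in each outer fold."""
    outer_scores, chosen = [], []
    for outer_train, outer_test in folds(data, outer):
        # Inner loop: tune k using only the outer-training samples.
        def inner_score(k):
            return statistics.mean(
                accuracy(tr, te, k) for tr, te in folds(outer_train, inner)
            )
        best_k = max(k_grid, key=inner_score)
        chosen.append(best_k)
        # Outer loop: evaluate the tuned model on samples never used for tuning.
        outer_scores.append(accuracy(outer_train, outer_test, best_k))
    return statistics.mean(outer_scores), chosen

# Hypothetical toy data: one "expression" value per sample, two classes.
rng = random.Random(0)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(30)]
data += [(rng.gauss(3.0, 1.0), 1) for _ in range(30)]
rng.shuffle(data)

mean_acc, chosen_k = nested_cv(data)
print(f"mean outer-fold accuracy: {mean_acc:.2f}; k chosen per fold: {chosen_k}")
```

In practice a machine-learning library would replace the hand-written loops; the structure of an inner tuning loop wrapped by an outer evaluation loop stays the same.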
3.
BMC Bioinformatics ; 22(1): 559, 2021 Nov 22.
Article in English | MEDLINE | ID: mdl-34809557

ABSTRACT

BACKGROUND: When analyzing DNA sequence data of an individual, knowing which nucleotide was inherited from each parent can be beneficial when trying to identify certain types of DNA variants. Mendelian inheritance logic can be used to accurately phase (haplotype) the majority (67-83%) of an individual's heterozygous nucleotide positions when genotypes are available for both parents (trio). However, when all members of a trio are heterozygous at a position, Mendelian inheritance logic cannot be used to phase. For such positions, a computational phasing algorithm can be used. Existing phasing algorithms use a haplotype reference panel, sequencing reads, and/or parental genotypes to phase an individual; however, they are limited in that they can only phase certain types of variants, require a specific genotype build, require large amounts of storage capacity, and/or require long run times. We created trioPhaser to address these challenges. RESULTS: trioPhaser uses gVCF files from an individual and their parents as initial input, and then outputs a phased VCF file. Input trio data are first phased using Mendelian inheritance logic. Then, the positions that cannot be phased using inheritance information alone are phased by the SHAPEIT4 phasing algorithm. Using whole-genome sequencing data of 52 trios, we show that trioPhaser, on average, increases the total number of phased positions by 21.0% and 10.5%, respectively, when compared to the number of positions that SHAPEIT4 or Mendelian inheritance logic can phase when either is used alone. In addition, we show that the accuracy of the phased calls output by trioPhaser is similar to that of linked-read and read-backed phasing. CONCLUSION: trioPhaser is a containerized software tool that uses both Mendelian inheritance logic and SHAPEIT4 to phase trios when gVCF files are available. By implementing both phasing methods, more variant positions are phased compared to what either method is able to phase alone.


Subject(s)
Genome , Polymorphism, Single Nucleotide , Algorithms , Genomics , Haplotypes , High-Throughput Nucleotide Sequencing , Logic , Sequence Analysis, DNA
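
The Mendelian inheritance logic described in the abstract above can be sketched for a single position as follows. This is an illustrative assumption about how such logic may be written, not trioPhaser's actual code; the function name and the tuple-based genotype representation are hypothetical. It returns the child's alleles ordered as (paternal, maternal) when exactly one assignment is consistent with the parents, and `None` when more than one is, which is what happens when all three trio members are heterozygous.

```python
def phase_by_inheritance(child, father, mother):
    """Phase one position of a trio using Mendelian inheritance only.

    Each genotype is an unordered pair of alleles, e.g. ("A", "G").
    Returns (paternal_allele, maternal_allele), or None if ambiguous.
    Raises ValueError if no assignment is consistent with the parents.
    """
    first, second = child
    candidates = set()
    # Try both ways of assigning the child's two alleles to the parents.
    for paternal, maternal in ((first, second), (second, first)):
        if paternal in father and maternal in mother:
            candidates.add((paternal, maternal))
    if not candidates:
        raise ValueError("genotypes are inconsistent with Mendelian inheritance")
    if len(candidates) == 1:
        return candidates.pop()
    return None  # more than one consistent assignment: cannot phase

# Father is homozygous A/A, so the child's A must be paternal.
print(phase_by_inheritance(("A", "G"), ("A", "A"), ("A", "G")))  # ('A', 'G')
# All three are heterozygous: inheritance alone cannot decide.
print(phase_by_inheritance(("A", "G"), ("A", "G"), ("A", "G")))  # None
```

Positions for which this kind of logic returns no answer are the ones the abstract says are passed on to a statistical phasing algorithm.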
4.
Mar Drugs ; 19(1)2021 Jan 18.
Article in English | MEDLINE | ID: mdl-33477536

ABSTRACT

Patients diagnosed with basal-like breast cancer suffer from poor prognosis and limited treatment options. There is an urgent need to identify new targets that can benefit patients with basal-like and claudin-low (BL-CL) breast cancers. We screened fractions from our Marine Invertebrate Compound Library (MICL) to identify compounds that specifically target BL-CL breast cancers. We identified a previously unreported trisulfated sterol, i.e., topsentinol L trisulfate (TLT), which exhibited increased efficacy against BL-CL breast cancers relative to luminal/HER2+ breast cancer. Biochemical investigation of the effects of TLT on BL-CL cell lines revealed its ability to inhibit activation of AMP-activated protein kinase (AMPK) and checkpoint kinase 1 (CHK1) and to promote activation of p38. The importance of targeting AMPK and CHK1 in BL-CL cell lines was validated by treating a panel of breast cancer cell lines with known small molecule inhibitors of AMPK (dorsomorphin) and CHK1 (Ly2603618) and recording the increased effectiveness against BL-CL breast cancers as compared with luminal/HER2+ breast cancer. Finally, we generated a drug response gene-expression signature and projected it against a human tumor panel of 12 different cancer types to identify other cancer types sensitive to the compound. The TLT sensitivity gene-expression signature identified breast and bladder cancer as the most sensitive to TLT, while glioblastoma multiforme was the least sensitive.


Subject(s)
Antineoplastic Agents/pharmacology , Breast Neoplasms/drug therapy , Sterols/pharmacology , AMP-Activated Protein Kinases/drug effects , AMP-Activated Protein Kinases/metabolism , Antineoplastic Agents/chemistry , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Cell Line, Tumor , Checkpoint Kinase 1/drug effects , Checkpoint Kinase 1/metabolism , Claudins/metabolism , Female , Gene Expression Regulation, Neoplastic , Humans , Sterols/chemistry , p38 Mitogen-Activated Protein Kinases/drug effects , p38 Mitogen-Activated Protein Kinases/metabolism
5.
Cancer Cell Int ; 20: 375, 2020.
Article in English | MEDLINE | ID: mdl-32782434

ABSTRACT

BACKGROUND: The aim of this study is to determine whether Hypoxanthine Guanine Phosphoribosyltransferase (HPRT) could be used as a biomarker for the diagnosis and treatment of B cell malignancies. With 4.3% of all new cancers diagnosed as Non-Hodgkin lymphoma, finding new biomarkers for the treatment of B cell cancers is an ongoing pursuit. HPRT is a nucleotide salvage pathway enzyme responsible for the synthesis of guanine and inosine throughout the cell cycle. METHODS: Raji cells were used for this analysis due to their high HPRT internal expression. Internal expression was evaluated utilizing western blotting and RNA sequencing. Surface localization was analyzed using flow cytometry, confocal microscopy, and membrane biotinylation. To determine the source of HPRT surface expression, a CRISPR knockdown of HPRT was generated and confirmed using western blotting. To determine clinical significance, patient blood samples were collected and analyzed for HPRT surface localization. RESULTS: We found surface localization of HPRT on both Raji cancer cells and in 77% of the malignant ALL samples analyzed and observed no significant expression in healthy cells. Surface expression was confirmed in Raji cells with confocal microscopy, where a direct overlap between HPRT specific antibodies and a membrane-specific dye was observed. HPRT was also detected in biotinylated membranes of Raji cells. Upon HPRT knockdown in Raji cells, we found a significant reduction in surface expression, which shows that the HPRT found on the surface originates from the cells themselves. Finally, we found that cells that had elevated levels of HPRT had a direct correlation to XRCC2, BRCA1, PIK3CA, MSH2, MSH6, WDYHV1, AK7, and BLMH expression and an inverse correlation to PRKD2, PTGS2, TCF7L2, CDH1, IL6R, MC1R, AMPD1, TLR6, and BAK1 expression. Of the 17 genes with significant correlation, 9 are involved in cellular proliferation and DNA synthesis, regulation, and repair. 
CONCLUSIONS: As a surface biomarker that is found on malignant cells and not on healthy cells, HPRT could be used as a surface antigen for targeted immunotherapy. In addition, the gene correlations show that HPRT may have an additional role in regulation of cancer proliferation that has not been previously discovered.

6.
Cancer Cell Int ; 19: 19, 2019.
Article in English | MEDLINE | ID: mdl-30679932

ABSTRACT

BACKGROUND: The incidence of endometrial cancer is rising both in the United States and worldwide. As endometrial cancer becomes more prominent, the need to develop and characterize biomarkers for early stage diagnosis and the treatment of endometrial cancer has become an important priority. Several biomarkers currently used to diagnose endometrial cancer are directly related to obesity. Although epigenetic and mutational biomarkers have been identified and have resulted in treatment options for patients with specific aberrations, many tumors do not harbor those specific aberrations. A promising alternative is to determine biomarkers based on differential gene expression, which can be used to estimate prognosis. METHODS: We evaluated 589 patients to determine differential expression between normal and malignant patient samples. We then supplemented these evaluations with immunohistochemistry staining of endometrial tumors and normal tissues. Additionally, we used the Library of Integrated Network-based Cellular Signatures to evaluate the effects of 1826 chemotherapy drugs on 26 cell lines to determine the effects of each drug on HPRT1 and AURKA expression. RESULTS: Expression of HPRT1, JAG2, AURKA, and PGK1 was elevated when compared to normal samples, and HPRT1 and PGK1 showed a stepwise elevation in expression that was significantly related to cancer grade. To determine the prognostic potential of these genes, we evaluated patient outcome and found that levels of both HPRT1 and AURKA were significantly correlated with overall patient survival. When evaluating drugs that had the most significant effect on lowering the expression of HPRT1 and AURKA, we found that Topo I and MEK inhibitors were most effective at reducing HPRT1 expression. Meanwhile, drugs that were effective at reducing AURKA expression were more diverse (MEK, Topo I, MELK, HDAC, etc.). The effects of these drugs on the expression of HPRT1 and AURKA provide insight into their role within cellular maintenance. CONCLUSIONS: Collectively, these data show that JAG2, AURKA, PGK1, and HPRT1 have the potential to be used independently as diagnostic, prognostic, or treatment biomarkers in endometrial cancer. Expression levels of these genes may provide physicians with insight into tumor aggressiveness and chemotherapy drugs that are well suited to individual patients.

8.
Biol Res ; 52(1): 13, 2019 Mar 21.
Article in English | MEDLINE | ID: mdl-30894224

ABSTRACT

BACKGROUND: Ovarian cancer is a significant cancer-related cause of death in women worldwide. The most used chemotherapeutic regimen is based on carboplatin (CBDCA). However, CBDCA resistance is the main obstacle to a better prognosis. An in vitro drug-resistant cell model would help in the understanding of molecular mechanisms underlying this drug-resistance phenomenon. The aim of this study was to characterize cellular and molecular changes of induced CBDCA-resistant ovarian cancer cell line A2780. METHODS: The cell selection strategy used in this study was a dose-per-pulse method using a concentration of 100 µM for 2 h. Once 20 cycles of exposure to the drug were completed, the cell cultures showed a resistant phenotype. Then, the ovarian cancer cell line A2780 was grown with 100 µM of CBDCA (CBDCA-resistant cells) or without CBDCA (parental cells). Afterward, a drug sensitivity assay, morphological analyses, cell death assays, and an RNA-seq analysis were performed in CBDCA-resistant A2780 cells. RESULTS: Microscopy on both parental and CBDCA-resistant A2780 cells showed similar characteristics in morphology and F-actin distribution within cells. In cell-death assays, parental A2780 cells showed a significant increase in phosphatidylserine translocation and caspase-3/7 cleavage compared to CBDCA-resistant A2780 cells (P < 0.05 and P < 0.005, respectively). Cell viability in parental A2780 cells was significantly decreased compared to CBDCA-resistant A2780 cells (P < 0.0005). The RNA-seq analysis showed 156 differentially expressed genes (DEGs) associated mainly with molecular functions. CONCLUSION: CBDCA-resistant A2780 ovarian cancer cells are a reliable model of CBDCA resistance that shows several DEGs involved in molecular functions such as transmembrane activity, protein binding to cell surface receptor and catalytic activity. Also, we found that the Wnt/β-catenin and integrin signaling pathways are the main metabolic pathways dysregulated in CBDCA-resistant A2780 cells.


Subject(s)
Antineoplastic Agents/pharmacology , Carboplatin/pharmacology , Drug Resistance, Neoplasm/genetics , Gene Expression Regulation, Neoplastic/drug effects , Ovarian Neoplasms/genetics , Transcriptome/drug effects , Cell Death/drug effects , Cell Death/genetics , Cell Line, Tumor , Female , Humans , Ovarian Neoplasms/drug therapy , Ovarian Neoplasms/pathology , Phenotype , Sequence Analysis, RNA , Signal Transduction , Transcriptome/genetics
9.
Bioinformatics ; 33(10): 1514-1520, 2017 May 15.
Article in English | MEDLINE | ID: mdl-28093409

ABSTRACT

MOTIVATION: Using mass spectrometry to measure the concentration and turnover of the individual proteins in a proteome enables the calculation of individual synthesis and degradation rates for each protein. Software to analyze concentration is readily available, but software to analyze turnover is lacking. Data analysis workflows typically don't access the full breadth of information about instrument precision and accuracy that is present in each peptide isotopic envelope measurement. This method utilizes both isotope distribution and changes in neutromer spacing, which benefits the analysis of both concentration and turnover. RESULTS: We have developed a data analysis tool, DeuteRater, to measure protein turnover from metabolic D2O labeling. DeuteRater uses theoretical predictions for label-dependent change in isotope abundance and inter-peak (neutromer) spacing within the isotope envelope to calculate protein turnover rate. We have also used these metrics to evaluate the accuracy and precision of peptide measurements and thereby determined the optimal data acquisition parameters of different instruments, as well as the effect of data processing steps. We show that these combined measurements can be used to remove noise and increase confidence in the protein turnover measurement for each protein. AVAILABILITY AND IMPLEMENTATION: Source code and ReadMe for Python 2 and 3 versions of DeuteRater are available at https://github.com/JC-Price/DeuteRater. Data is at https://chorusproject.org/pages/index.html project number 1147. Critical Intermediate calculation files provided as Tables S3 and S4. Software has only been tested on Windows machines. CONTACT: jcprice@chem.byu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Regulation , Mass Spectrometry/methods , Peptides/analysis , Proteome/genetics , Proteomics/methods , Software , Animals , Isotopes , Kinetics , Mice , Peptides/genetics , Peptides/metabolism , Proteome/metabolism
10.
Cancer Cell Int ; 18: 135, 2018.
Article in English | MEDLINE | ID: mdl-30214377

ABSTRACT

BACKGROUND: Lung, breast, and colorectal malignancies are the leading cause of cancer-related deaths in the world causing over 2.8 million cancer-related deaths yearly. Despite efforts to improve prevention methods, early detection, and treatments, survival rates for advanced stage lung, breast, and colon cancer remain low, indicating a critical need to identify cancer-specific biomarkers for early detection and treatment. Thymidine kinase 1 (TK1) is a nucleotide salvage pathway enzyme involved in cellular proliferation and considered an important tumor proliferation biomarker in the serum. In this study, we further characterized TK1's potential as a tumor biomarker and immunotherapeutic target and clinical relevance. METHODS: We assessed TK1 surface localization by flow cytometry and confocal microscopy in lung (NCI-H460, A549), breast (MDA-MB-231, MCF7), and colorectal (HT-29, SW620) cancer cell lines. We also isolated cell surface proteins from HT-29 cells and performed a western blot confirming the presence of TK1 on cell membrane protein fractions. To evaluate TK1's clinical relevance, we compared TK1 expression levels in normal and malignant tissue through flow cytometry and immunohistochemistry. We also analyzed RNA-Seq data from The Cancer Genome Atlas (TCGA) to assess differential expression of the TK1 gene in lung, breast, and colorectal cancer patients. RESULTS: We found significant expression of TK1 on the surface of NCI-H460, A549, MDA-MB-231, MCF7, and HT-29 cell lines and a strong association between TK1's localization with the membrane through confocal microscopy and Western blot. We found negligible TK1 surface expression in normal healthy tissue and significantly higher TK1 expression in malignant tissues. Patient data from TCGA revealed that the TK1 gene expression is upregulated in cancer patients compared to normal healthy patients. 
CONCLUSIONS: Our results show that TK1 localizes on the surface of lung, breast, and colorectal cell lines and is upregulated in malignant tissues and patients compared to healthy tissues and patients. We conclude that TK1 is a potential clinical biomarker for the treatment of lung, breast, and colorectal cancer.

11.
Mol Syst Biol ; 12(3): 860, 2016 Mar 10.
Article in English | MEDLINE | ID: mdl-26969729

ABSTRACT

The signaling events that drive familial breast cancer (FBC) risk remain poorly understood. While the majority of genomic studies have focused on genetic risk variants, known risk variants account for at most 30% of FBC cases. Considering that multiple genes may influence FBC risk, we hypothesized that a pathway-based strategy examining different data types from multiple tissues could elucidate the biological basis for FBC. In this study, we performed integrated analyses of gene expression and exome-sequencing data from peripheral blood mononuclear cells and showed that cell adhesion pathways are significantly and consistently dysregulated in women who develop FBC. The dysregulation of cell adhesion pathways in high-risk women was also identified by pathway-based profiling applied to normal breast tissue data from two independent cohorts. The results of our genomic analyses were validated in normal primary mammary epithelial cells from high-risk and control women, using cell-based functional assays, drug-response assays, fluorescence microscopy, and Western blotting assays. Both genomic and cell-based experiments indicate that cell-cell and cell-extracellular matrix adhesion processes seem to be disrupted in non-malignant cells of women at high risk for FBC and suggest a potential role for these processes in FBC development.


Subject(s)
Breast Neoplasms/metabolism , Genetic Predisposition to Disease , Signal Transduction , Aged , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Cell Adhesion , Cohort Studies , Female , Gene Expression Profiling , Genetic Variation , Humans , Leukocytes, Mononuclear/metabolism , Middle Aged
12.
J Biol Chem ; 290(20): 12487-96, 2015 May 15.
Article in English | MEDLINE | ID: mdl-25770209

ABSTRACT

The phospho-binding protein 14-3-3ζ acts as a signaling hub controlling a network of interacting partners and oncogenic pathways. We show here that lysines within the 14-3-3ζ binding pocket and protein-protein interface can be modified by acetylation. The positive charge on two of these lysines, Lys(49) and Lys(120), is critical for coordinating 14-3-3ζ-phosphoprotein interactions. Through screening, we identified HDAC6 as the Lys(49)/Lys(120) deacetylase. Inhibition of HDAC6 blocks 14-3-3ζ interactions with two well described interacting partners, Bad and AS160, which triggers their dephosphorylation at Ser(112) and Thr(642), respectively. Expression of an acetylation-refractory K49R/K120R mutant of 14-3-3ζ rescues both the HDAC6 inhibitor-induced loss of interaction and Ser(112)/Thr(642) phosphorylation. Furthermore, expression of the K49R/K120R mutant of 14-3-3ζ inhibits the cytotoxicity of HDAC6 inhibition. These data demonstrate a novel role for HDAC6 in controlling 14-3-3ζ binding activity.


Subject(s)
14-3-3 Proteins/metabolism , Histone Deacetylases/metabolism , 14-3-3 Proteins/genetics , Acetylation , Amino Acid Substitution , Binding Sites , Cell Survival/genetics , GTPase-Activating Proteins/genetics , GTPase-Activating Proteins/metabolism , HEK293 Cells , Histone Deacetylase 6 , Histone Deacetylases/genetics , Humans , Lysine/genetics , Lysine/metabolism , Mutation, Missense , bcl-Associated Death Protein/genetics , bcl-Associated Death Protein/metabolism
13.
Bioinformatics ; 31(22): 3666-72, 2015 Nov 15.
Article in English | MEDLINE | ID: mdl-26209429

ABSTRACT

MOTIVATION: The Cancer Genome Atlas (TCGA) RNA-Sequencing data are used widely for research. TCGA provides 'Level 3' data, which have been processed using a pipeline specific to that resource. However, we have found using experimentally derived data that this pipeline produces gene-expression values that vary considerably across biological replicates. In addition, some RNA-Sequencing analysis tools require integer-based read counts, which are not provided with the Level 3 data. As an alternative, we have reprocessed the data for 9264 tumor and 741 normal samples across 24 cancer types using the Rsubread package. We have also collated corresponding clinical data for these samples. We provide these data as a community resource. RESULTS: We compared TCGA samples processed using either pipeline and found that the Rsubread pipeline produced fewer zero-expression genes and more consistent expression levels across replicate samples than the TCGA pipeline. Additionally, we used a genomic-signature approach to estimate HER2 (ERBB2) activation status for 662 breast-tumor samples and found that the Rsubread data resulted in stronger predictions of HER2 pathway activity. Finally, we used data from both pipelines to classify 575 lung cancer samples based on histological type. This analysis identified various non-coding RNA that may influence lung-cancer histology. AVAILABILITY AND IMPLEMENTATION: The RNA-Sequencing and clinical data can be downloaded from Gene Expression Omnibus (accession number GSE62944). Scripts and code that were used to process and analyze the data are available from https://github.com/srp33/TCGA_RNASeq_Clinical. CONTACT: stephen_piccolo@byu.edu or andreab@genetics.utah.edu SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.


Subject(s)
Breast Neoplasms/genetics , Genome, Human , Sequence Analysis, RNA/methods , Statistics as Topic , Breast Neoplasms/classification , Female , Gene Expression Regulation, Neoplastic , Humans , ROC Curve , Reproducibility of Results
14.
Bioinformatics ; 31(11): 1745-53, 2015 Jun 01.
Article in English | MEDLINE | ID: mdl-25617415

ABSTRACT

MOTIVATION: Although gene-expression signature-based biomarkers are often developed for clinical diagnosis, many promising signatures fail to replicate during validation. One major challenge is that biological samples used to generate and validate the signature are often from heterogeneous biological contexts: controlled or in vitro samples may be used to generate the signature, but patient samples may be used for validation. In addition, systematic technical biases from multiple genome-profiling platforms often mask true biological variation. Addressing such challenges will enable us to better elucidate disease mechanisms and provide improved guidance for personalized therapeutics. RESULTS: Here, we present a pathway profiling toolkit, Adaptive Signature Selection and InteGratioN (ASSIGN), which enables robust and context-specific pathway analyses by efficiently capturing pathway activity in heterogeneous sets of samples and across profiling technologies. The ASSIGN framework is based on a flexible Bayesian factor analysis approach that allows for simultaneous profiling of multiple correlated pathways and for the adaptation of pathway signatures to specific diseases. We demonstrate the robustness and versatility of ASSIGN in estimating pathway activity in simulated data, in cell lines with perturbed pathways, and in primary tissue samples, including The Cancer Genome Atlas breast carcinoma samples and liver samples exposed to genotoxic carcinogens. AVAILABILITY AND IMPLEMENTATION: Software for our approach is available for download at: http://www.bioconductor.org/packages/release/bioc/html/ASSIGN.html and https://github.com/wevanjohnson/ASSIGN.


Subject(s)
Gene Expression Profiling/methods , Software , Animals , Bayes Theorem , Breast Neoplasms/genetics , Female , Genomics/methods , Humans , Rats , Signal Transduction/genetics
15.
Proc Natl Acad Sci U S A ; 110(44): 17778-83, 2013 Oct 29.
Article in English | MEDLINE | ID: mdl-24128763

ABSTRACT

Over the past two decades, many biotechnology platforms have been developed for high-throughput gene expression profiling. However, because each platform is subject to technology-specific biases and produces distinct raw-data distributions, researchers have experienced difficulty in integrating data across platforms. Data integration is crucial to data-generating consortiums, researchers transitioning to newer profiling technologies, and individuals seeking to aggregate data across experiments. We address this need with our Universal exPression Code (UPC) approach, which corrects for platform-specific background noise using models that account for the genomic base composition and length of target regions; this approach also uses a mixture model to estimate whether a gene is active in a particular profiling sample. The latter produces standardized UPC values on a zero-to-one scale, so that they can be interpreted consistently, irrespective of profiling technology, thus enabling downstream analysis pipelines to be developed in a platform-agnostic manner. The UPC method can be applied to one- and two-channel expression microarrays and to next-generation sequencing data (RNA sequencing). Furthermore, UPCs are derived using information from within a given sample only; no ancillary samples are required at processing time. Thus, UPCs are suitable for personalized-medicine workflows where samples must be processed individually rather than in batches. In a variety of analyses and comparisons, UPCs perform comparably to other methods designed specifically for microarrays or RNA sequencing in most settings. Software for calculating UPCs is freely available at www.bioconductor.org/packages/release/bioc/html/SCAN.UPC.html.


Subject(s)
Algorithms , DNA Barcoding, Taxonomic/methods , Gene Expression Profiling/methods , Genes/genetics , Models, Genetic , Software , Transcriptional Activation/physiology , Base Composition
16.
Cancers (Basel) ; 16(13)2024 Jun 27.
Article in English | MEDLINE | ID: mdl-39001426

ABSTRACT

Here, we assess how the differential expression of low molecular weight serum peptides might predict breast cancer progression with high confidence. We apply an LC/MS-MS-based, unbiased 'omics' analysis of serum samples from breast cancer patients to identify molecules that are differentially expressed in stage I and III breast cancer. Results were generated using standard and machine learning-based analytical workflows. With standard workflow, a discovery study yielded 65 circulating biomarker candidates with statistically significant differential expression. A second study confirmed the differential expression of a subset of these markers. Models based on combinations of multiple biomarkers were generated using an exploratory algorithm designed to generate greater diagnostic power and accuracy than any individual markers. Individual biomarkers and the more complex multi-marker models were then tested in a blinded validation study. The multi-marker models retained their predictive power in the validation study, the best of which attained an AUC of 0.84, with a sensitivity of 43% and a specificity of 88%. One of the markers with m/z 761.38, which was downregulated, was identified as a fibrinogen alpha chain. Machine learning-based analysis yielded a classifier that correctly categorizes every subject in the study and demonstrates parameter constraints required for high confidence in classifier output. These results suggest that serum peptide biomarker models could be optimized to assess breast cancer stage in a clinical setting.
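
The abstract above reports an AUC together with a sensitivity and a specificity. A small worked example may help show how these quantities relate: the AUC summarizes ranking quality over all possible thresholds, whereas sensitivity and specificity describe a single chosen threshold. The labels, scores, and threshold below are invented for illustration and are not data from the study; the function names are hypothetical.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented example: 1 = later-stage sample, 0 = earlier-stage sample.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.1, 0.05]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
print(f"AUC = {auc(labels, scores)}")                   # AUC = 0.875
print(f"sensitivity = {sens}, specificity = {spec}")    # 0.5 and 0.75
```

In this toy case the ranking is good (AUC 0.875) even though the chosen threshold detects only half of the positives, which mirrors how a high AUC can coexist with a modest sensitivity at a high-specificity operating point.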

17.
Mol Cancer Res ; 22(2): 137-151, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-37847650

ABSTRACT

Beyond the most common oncogenes activated by mutation (mut-drivers), there likely exists a variety of low-frequency mut-drivers, each of which is a possible frontier for targeted therapy. To identify new and understudied mut-drivers, we developed a machine learning (ML) model that integrates curated clinical cancer data and posttranslational modification (PTM) proteomics databases. We applied the approach to 62,746 patient cancers spanning 84 cancer types and predicted 3,964 oncogenic mutations across 1,148 genes, many of which disrupt PTMs of known and unknown function. The list of putative mut-drivers includes established drivers and others with poorly understood roles in cancer. This ML model is available as a web application. As a case study, we focused the approach on nonreceptor tyrosine kinases (NRTK) and found a recurrent mutation in activated CDC42 kinase-1 (ACK1) that disrupts the Mig6 homology region (MHR) and ubiquitin-association (UBA) domains on the ACK1 C-terminus. By studying these domains in cultured cells, we found that disruption of the MHR domain helps activate the kinase while disruption of the UBA increases kinase stability by blocking its lysosomal degradation. This ACK1 mutation is analogous to lymphoma-associated mutations in its sister kinase, TNK1, which also disrupt a C-terminal inhibitory motif and UBA domain. This study establishes a mut-driver discovery tool for the research community and identifies a mechanism of ACK1 hyperactivation shared among ACK family kinases. IMPLICATIONS: This research identifies a potentially targetable activating mutation in ACK1 and other possible oncogenic mutations, including PTM-disrupting mutations, for further study.


Subject(s)
Neoplasms , Proteomics , Humans , Protein Processing, Post-Translational , Neoplasms/genetics , Ubiquitin/metabolism , Cells, Cultured , Fetal Proteins/metabolism , Protein-Tyrosine Kinases/metabolism
18.
Genomics ; 100(6): 337-44, 2012 Dec.
Article in English | MEDLINE | ID: mdl-22959562

ABSTRACT

Gene-expression microarrays allow researchers to characterize biological phenomena in a high-throughput fashion but are subject to technological biases and inevitable variabilities that arise during sample collection and processing. Normalization techniques aim to correct such biases. Most existing methods require multiple samples to be processed in aggregate; consequently, each sample's output is influenced by other samples processed jointly. However, in personalized-medicine workflows, samples may arrive serially, so renormalizing all samples upon each new arrival would be impractical. We have developed Single Channel Array Normalization (SCAN), a single-sample technique that models the effects of probe-nucleotide composition on fluorescence intensity and corrects for such effects, dramatically increasing the signal-to-noise ratio within individual samples while decreasing variation across samples. In various benchmark comparisons, we show that SCAN performs as well as or better than competing methods yet has no dependence on external reference samples and can be applied to any single-channel microarray platform.


Subject(s)
Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Precision Medicine/methods , Analysis of Variance , Fluorescence , High-Throughput Screening Assays/methods , Humans , Sample Size , Selection Bias , Signal-To-Noise Ratio , Workflow
19.
J Integr Bioinform ; 2023 Dec 05.
Article in English | MEDLINE | ID: mdl-38047898

ABSTRACT

TidyGEO is a Web-based tool for downloading, tidying, and reformatting data series from Gene Expression Omnibus (GEO). As a freely accessible repository with data from over 6 million biological samples across more than 4000 organisms, GEO provides diverse opportunities for secondary research. Although scientists may find assay data relevant to a given research question, most analyses require sample-level annotations. In GEO, such annotations are stored alongside assay data in delimited, text-based files. However, the structure and semantics of the annotations vary widely from one series to another, and many annotations are not useful for analysis purposes. Thus, every GEO series must be tidied before it is analyzed. Manual approaches may be used, but these are error prone and take time away from other research tasks. Custom computer scripts can be written, but many scientists lack the computational expertise to create such scripts. To address these challenges, we created TidyGEO, which supports essential data-cleaning tasks for sample-level annotations, such as selecting informative columns, renaming columns, splitting or merging columns, standardizing data values, and filtering samples. Additionally, users can integrate annotations with assay data, restructure assay data, and generate code that enables others to reproduce these steps.
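The cleaning tasks listed in this abstract (selecting informative columns, renaming columns, splitting columns, standardizing values, and filtering samples) can be illustrated with a minimal Python sketch. This is not TidyGEO's implementation or the code it generates. The sample identifiers, column names, the "key: value; key: value" layout, and the synonym table are all hypothetical and chosen only for illustration.

```python
# Hypothetical sample-level annotations; the values are invented for illustration.
RAW_ROWS = [
    {"sample_id": "S1", "characteristics": "sex: F; age: 54", "contact": "lab A"},
    {"sample_id": "S2", "characteristics": "sex: male; age: NA", "contact": "lab A"},
    {"sample_id": "S3", "characteristics": "sex: Female; age: 61", "contact": "lab A"},
]

# Assumed synonym table used to standardize free-text values.
SEX_SYNONYMS = {"f": "female", "female": "female", "m": "male", "male": "male"}


def split_characteristics(text):
    """Split a 'key: value; key: value' string into a dict of separate fields."""
    pairs = {}
    for part in text.split(";"):
        key, sep, value = part.partition(":")
        if sep:  # skip fragments that contain no 'key: value' separator
            pairs[key.strip()] = value.strip()
    return pairs


def tidy(rows):
    """Select, rename, split, standardize, and filter annotation rows."""
    tidy_rows = []
    for row in rows:
        fields = split_characteristics(row["characteristics"])  # split one column
        age_text = fields.get("age", "")
        if not age_text.isdigit():
            continue  # filter out samples without a usable age
        tidy_rows.append({
            "sample": row["sample_id"],  # rename the column
            "sex": SEX_SYNONYMS.get(fields.get("sex", "").lower(), "unknown"),
            "age_years": int(age_text),
            # the uninformative 'contact' column is dropped by not copying it
        })
    return tidy_rows


TIDY_ROWS = tidy(RAW_ROWS)
for tidy_row in TIDY_ROWS:
    print(tidy_row)
```

In this sketch, sample S2 is removed because its age is missing, and the differently spelled sex values for S1 and S3 are mapped to one standard label.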

20.
Database (Oxford) ; 2023, 2023 Feb 03.
Article in English | MEDLINE | ID: mdl-36734300

ABSTRACT

This study presents the outcomes of the shared task competition BioCreative VII (Task 3) focusing on the extraction of medication names from a Twitter user's publicly available tweets (the user's 'timeline'). In general, detecting health-related tweets is notoriously challenging for natural language processing tools. The main challenge, aside from the informality of the language used, is that people tweet about any and all topics, and most of their tweets are not related to health. Thus, finding those tweets in a user's timeline that mention specific health-related concepts such as medications requires addressing extreme imbalance. Task 3 called for detecting tweets in a user's timeline that mention a medication name and, for each detected mention, extracting its span. The organizers made available a corpus consisting of 182 049 tweets publicly posted by 212 Twitter users with all medication mentions manually annotated. The corpus exhibits the natural distribution of positive tweets, with only 442 tweets (0.2%) mentioning a medication. This task was an opportunity for participants to evaluate methods that are robust to class imbalance beyond the simple lexical match. A total of 65 teams registered, and 16 teams submitted a system run. This study summarizes the corpus created by the organizers and the approaches taken by the participating teams for this challenge. The corpus is freely available at https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-3/. The methods and the results of the competing systems are analyzed with a focus on the approaches taken for learning from class-imbalanced data.
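The class imbalance described in this abstract can be made concrete with a short worked example. The sketch below uses only the two corpus counts stated above (182 049 tweets, 442 positives). The all-negative baseline classifier is a hypothetical illustration and is not one of the systems submitted to the task.

```python
# Corpus counts taken from the abstract above.
TOTAL_TWEETS = 182_049
POSITIVE_TWEETS = 442


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1), using 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Share of tweets that mention a medication.
prevalence = POSITIVE_TWEETS / TOTAL_TWEETS

# Hypothetical baseline that labels every tweet as "no medication":
# every negative tweet is classified correctly and every positive is missed.
negative_tweets = TOTAL_TWEETS - POSITIVE_TWEETS
baseline_accuracy = negative_tweets / TOTAL_TWEETS
baseline_precision, baseline_recall, baseline_f1 = precision_recall_f1(
    tp=0, fp=0, fn=POSITIVE_TWEETS
)

print(f"prevalence of positive tweets: {prevalence:.2%}")
print(f"baseline accuracy:             {baseline_accuracy:.2%}")
print(f"baseline recall / F1:          {baseline_recall:.2f} / {baseline_f1:.2f}")
```

The baseline reaches more than 99.7% accuracy while finding no medication mentions at all, which is why metrics computed on the positive class, such as precision, recall, and F1, are more informative than accuracy for data this imbalanced.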


Subject(s)
Data Mining , Natural Language Processing , Humans , Data Mining/methods