Results 1 - 20 of 23
1.
Bioinform Adv ; 4(1): vbae093, 2024.
Article in English | MEDLINE | ID: mdl-39011276

ABSTRACT

Motivation: The integration of vast, complex biological data with computational models offers profound insights and predictive accuracy. Yet, such models face challenges: poor generalization and limited labeled data. Results: To overcome these difficulties in binary classification tasks, we developed the Method for Optimal Classification by Aggregation (MOCA) algorithm, which addresses the problem of generalization by virtue of being an ensemble learning method and can be used in problems with limited or no labeled data. We developed both an unsupervised (uMOCA) and a supervised (sMOCA) variant of MOCA. For uMOCA, we show how to infer the MOCA weights in an unsupervised way, which are optimal under the assumption of class-conditioned independent classifier predictions. When it is possible to use labels, sMOCA uses empirically computed MOCA weights. We demonstrate the performance of uMOCA and sMOCA using simulated data as well as actual data previously used in Dialogue on Reverse Engineering Assessment and Methods (DREAM) challenges. We also propose an application of sMOCA for transfer learning where we use pre-trained computational models from a domain where labeled data are abundant and apply them to a different domain with less abundant labeled data. Availability and implementation: GitHub repository, https://github.com/robert-vogel/moca.

2.
iScience ; 27(3): 108905, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38390492

ABSTRACT

Characterizing the effect of combination therapies is vital for treating diseases like cancer. We introduce correlated drug action (CDA), a baseline model for the study of drug combinations in both cell cultures and patient populations, which assumes that the efficacy of drugs in a combination may be correlated. We apply temporal CDA (tCDA) to clinical trial data, and demonstrate the utility of this approach in identifying possible synergistic combinations and others that can be explained in terms of monotherapies. Using MCF7 cell line data, we assess combinations with dose CDA (dCDA), a model that generalizes other proposed models (e.g., Bliss response-additivity, the dose equivalence principle), and introduce Excess over CDA (EOCDA), a new metric for identifying possible synergistic combinations in cell culture.
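As a point of reference for the baseline models named in this abstract, the classical Bliss independence calculation can be sketched as follows. This is not the CDA or EOCDA model, which the abstract does not specify; the function names are hypothetical, and effects are assumed to be fractional responses in [0, 1] (for example, the fraction of cells killed).

```python
def bliss_expected(effect_a: float, effect_b: float) -> float:
    """Expected combined fractional effect if drugs A and B act independently.

    Assumption: effects are fractions in [0, 1]. This is the textbook
    Bliss independence formula, not the CDA model from the abstract.
    """
    for effect in (effect_a, effect_b):
        if not 0.0 <= effect <= 1.0:
            raise ValueError("effects must be fractions in [0, 1]")
    return effect_a + effect_b - effect_a * effect_b


def excess_over_bliss(observed: float, effect_a: float, effect_b: float) -> float:
    """Observed combination effect minus the Bliss expectation.

    Positive values hint at possible synergy, negative values at antagonism.
    """
    return observed - bliss_expected(effect_a, effect_b)
```

A metric such as the EOCDA described above would presumably follow the same observed-minus-baseline pattern, with the CDA prediction in place of the Bliss expectation; that reading is an assumption based on the metric's name.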

3.
Mol Psychiatry ; 29(2): 387-401, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38177352

ABSTRACT

Applications of machine learning in the biomedical sciences are growing rapidly. This growth has been spurred by diverse cross-institutional and interdisciplinary collaborations, public availability of large datasets, an increase in the accessibility of analytic routines, and the availability of powerful computing resources. With this increased access and exposure to machine learning comes a responsibility for education and a deeper understanding of its bases and bounds, borne equally by data scientists seeking to ply their analytic wares in medical research and by biomedical scientists seeking to harness such methods to glean knowledge from data. This article provides an accessible and critical review of machine learning for a biomedically informed audience, as well as its applications in psychiatry. The review covers definitions and expositions of commonly used machine learning methods, and historical trends of their use in psychiatry. We also provide a set of standards, namely Guidelines for REporting Machine Learning Investigations in Neuropsychiatry (GREMLIN), for designing and reporting studies that use machine learning as a primary data-analysis approach. Lastly, we propose the establishment of the Machine Learning in Psychiatry (MLPsych) Consortium, enumerate its objectives, and identify areas of opportunity for future applications of machine learning in biological psychiatry. This review serves as a cautiously optimistic primer on machine learning for those on the precipice as they prepare to dive into the field, either as methodological practitioners or well-informed consumers.


Subjects
Biological Psychiatry , Machine Learning , Humans , Biological Psychiatry/methods , Psychiatry/methods , Biomedical Research/methods
4.
iScience ; 27(1): 108770, 2024 Jan 19.
Article in English | MEDLINE | ID: mdl-38261919

ABSTRACT

The Centers for Disease Control and Prevention promoted the Test-to-Stay (TTS) program to facilitate in-person instruction in K-12 schools during COVID-19. This program delineates guidelines for schools to regularly test students and staff to minimize risks of infection transmission. TTS enrollment can be implemented via two different consent models: opt-in, in which students do not test regularly by default, and the opposite, opt-out model. We study the impacts of the two enrollment approaches on testing and positivity rates with data from 259 schools in Illinois. Our results indicate that, after controlling for other covariates, schools following the opt-out model are associated with an 84% higher testing rate and a 30% lower positivity rate. If all schools had adopted the opt-out model, 20% of the total lost school days could have been saved. The lower positivity rate among the opt-out group is largely explained by the higher testing rate in these schools, a manifestation of status quo bias.

5.
JAMA Netw Open ; 5(11): e2242343, 2022 11 01.
Article in English | MEDLINE | ID: mdl-36409497

ABSTRACT

Importance: With a shortfall in fellowship-trained breast radiologists, mammography screening programs are looking toward artificial intelligence (AI) to increase efficiency and diagnostic accuracy. External validation studies provide an initial assessment of how promising AI algorithms perform in different practice settings. Objective: To externally validate an ensemble deep-learning model using data from a high-volume, distributed screening program of an academic health system with a diverse patient population. Design, Setting, and Participants: In this diagnostic study, an ensemble learning method, which reweights outputs of the 11 highest-performing individual AI models from the Digital Mammography Dialogue on Reverse Engineering Assessment and Methods (DREAM) Mammography Challenge, was used to predict the cancer status of an individual using a standard set of screening mammography images. This study was conducted using retrospective patient data collected between 2010 and 2020 from women aged 40 years and older who underwent a routine breast screening examination and participated in the Athena Breast Health Network at the University of California, Los Angeles (UCLA). Main Outcomes and Measures: Performance of the challenge ensemble method (CEM) and the CEM combined with radiologist assessment (CEM+R) were compared with diagnosed ductal carcinoma in situ and invasive cancers within a year of the screening examination using performance metrics, such as sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC). Results: Evaluated on 37 317 examinations from 26 817 women (mean [SD] age, 58.4 [11.5] years), individual model AUROC estimates ranged from 0.77 (95% CI, 0.75-0.79) to 0.83 (95% CI, 0.81-0.85). The CEM model achieved an AUROC of 0.85 (95% CI, 0.84-0.87) in the UCLA cohort, lower than the performance achieved in the Kaiser Permanente Washington (AUROC, 0.90) and Karolinska Institute (AUROC, 0.92) cohorts. 
The CEM+R model achieved a sensitivity (0.813 [95% CI, 0.781-0.843] vs 0.826 [95% CI, 0.795-0.856]; P = .20) and specificity (0.925 [95% CI, 0.916-0.934] vs 0.930 [95% CI, 0.929-0.932]; P = .18) similar to the radiologist performance. The CEM+R model had significantly lower sensitivity (0.596 [95% CI, 0.466-0.717] vs 0.850 [95% CI, 0.766-0.923]; P < .001) and specificity (0.803 [95% CI, 0.734-0.861] vs 0.945 [95% CI, 0.936-0.954]; P < .001) than the radiologist in women with a prior history of breast cancer and Hispanic women (0.894 [95% CI, 0.873-0.910] vs 0.926 [95% CI, 0.919-0.933]; P = .004). Conclusions and Relevance: This study found that the high performance of an ensemble deep-learning model for automated screening mammography interpretation did not generalize to a more diverse screening cohort, suggesting that the model experienced underspecification. This study suggests the need for model transparency and fine-tuning of AI models for specific target populations prior to their clinical adoption.
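The performance metrics named in this abstract (sensitivity, specificity, AUROC) can be illustrated with a minimal sketch. The function names and the 0/1 label encoding are assumptions made for illustration; the study's own evaluation code is not described in the abstract.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels coded 1 = cancer, 0 = no cancer."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation). This pairwise
    loop is quadratic and meant for small illustrative inputs only.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Both functions assume that each class has at least one example; otherwise the denominators are zero.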


Subjects
Breast Neoplasms , Mammography , Humans , Female , Adult , Middle Aged , Artificial Intelligence , Breast Neoplasms/diagnostic imaging , Retrospective Studies , Early Detection of Cancer
6.
Proc Natl Acad Sci U S A ; 118(34), 2021 08 24.
Article in English | MEDLINE | ID: mdl-34413191

ABSTRACT

Binary classification is one of the central problems in machine-learning research and, as such, investigations of its general statistical properties are of interest. We studied the ranking statistics of items in binary classification problems and observed that there is a formal and surprising relationship between the probability of a sample belonging to one of the two classes and the Fermi-Dirac distribution determining the probability that a fermion occupies a given single-particle quantum state in a physical system of noninteracting fermions. Using this equivalence, it is possible to compute a calibrated probabilistic output for binary classifiers. We show that the area under the receiver operating characteristics curve (AUC) in a classification problem is related to the temperature of an equivalent physical system. In a similar manner, the optimal decision threshold between the two classes is associated with the chemical potential of an equivalent physical system. Using our framework, we also derive a closed-form expression to calculate the variance for the AUC of a classifier. Finally, we introduce FiDEL (Fermi-Dirac-based ensemble learning), an ensemble learning algorithm that uses the calibrated nature of the classifier's output probability to combine possibly very different classifiers.
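The analogy described in this abstract can be sketched with the standard Fermi-Dirac occupation function. The sketch assumes that a sample's rank plays the role of energy, with the parameter `mu` standing in for the chemical potential and `temperature` for the temperature; the paper's exact parameterization and the FiDEL algorithm itself are not given in the abstract, so the function below is only an illustration.

```python
import math


def fermi_dirac(rank: float, mu: float, temperature: float) -> float:
    """Fermi-Dirac form 1 / (1 + exp((rank - mu) / temperature)).

    Assumption: lower ranks are more confidently positive, so the value
    approaches 1 for rank << mu, equals 0.5 at rank == mu, and approaches 0
    for rank >> mu. Written in a numerically stable way to avoid overflow.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    x = (rank - mu) / temperature
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))
```

In this picture a smaller `temperature` gives a sharper transition around `mu`, which is consistent with the abstract's statement that the AUC is related to the temperature of the equivalent physical system.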

7.
Gut ; 2021 Jul 28.
Article in English | MEDLINE | ID: mdl-34321221

ABSTRACT

OBJECTIVE: Surveillance tools for early cancer detection are suboptimal, including hepatocellular carcinoma (HCC), and biomarkers are urgently needed. Extracellular vesicles (EVs) have gained increasing scientific interest due to their involvement in tumour initiation and metastasis; however, most extracellular RNA (exRNA) blood-based biomarker studies are limited to annotated genomic regions. DESIGN: EVs were isolated with differential ultracentrifugation and integrated nanoscale deterministic lateral displacement arrays (nanoDLD) and quality assessed by electron microscopy, immunoblotting, nanoparticle tracking and deconvolution analysis. Genome-wide sequencing of the largely unexplored small exRNA landscape, including unannotated transcripts, identified and reproducibly quantified small RNA clusters (smRCs). Their key genomic features were delineated across biospecimens and EV isolation techniques in prostate cancer and HCC. Three independent exRNA cancer datasets with a total of 479 samples from 375 patients, including longitudinal samples, were used for this study. RESULTS: ExRNA smRCs were dominated by uncharacterised, unannotated small RNA with a consensus sequence of 20 nt. An unannotated 3-smRC signature was significantly overexpressed in plasma exRNA of patients with HCC (p<0.01, n=157). An independent validation in a phase 2 biomarker case-control study revealed 86% sensitivity and 91% specificity for the detection of early HCC from controls at risk (n=209) (area under the receiver operating curve (AUC): 0.87). The 3-smRC signature was independent of alpha-fetoprotein (p<0.0001) and a composite model yielded an increased AUC of 0.93. CONCLUSION: These findings directly lead to the prospect of a minimally invasive, blood-only, operator-independent clinical tool for HCC surveillance, thus highlighting the potential of unannotated smRCs for biomarker research in cancer.

8.
PLoS Genet ; 17(6): e1009589, 2021 06.
Article in English | MEDLINE | ID: mdl-34166362

ABSTRACT

Cancer testis antigens (CTAs) are an extensive gene family with a unique expression pattern restricted to germ cells, but aberrantly reactivated in cancer tissues. Studies indicate that the expression (or re-expression) of CTAs within the MAGE-A family is common in hepatocellular carcinoma (HCC). However, no systematic characterization has yet been reported. The aim of this study is to perform a comprehensive profile of CTA de-regulation in HCC and experimentally evaluate the role of MAGEA3 as a driver of HCC progression. The transcriptomic analysis of 44 multi-regionally sampled HCCs from 12 patients identified high intra-tumor heterogeneity of CTAs. In addition, a subset of CTAs was significantly overexpressed in histologically poorly differentiated regions. Further analysis of CTAs in larger patient cohorts revealed that high CTA expression was related to worse overall survival and several other markers of poor prognosis. Functional analysis of MAGEA3 was performed in human HCC cell lines by gene silencing and in a genetic mouse model by overexpression of MAGEA3 in the liver. Knockdown of MAGEA3 decreased cell proliferation and colony formation and increased apoptosis. MAGEA3 overexpression was associated with more aggressive tumors in vivo. In conclusion, MAGEA3 enhances tumor progression and should be considered as a novel therapeutic target in HCC.


Subjects
Neoplasm Antigens/genetics , Neoplasm Antigens/immunology , Hepatocellular Carcinoma/pathology , Liver Neoplasms/pathology , Neoplasm Proteins/genetics , Testis/immunology , Transcriptome , Apoptosis/genetics , Hepatocellular Carcinoma/genetics , Hepatocellular Carcinoma/immunology , Cell Proliferation/genetics , Disease Progression , Gene Expression Profiling , Humans , Liver Neoplasms/genetics , Liver Neoplasms/immunology , Male , Prognosis , Up-Regulation
9.
Cell Syst ; 12(8): 827-838.e5, 2021 08 18.
Article in English | MEDLINE | ID: mdl-34146471

ABSTRACT

The accurate identification and quantitation of RNA isoforms present in the cancer transcriptome is key for analyses ranging from the inference of the impacts of somatic variants to pathway analysis to biomarker development and subtype discovery. The ICGC-TCGA DREAM Somatic Mutation Calling in RNA (SMC-RNA) challenge was a crowd-sourced effort to benchmark methods for RNA isoform quantification and fusion detection from bulk cancer RNA sequencing (RNA-seq) data. It concluded in 2018 with a comparison of 77 fusion detection entries and 65 isoform quantification entries on 51 synthetic tumors and 32 cell lines with spiked-in fusion constructs. We report the entries used to build this benchmark, the leaderboard results, and the experimental features associated with the accurate prediction of RNA species. This challenge required submissions to be in the form of containerized workflows, meaning each of the entries described is easily reusable through CWL and Docker containers at https://github.com/SMC-RNA-challenge. A record of this paper's transparent peer review process is included in the supplemental information.


Subjects
Neoplasms , Humans , Neoplasms/genetics , Protein Isoforms/genetics , RNA/genetics , RNA-Seq , RNA Sequence Analysis
10.
J Med Syst ; 45(6): 64, 2021 May 04.
Article in English | MEDLINE | ID: mdl-33948743

ABSTRACT

Ongoing research efforts have been examining how to utilize artificial intelligence technology to help healthcare consumers make sense of their clinical data, such as diagnostic radiology reports. How to promote the acceptance of such novel technology is a heated research topic. Recent studies highlight the importance of providing local explanations about AI prediction and model performance to help users determine whether to trust AI's predictions. Despite some efforts, limited empirical research has been conducted to quantitatively measure how AI explanations impact healthcare consumers' perceptions of using patient-facing, AI-powered healthcare systems. The aim of this study is to evaluate the effects of different AI explanations on people's perceptions of an AI-powered healthcare system. In this work, we designed and deployed a large-scale experiment (N = 3,423) on Amazon Mechanical Turk (MTurk) to evaluate the effects of AI explanations on people's perceptions in the context of comprehending radiology reports. We created four groups based on two factors, the extent of explanations for the prediction (High vs. Low Transparency) and the model performance (Good vs. Weak AI Model), and randomly assigned participants to one of the four conditions. Participants were instructed to classify a radiology report as describing a normal or abnormal finding, followed by completing a post-study survey to indicate their perceptions of the AI tool. We found that revealing model performance information can promote people's trust and perceived usefulness of system outputs, while providing local explanations for the rationale of a prediction can promote understandability but not necessarily trust. We also found that when model performance is low, the more information the AI system discloses, the less people would trust the system. Lastly, whether a human agrees with the AI prediction and whether the AI prediction is correct could also influence the effect of AI explanations. We conclude this paper by discussing implications for designing AI systems for healthcare consumers to interpret diagnostic reports.


Subjects
Artificial Intelligence , Radiology , Delivery of Health Care , Humans , Perception , Radiography
11.
EBioMedicine ; 66: 103275, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33745882

ABSTRACT

BACKGROUND: Assistive automatic seizure detection can empower human annotators to shorten patient monitoring data review times. We present a proof-of-concept for a seizure detection system that is sensitive, automated, patient-specific, and tunable to maximise sensitivity while minimising human annotation times. The system uses custom data preparation methods, deep learning analytics and electroencephalography (EEG) data. METHODS: Scalp EEG data of 365 patients containing 171,745 s ictal and 2,185,864 s interictal samples obtained from clinical monitoring systems were analysed as part of a crowdsourced artificial intelligence (AI) challenge. Participants were tasked to develop an ictal/interictal classifier with high sensitivity and low false alarm rates. We built a challenge platform that prevented participants from downloading or directly accessing the data while allowing crowdsourced model development. FINDINGS: The automatic detection system achieved tunable sensitivities between 75.00% and 91.60%, allowing a reduction in the amount of raw EEG data to be reviewed by a human annotator by factors between 142x and 22x, respectively. The algorithm enables instantaneous reviewer-managed optimization of the balance between sensitivity and the amount of raw EEG data to be reviewed. INTERPRETATION: This study demonstrates the utility of deep learning for patient-specific seizure detection in EEG data. Furthermore, deep learning in combination with a human reviewer can provide the basis for an assistive data labelling system lowering the time of manual review while maintaining human expert annotation performance. FUNDING: IBM employed all IBM Research authors. Temple University employed all Temple University authors. The Icahn School of Medicine at Mount Sinai employed Eren Ahsen. The corresponding authors Stefan Harrer and Gustavo Stolovitzky declare that they had full access to all the data in the study and that they had final responsibility for the decision to submit for publication.


Subjects
Artificial Intelligence , Brain/physiopathology , Electroencephalography , Neurologists , Seizures/diagnosis , Algorithms , Data Analysis , Deep Learning , Electroencephalography/methods , Electroencephalography/standards , Epilepsy/diagnosis , Humans , Reproducibility of Results
12.
Bioinformatics ; 37(14): 2070-2072, 2021 08 04.
Article in English | MEDLINE | ID: mdl-33241320

ABSTRACT

SUMMARY: The advent of high-throughput technologies has provided researchers with measurements of thousands of molecular entities and enabled the investigation of the internal regulatory apparatus of the cell. However, network inference from high-throughput data is far from being a solved problem. While a plethora of different inference methods have been proposed, they often lead to non-overlapping predictions, and many of them lack user-friendly implementations to enable their broad utilization. Here, we present Consensus Interaction Network Inference Service (COSIFER), a package and a companion web-based platform to infer molecular networks from expression data using state-of-the-art consensus approaches. COSIFER includes a selection of state-of-the-art methodologies for network inference and different consensus strategies to integrate the predictions of individual methods and generate robust networks. AVAILABILITY AND IMPLEMENTATION: COSIFER Python source code is available at https://github.com/PhosphorylatedRabbits/cosifer. The web service is accessible at https://ibm.biz/cosifer-aas. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software , Consensus
13.
Life Sci Alliance ; 3(11), 2020 11.
Article in English | MEDLINE | ID: mdl-32972997

ABSTRACT

Single-cell RNA-sequencing (scRNAseq) technologies are rapidly evolving. Although very informative, in standard scRNAseq experiments, the spatial organization of the cells in the tissue of origin is lost. Conversely, spatial RNA-seq technologies designed to maintain cell localization have limited throughput and gene coverage. Mapping scRNAseq to genes with spatial information increases coverage while providing spatial location. However, methods to perform such mapping have not yet been benchmarked. To fill this gap, we organized the DREAM Single-Cell Transcriptomics challenge focused on the spatial reconstruction of cells from the Drosophila embryo from scRNAseq data, leveraging, as a silver standard, genes with in situ hybridization data from the Berkeley Drosophila Transcription Network Project reference atlas. The 34 participating teams used diverse algorithms for gene selection and location prediction, while being able to correctly localize clusters of cells. Selection of predictor genes was essential for this task. Predictor genes showed a relatively high expression entropy and high spatial clustering, and included prominent developmental genes such as gap and pair-rule genes and tissue markers. Application of the top 10 methods to a zebrafish embryo dataset yielded performance and statistical properties of the selected genes similar to those in the Drosophila data. This suggests that methods developed in this challenge are able to extract generalizable properties of genes that are useful to accurately reconstruct the spatial arrangement of cells in tissues.


Subjects
Computational Biology/methods , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Spatial Analysis , Algorithms , Animals , Genetic Databases , Drosophila/genetics , Forecasting/methods , Developmental Gene Expression Regulation/genetics , Gene Regulatory Networks/genetics , RNA Sequence Analysis/methods , Transcriptome/genetics , Zebrafish/genetics
14.
Elife ; 9, 2020 09 18.
Article in English | MEDLINE | ID: mdl-32945258

ABSTRACT

Our ability to discover effective drug combinations is limited, in part by insufficient understanding of how the transcriptional response of two monotherapies results in that of their combination. We analyzed matched time course RNAseq profiling of cells treated with single drugs and their combinations and found that the transcriptional signature of the synergistic combination was unique relative to that of either constituent monotherapy. The sequential activation of transcription factors in time in the gene regulatory network was implicated. The nature of this transcriptional cascade suggests that drug synergy may ensue when the transcriptional responses elicited by two unrelated individual drugs are correlated. We used these results as the basis of a simple prediction algorithm attaining an AUROC of 0.77 in the prediction of synergistic drug combinations in an independent dataset.


Subjects
Drug Combinations , Drug Synergism , Gene Expression , Gene Regulatory Networks/physiology , Transcriptome , Algorithms , Computational Biology , Humans , MCF-7 Cells , RNA-Seq , Transcription Factors/metabolism
15.
JAMA Netw Open ; 3(3): e200265, 2020 03 02.
Article in English | MEDLINE | ID: mdl-32119094

ABSTRACT

Importance: Mammography screening currently relies on subjective human interpretation. Artificial intelligence (AI) advances could be used to increase mammography screening accuracy by reducing missed cancers and false positives. Objective: To evaluate whether AI can overcome human mammography interpretation limitations with a rigorous, unbiased evaluation of machine learning algorithms. Design, Setting, and Participants: In this diagnostic accuracy study conducted between September 2016 and November 2017, an international, crowdsourced challenge was hosted to foster AI algorithm development focused on interpreting screening mammography. More than 1100 participants comprising 126 teams from 44 countries participated. Analysis began November 18, 2016. Main Outcomes and Measurements: Algorithms used images alone (challenge 1) or combined images, previous examinations (if available), and clinical and demographic risk factor data (challenge 2) and output a score that translated to cancer yes/no within 12 months. Algorithm accuracy for breast cancer detection was evaluated using area under the curve and algorithm specificity compared with radiologists' specificity with radiologists' sensitivity set at 85.9% (United States) and 83.9% (Sweden). An ensemble method aggregating top-performing AI algorithms and radiologists' recall assessment was developed and evaluated. Results: Overall, 144 231 screening mammograms from 85 580 US women (952 cancer positive ≤12 months from screening) were used for algorithm training and validation. A second independent validation cohort included 166 578 examinations from 68 008 Swedish women (780 cancer positive). The top-performing algorithm achieved an area under the curve of 0.858 (United States) and 0.903 (Sweden) and 66.2% (United States) and 81.2% (Sweden) specificity at the radiologists' sensitivity, lower than community-practice radiologists' specificity of 90.5% (United States) and 98.5% (Sweden). 
Combining top-performing algorithms and US radiologist assessments resulted in a higher area under the curve of 0.942 and achieved a significantly improved specificity (92.0%) at the same sensitivity. Conclusions and Relevance: While no single AI algorithm outperformed radiologists, an ensemble of AI algorithms combined with radiologist assessment in a single-reader screening environment improved overall accuracy. This study underscores the potential of using machine learning methods for enhancing mammography screening interpretation.
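The comparison used in this abstract, algorithm specificity evaluated at the radiologists' sensitivity, can be sketched as a threshold sweep. The function name and the rule used here (take the best specificity among thresholds that reach at least the target sensitivity) are assumptions made for illustration; the challenge's actual scoring procedure is not described in the abstract.

```python
from typing import Sequence


def specificity_at_sensitivity(
    y_true: Sequence[int], scores: Sequence[float], target_sensitivity: float
) -> float:
    """Best specificity over thresholds whose sensitivity is >= target_sensitivity.

    Labels are coded 1 = cancer, 0 = no cancer; a case is called positive
    when its score is >= the threshold. Returns 0.0 if no candidate
    threshold reaches the target sensitivity.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    best = 0.0
    for threshold in sorted(set(scores)):
        sensitivity = sum(s >= threshold for s in pos) / len(pos)
        if sensitivity >= target_sensitivity:
            specificity = sum(s < threshold for s in neg) / len(neg)
            best = max(best, specificity)
    return best
```

For example, passing `target_sensitivity=0.859` would mirror the United States operating point quoted above, applied to whatever scores and labels the caller supplies.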


Subjects
Breast Neoplasms/diagnostic imaging , Deep Learning , Computer-Assisted Image Interpretation/methods , Mammography/methods , Radiologists , Adult , Aged , Algorithms , Artificial Intelligence , Early Detection of Cancer , Female , Humans , Middle Aged , Radiology , Sensitivity and Specificity , Sweden , United States
16.
J Comput Biol ; 27(9): 1337-1340, 2020 09.
Article in English | MEDLINE | ID: mdl-31905016

ABSTRACT

The increasing availability of complex data in biology and medicine has promoted the use of machine learning in classification tasks to address important problems in translational and fundamental science. Two important obstacles, however, may limit the unraveling of the full potential of machine learning in these fields: the lack of generalization of the resulting models and the limited number of labeled data sets in some applications. To address these important problems, we developed an unsupervised ensemble algorithm called strategy for unsupervised multiple method aggregation (SUMMA). By virtue of being an ensemble method, SUMMA is more robust to generalization than the predictions it combines. By virtue of being unsupervised, SUMMA does not require labeled data. SUMMA receives as input predictions from a diversity of models and estimates their classification performance even when labeled data are unavailable. It then uses these performance estimates to combine these different predictions into an ensemble model. SUMMA can be applied to a variety of binary classification problems in bioinformatics including but not limited to gene network inference, cancer diagnostics, drug response prediction, somatic mutation, and differential expression calling. In this application note, we introduce the R/PY-SUMMA packages, available in R or Python, that implement the SUMMA algorithm.


Subjects
Computational Biology/statistics & numerical data , Gene Regulatory Networks/genetics , Unsupervised Machine Learning/statistics & numerical data , Algorithms , Statistical Models
17.
J Natl Cancer Inst ; 112(2): 179-190, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31095341

ABSTRACT

BACKGROUND: A total of 10%-20% of patients develop long-term toxicity following radiotherapy for prostate cancer. Identification of common genetic variants associated with susceptibility to radiotoxicity might improve risk prediction and inform functional mechanistic studies. METHODS: We conducted an individual patient data meta-analysis of six genome-wide association studies (n = 3871) in men of European ancestry who underwent radiotherapy for prostate cancer. Radiotoxicities (increased urinary frequency, decreased urinary stream, hematuria, rectal bleeding) were graded prospectively. We used grouped relative risk models to test associations with approximately 6 million genotyped or imputed variants (time to first grade 2 or higher toxicity event). Variants with two-sided Pmeta less than 5 × 10⁻⁸ were considered statistically significant. Bayesian false discovery probability provided an additional measure of confidence. Statistically significant variants were evaluated in three Japanese cohorts (n = 962). All statistical tests were two-sided. RESULTS: Meta-analysis of the European ancestry cohorts identified three genomic signals: single nucleotide polymorphism rs17055178 with rectal bleeding (Pmeta = 6.2 × 10⁻¹⁰), rs10969913 with decreased urinary stream (Pmeta = 2.9 × 10⁻¹⁰), and rs11122573 with hematuria (Pmeta = 1.8 × 10⁻⁸). Fine-scale mapping of these three regions was used to identify another independent signal (rs147121532) associated with hematuria (Pconditional = 4.7 × 10⁻⁶). Credible causal variants at these four signals lie in gene-regulatory regions, some modulating expression of nearby genes. Previously identified variants showed consistent associations (rs17599026 with increased urinary frequency, rs7720298 with decreased urinary stream, rs1801516 with overall toxicity) in new cohorts. rs10969913 and rs17599026 had similar effects in the photon-treated Japanese cohorts.
CONCLUSIONS: This study increases the understanding of the architecture of common genetic variants affecting radiotoxicity, points to novel radio-pathogenic mechanisms, and develops risk models for testing in clinical studies. Further multinational radiogenomics studies in larger cohorts are worthwhile.

18.
Sci Rep ; 9(1): 12970, 2019 09 10.
Article in English | MEDLINE | ID: mdl-31506535

ABSTRACT

Biological and regulatory mechanisms underlying many multi-gene expression-based disease biomarkers are often not readily evident. We describe an innovative framework, NeTFactor, that combines network analyses with gene expression data to identify transcription factors (TFs) that significantly and maximally regulate such a biomarker. NeTFactor uses a computationally inferred, context-specific gene regulatory network and applies topological, statistical, and optimization methods to identify regulator TFs. Application of NeTFactor to a multi-gene expression-based asthma biomarker identified ETS translocation variant 4 (ETV4) and peroxisome proliferator-activated receptor gamma (PPARG) as the biomarker's most significant TF regulators. siRNA-based knockdown of these TFs in an airway epithelial cell line model demonstrated significant reduction of cytokine expression relevant to asthma, validating NeTFactor's top-scoring findings. While PPARG has been associated with airway inflammation, ETV4 has not yet been implicated in asthma, thus indicating the possibility of novel, disease-relevant discovery by NeTFactor. We also show that NeTFactor's results are robust when the gene regulatory network and biomarker are derived from independent data. Additionally, our application of NeTFactor to a different disease biomarker identified TF regulators of interest. These results illustrate that the application of NeTFactor to multi-gene expression-based biomarkers could yield valuable insights into regulatory mechanisms and biological processes underlying disease.


Subjects
Algorithms , Asthma/genetics , Asthma/pathology , Biomarkers/analysis , Gene Expression Regulation , Gene Regulatory Networks , Case-Control Studies , Cohort Studies , Gene Expression Profiling , Humans , Signal Transduction , Transcription Factors/genetics , Transcription Factors/metabolism
19.
Nat Commun ; 10(1): 2674, 2019 Jun 17.
Article in English | MEDLINE | ID: mdl-31209238

ABSTRACT

The effectiveness of most targeted cancer therapies is short-lived. Tumors often develop resistance that might be overcome with drug combinations. However, the number of possible combinations is vast, necessitating data-driven approaches to find optimal patient-specific treatments. Here we report AstraZeneca's large drug combination dataset, consisting of 11,576 experiments from 910 combinations across 85 molecularly characterized cancer cell lines, and results of a DREAM Challenge to evaluate computational strategies for predicting synergistic drug pairs and biomarkers. A total of 160 teams participated, providing comprehensive methodological development and benchmarking. Winning methods incorporate prior knowledge of drug-target interactions. Synergy is predicted with an accuracy matching biological replicates for >60% of combinations. However, 20% of drug combinations are poorly predicted by all methods. Genomic rationales for synergy predictions are identified, including ADAM17 inhibitor antagonism when combined with PIK3CB/D inhibition, in contrast to synergy when combined with other PI3K-pathway inhibitors in PIK3CA mutant cells.


Subjects
Antineoplastic Combined Chemotherapy Protocols/pharmacology , Computational Biology/methods , Neoplasms/drug therapy , Pharmacogenetics/methods , ADAM17 Protein/antagonists & inhibitors , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Benchmarking , Biomarkers, Tumor/genetics , Cell Line, Tumor , Computational Biology/standards , Datasets as Topic , Drug Antagonism , Drug Resistance, Neoplasm/drug effects , Drug Resistance, Neoplasm/genetics , Drug Synergism , Genomics/methods , Humans , Molecular Targeted Therapy/methods , Mutation , Neoplasms/genetics , Pharmacogenetics/standards , Phosphatidylinositol 3-Kinases/genetics , Phosphoinositide-3 Kinase Inhibitors , Treatment Outcome
20.
Nat Commun ; 9(1): 4418, 2018 Oct 24.
Article in English | MEDLINE | ID: mdl-30356117

ABSTRACT

The response to respiratory viruses varies substantially between individuals, and there are currently no known molecular predictors from the early stages of infection. Here we conduct a community-based analysis to determine whether pre- or early post-exposure molecular factors could predict physiologic responses to viral exposure. Using peripheral blood gene expression profiles collected from healthy subjects prior to exposure to one of four respiratory viruses (H1N1, H3N2, Rhinovirus, and RSV), as well as up to 24 h following exposure, we find that it is possible to construct models predictive of symptomatic response using profiles even prior to viral exposure. Analysis of predictive gene features reveals little overlap among models; however, in aggregate, these genes are enriched for common pathways. Heme metabolism, the most significantly enriched pathway, is associated with a higher risk of developing symptoms following viral exposure. This study demonstrates that pre-exposure molecular predictors can be identified and improves our understanding of the mechanisms of response to respiratory viruses.


Subjects
Gene Expression/genetics , Healthy Volunteers , Heme/metabolism , Humans , Influenza A Virus, H1N2 Subtype/immunology , Influenza A Virus, H1N2 Subtype/pathogenicity , Influenza A Virus, H3N2 Subtype/immunology , Influenza A Virus, H3N2 Subtype/pathogenicity , Respiratory Syncytial Viruses/immunology , Respiratory Syncytial Viruses/pathogenicity , Rhinovirus/immunology , Rhinovirus/pathogenicity