Results 1 - 20 of 135
1.
J Extracell Vesicles ; 13(8): e12481, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39148266

ABSTRACT

From eukaryotes to prokaryotes, all cells secrete extracellular vesicles (EVs) as part of their regular homeostasis, intercellular communication, and cargo disposal. Accumulating evidence suggests that small EVs carry functional small RNAs, potentially serving as extracellular messengers and liquid-biopsy markers. Yet, the complete transcriptomic landscape of EV-associated small RNAs during disease progression is poorly delineated due to critical limitations including the protocols used for sequencing, suboptimal alignment of short reads (20-50 nt), and uncharacterized genome annotations, often denoted as the 'dark matter' of the genome. In this study, we investigate the EV-associated small unannotated RNAs that arise from endogenous genes and are part of the genomic 'dark matter', which may play a key emerging role in regulating gene expression and translational mechanisms. To address this, we created a distinct small RNAseq dataset from human prostate cancer and benign tissues, and EVs derived from blood (pre- and post-prostatectomy), urine, and a human prostate carcinoma epithelial cell line. We then developed an unsupervised data-based bioinformatic pipeline that recognizes biologically relevant transcriptional signals irrespective of their genomic annotation. Using this approach, we discovered distinct EV-RNA expression patterns emerging from the unannotated genomic regions (UGRs) of the transcriptomes associated with tissue-specific phenotypes. We have named these novel EV-associated small RNAs 'EV-UGRs' or 'EV-dark matter'. Here, we demonstrate that EV-UGR expression is downregulated by ∼100-fold (FDR < 0.05) in the circulating serum EVs from aggressive prostate cancer subjects. Remarkably, these EV-UGR expression signatures were regained (upregulated) after radical prostatectomy in the same follow-up patients. Finally, we developed a stem-loop RT-qPCR assay that validated prostate cancer-specific EV-UGRs for selective fluid-based diagnostics.
Overall, using an unsupervised data-driven approach, we investigate the 'dark matter' of the EV transcriptome and demonstrate that EV-UGRs carry tissue-specific information that changes significantly between pre- and post-prostatectomy samples in prostate cancer patients. Although further validation in randomized clinical trials is required, this new class of EV-RNAs holds promise for liquid biopsy by avoiding highly invasive biopsy procedures in prostate cancer.


Subjects
Extracellular Vesicles , Prostatic Neoplasms , Extracellular Vesicles/metabolism , Humans , Prostatic Neoplasms/genetics , Prostatic Neoplasms/metabolism , Prostatic Neoplasms/pathology , Male , Cell Line, Tumor , Transcriptome , Organ Specificity/genetics , Gene Expression Regulation, Neoplastic
2.
Bioinform Adv ; 4(1): vbae093, 2024.
Article in English | MEDLINE | ID: mdl-39011276

ABSTRACT

Motivation: The integration of vast, complex biological data with computational models offers profound insights and predictive accuracy. Yet, such models face challenges: poor generalization and limited labeled data. Results: To overcome these difficulties in binary classification tasks, we developed the Method for Optimal Classification by Aggregation (MOCA) algorithm, which addresses the problem of generalization by virtue of being an ensemble learning method and can be used in problems with limited or no labeled data. We developed both an unsupervised (uMOCA) and a supervised (sMOCA) variant of MOCA. For uMOCA, we show how to infer the MOCA weights in an unsupervised way, which are optimal under the assumption of class-conditioned independent classifier predictions. When it is possible to use labels, sMOCA uses empirically computed MOCA weights. We demonstrate the performance of uMOCA and sMOCA using simulated data as well as actual data previously used in Dialogue on Reverse Engineering Assessment and Methods (DREAM) challenges. We also propose an application of sMOCA for transfer learning where we use pre-trained computational models from a domain where labeled data are abundant and apply them to a different domain with less abundant labeled data. Availability and implementation: GitHub repository, https://github.com/robert-vogel/moca.

3.
JMIR AI ; 3: e50800, 2024 Jul 29.
Article in English | MEDLINE | ID: mdl-39073872

ABSTRACT

BACKGROUND: Clinical trials are vital for developing new therapies but can also delay drug development. Efficient trial data management, optimized trial protocol, and accurate patient identification are critical for reducing trial timelines. Natural language processing (NLP) has the potential to achieve these objectives. OBJECTIVE: This study aims to assess the feasibility of using data-driven approaches to optimize clinical trial protocol design and identify eligible patients. This involves creating a comprehensive eligibility criteria knowledge base integrated within electronic health records using deep learning-based NLP techniques. METHODS: We obtained data of 3281 industry-sponsored phase 2 or 3 interventional clinical trials recruiting patients with non-small cell lung cancer, prostate cancer, breast cancer, multiple myeloma, ulcerative colitis, and Crohn disease from ClinicalTrials.gov, spanning the period between 2013 and 2020. A customized bidirectional long short-term memory- and conditional random field-based NLP pipeline was used to extract all eligibility criteria attributes and convert hypernym concepts into computable hyponyms along with their corresponding values. To illustrate the simulation of clinical trial design for optimization purposes, we selected a subset of patients with non-small cell lung cancer (n=2775), curated from the Mount Sinai Health System, as a pilot study. RESULTS: We manually annotated the clinical trial eligibility corpus (485/3281, 14.78% trials) and constructed an eligibility criteria-specific ontology. Our customized NLP pipeline, developed based on the eligibility criteria-specific ontology that we created through manual annotation, achieved high precision (0.91, range 0.67-1.00) and recall (0.79, range 0.50-1) scores, as well as a high F1-score (0.83, range 0.67-1), enabling the efficient extraction of granular criteria entities and relevant attributes from 3281 clinical trials. 
A standardized eligibility criteria knowledge base, compatible with electronic health records, was developed by transforming hypernym concepts into machine-interpretable hyponyms along with their corresponding values. In addition, an interface prototype demonstrated the practicality of leveraging real-world data for optimizing clinical trial protocols and identifying eligible patients. CONCLUSIONS: Our customized NLP pipeline successfully generated a standardized eligibility criteria knowledge base by transforming hypernym criteria into machine-readable hyponyms along with their corresponding values. A prototype interface integrating real-world patient information allows us to assess the impact of each eligibility criterion on the number of patients eligible for the trial. Leveraging NLP and real-world data in a data-driven approach holds promise for streamlining the overall clinical trial process and improving efficiency in patient identification.
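
For readers unfamiliar with the precision, recall, and F1-score figures quoted in this abstract, the standard definitions can be sketched as follows. This is a minimal illustration only: the function name and the entity counts are hypothetical and are not taken from the study, and the abstract does not state how its scores were averaged across entity types.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from entity-level counts.

    tp: extracted entities that match the manual annotation
    fp: extracted entities with no matching annotation
    fn: annotated entities the pipeline missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts, for illustration only.
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(precision, recall, round(f1, 3))
```

Because F1 is a harmonic mean, it drops sharply when either precision or recall is low, which is why it is usually reported alongside both.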

4.
Orphanet J Rare Dis ; 19(1): 183, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38698482

ABSTRACT

BACKGROUND: With over 7000 Mendelian disorders, identifying children with a specific rare genetic disorder diagnosis through structured electronic medical record data is challenging given incompleteness of records, inaccurate medical diagnosis coding, as well as heterogeneity in clinical symptoms and procedures for specific disorders. We sought to develop a digital phenotyping algorithm (PheIndex) using electronic medical records to identify children aged 0-3 diagnosed with genetic disorders or who present with illness with an increased risk for genetic disorders. RESULTS: Through expert opinion, we established 13 criteria for the algorithm and derived a score and a classification. The performance of each criterion and the classification were validated by chart review. PheIndex identified 1,088 children out of 93,154 live births who may be at an increased risk for genetic disorders. Chart review demonstrated that the algorithm achieved 90% sensitivity, 97% specificity, and 94% accuracy. CONCLUSIONS: The PheIndex algorithm can help identify when a rare genetic disorder may be present, alerting providers to consider ordering a diagnostic genetic test and/or referring a patient to a medical geneticist.
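
The sensitivity, specificity, and accuracy reported from chart review follow from a 2x2 confusion table. The sketch below shows the arithmetic under stated assumptions: the function name and the counts are hypothetical and are not the study's chart-review data.

```python
def screening_metrics(tp, fn, tn, fp):
    """Compute screening metrics from a 2x2 confusion table.

    tp/fn: affected children flagged / missed by the algorithm
    tn/fp: unaffected children correctly passed / wrongly flagged
    """
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),   # share of true cases detected
        "specificity": tn / (tn + fp),   # share of non-cases correctly passed
        "accuracy": (tp + tn) / total,   # share of all calls that are correct
    }


# Hypothetical chart-review counts, for illustration only.
metrics = screening_metrics(tp=9, fn=1, tn=97, fp=3)
print({name: round(value, 3) for name, value in metrics.items()})
```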


Subjects
Algorithms , Rare Diseases , Humans , Rare Diseases/genetics , Rare Diseases/diagnosis , Infant , Infant, Newborn , Child, Preschool , Female , Male , Electronic Health Records , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/genetics , Phenotype
5.
iScience ; 27(3): 108905, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38390492

ABSTRACT

Characterizing the effect of combination therapies is vital for treating diseases like cancer. We introduce correlated drug action (CDA), a baseline model for the study of drug combinations in both cell cultures and patient populations, which assumes that the efficacy of drugs in a combination may be correlated. We apply temporal CDA (tCDA) to clinical trial data, and demonstrate the utility of this approach in identifying possible synergistic combinations and others that can be explained in terms of monotherapies. Using MCF7 cell line data, we assess combinations with dose CDA (dCDA), a model that generalizes other proposed models (e.g., Bliss response-additivity, the dose equivalence principle), and introduce Excess over CDA (EOCDA), a new metric for identifying possible synergistic combinations in cell culture.
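
The abstract does not define CDA or EOCDA, so they are not reproduced here. As a point of comparison, the classical Bliss independence baseline that the abstract names among the models generalized by dCDA can be sketched as below. Treating effects as fractional inhibition in [0, 1], and the function names, are assumptions made for illustration.

```python
def bliss_expected(effect_a, effect_b):
    """Expected combined effect if two drugs act independently (Bliss).

    Effects are fractional inhibitions in [0, 1].
    """
    return effect_a + effect_b - effect_a * effect_b


def excess_over_bliss(observed, effect_a, effect_b):
    """Observed combined effect minus the Bliss expectation.

    Positive values suggest possible synergy; negative values suggest
    possible antagonism.
    """
    return observed - bliss_expected(effect_a, effect_b)


# Hypothetical monotherapy and combination effects.
print(bliss_expected(0.5, 0.5))           # 0.75
print(excess_over_bliss(0.9, 0.5, 0.5))   # observed exceeds the baseline
```

An excess-over-baseline metric of this shape is what makes a null model useful: combinations are flagged only when they beat what the monotherapies already explain.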

6.
Cell Rep Med ; 5(1): 101350, 2024 01 16.
Article in English | MEDLINE | ID: mdl-38134931

ABSTRACT

Every year, 11% of infants are born preterm, with significant health consequences, and the vaginal microbiome is a risk factor for preterm birth. We crowdsource models to predict (1) preterm birth (PTB; <37 weeks) or (2) early preterm birth (ePTB; <32 weeks) from 9 vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from public raw data via phylogenetic harmonization. The predictive models are validated on two independent unpublished datasets representing 331 samples from 148 pregnant individuals. The top-performing models (among 148 and 121 submissions from 318 teams) achieve area under the receiver operating characteristic (AUROC) curve scores of 0.69 and 0.87 predicting PTB and ePTB, respectively. Alpha diversity, VALENCIA community state types, and composition are important features in the top-performing models, most of which are tree-based methods. This work is a model for translating microbiome data into clinically relevant predictive models and for better understanding preterm birth.
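
The AUROC scores quoted above can be read as the probability that a randomly chosen preterm case receives a higher predicted score than a randomly chosen term case. A minimal sketch of that pairwise definition follows; the labels and scores are hypothetical, and this is not the challenge's scoring code.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class (e.g. preterm), 0 otherwise
    scores: predicted scores, higher meaning more likely positive
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions for four samples.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The quadratic pairwise loop is chosen for clarity; rank-based formulations give the same value more efficiently on large datasets.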


Subjects
Crowdsourcing , Microbiota , Premature Birth , Pregnancy , Female , Infant, Newborn , Humans , Phylogeny , Vagina , Microbiota/genetics
7.
J Thorac Dis ; 15(5): 2438-2449, 2023 May 30.
Article in English | MEDLINE | ID: mdl-37324065

ABSTRACT

Background: Although optimal sequencing of systemic therapy in cancer care is critical to achieving maximal clinical benefit, there is a lack of analysis of treatment sequencing in advanced non-small cell lung cancer (aNSCLC) in real-world settings. Methods: A retrospective cohort study of 13,340 lung cancer patients within the Mount Sinai Health System (MSHS) was performed. Systemic therapy data of aNSCLC in 2,106 patients was the starting point in our analysis to investigate how treatment sequencing has evolved, the impact of sequencing patterns on clinical outcomes, and the effectiveness of 2nd line chemotherapy after patients progressed on immune checkpoint inhibitor (ICI)-based therapy as the 1st line of therapy (LOT). Results: There is a significant shift to more ICI-based therapy and multiple lines of targeted therapy after 2015. We compared clinical outcomes of two patient populations with different treatment sequencing patterns, with the 1st group receiving chemotherapy as the 1st LOT followed by ICI-based treatment, and the 2nd group treated in the opposite order, receiving a 1st line ICI-containing regimen followed by a 2nd line chemotherapy. No statistically significant difference in overall survival (OS) was observed between the two groups [group 2 vs. group 1, adjusted hazard ratio (aHR) =1.36, P=0.39]. We assessed the efficacy of the 2nd line chemotherapy in three patient populations given either 1st line ICI single agent, 1st line ICI-chemotherapy combination, or 1st line chemotherapy alone; there was no statistically significant difference in time-to-next treatment (TTNT) or in OS among the three patient groups. Conclusions: Analysis of real-world data has shown that two treatment sequencing patterns in aNSCLC, ICI followed by chemotherapy or chemotherapy followed by ICI, achieved similar clinical benefit.
The chemotherapies routinely used following a platinum doublet 1st LOT are effective as the 2nd line option after ICI-chemotherapy combination in the 1st line setting.

8.
medRxiv ; 2023 Apr 11.
Article in English | MEDLINE | ID: mdl-36945505

ABSTRACT

Globally, every year about 11% of infants are born preterm, defined as a birth prior to 37 weeks of gestation, with significant and lingering health consequences. Multiple studies have related the vaginal microbiome to preterm birth. We present a crowdsourcing approach to predict: (a) preterm or (b) early preterm birth from 9 publicly available vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from raw sequences via an open-source tool, MaLiAmPi. We validated the crowdsourced models on novel datasets representing 331 samples from 148 pregnant individuals. From 318 DREAM challenge participants we received 148 and 121 submissions for our two separate prediction sub-challenges, with top-ranking submissions achieving bootstrapped AUROC scores of 0.69 and 0.87, respectively. Alpha diversity, VALENCIA community state types, and composition (via phylotype relative abundance) were important features in the top-performing models, most of which were tree-based methods. This work serves as the foundation for subsequent efforts to translate predictive tests into clinical practice, and to better understand and prevent preterm birth.
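
Alpha diversity, named above as an important model feature, summarizes within-sample diversity from taxon abundances. The abstract does not say which index was used; the sketch below assumes the Shannon index as one common choice, with hypothetical phylotype counts.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts.

    counts: per-taxon read counts (or abundances) for one sample.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical samples: one evenly mixed, one dominated by a single phylotype.
print(round(shannon_diversity([10, 10, 10, 10]), 4))  # ln(4), about 1.3863
print(round(shannon_diversity([97, 1, 1, 1]), 4))     # much lower
```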

9.
JMIR AI ; 2: e44537, 2023 Jun 01.
Article in English | MEDLINE | ID: mdl-38875565

ABSTRACT

BACKGROUND: Ground-glass opacities (GGOs) appearing in computed tomography (CT) scans may indicate potential lung malignancy. Proper management of GGOs based on their features can prevent the development of lung cancer. Electronic health records are rich sources of information on GGO nodules and their granular features, but most of the valuable information is embedded in unstructured clinical notes. OBJECTIVE: We aimed to develop, test, and validate a deep learning-based natural language processing (NLP) tool that automatically extracts GGO features to inform the longitudinal trajectory of GGO status from large-scale radiology notes. METHODS: We developed a bidirectional long short-term memory with a conditional random field-based deep-learning NLP pipeline to extract GGO and granular features of GGO retrospectively from radiology notes of 13,216 lung cancer patients. We evaluated the pipeline with quality assessments and analyzed cohort characterization of the distribution of nodule features longitudinally to assess changes in size and solidity over time. RESULTS: Our NLP pipeline built on the GGO ontology we developed achieved between 95% and 100% precision, 89% and 100% recall, and 92% and 100% F1-scores on different GGO features. We deployed this GGO NLP model to extract and structure comprehensive characteristics of GGOs from 29,496 radiology notes of 4521 lung cancer patients. Longitudinal analysis revealed that size increased in 16.8% (240/1424) of patients, decreased in 14.6% (208/1424), and remained unchanged in 68.5% (976/1424) in their last note compared to the first note. Among 1127 patients who had longitudinal radiology notes of GGO status, 815 (72.3%) were reported to have stable status, and 259 (23%) had increased/progressed status in the subsequent notes. 
CONCLUSIONS: Our deep learning-based NLP pipeline can automatically extract granular GGO features at scale from electronic health records when this information is documented in radiology notes and help inform the natural history of GGO. This will open the way for a new paradigm in lung cancer prevention and early detection.

10.
JAMA Netw Open ; 5(8): e2227423, 2022 08 01.
Article in English | MEDLINE | ID: mdl-36036935

ABSTRACT

Importance: An automated, accurate method is needed for unbiased assessment and quantification of the accrual of joint space narrowing and erosions on radiographic images of the hands, wrists, and feet, for use in clinical trials, in monitoring joint damage over time, and in assisting rheumatologists with treatment decisions. Such a method has the potential to be directly integrated into electronic health records. Objectives: To design and implement an international crowdsourcing competition to catalyze the development of machine learning methods to quantify radiographic damage in rheumatoid arthritis (RA). Design, Setting, and Participants: This diagnostic/prognostic study describes the Rheumatoid Arthritis 2-Dialogue for Reverse Engineering Assessment and Methods (RA2-DREAM Challenge), which used existing radiographic images and expert-curated Sharp-van der Heijde (SvH) scores from 2 clinical studies (674 radiographic sets from 562 patients) for training (367 sets), leaderboard (119 sets), and final evaluation (188 sets). Challenge participants were tasked with developing methods to automatically quantify overall damage (subchallenge 1), joint space narrowing (subchallenge 2), and erosions (subchallenge 3). The challenge concluded on June 30, 2020. Main Outcomes and Measures: Scores derived from submitted algorithms were compared with the expert-curated SvH scores, and a baseline model was created for benchmark comparison. Performances were ranked using weighted root mean square error (RMSE). The performance and reproducibility of each algorithm were assessed using Bayes factor from bootstrapped data, and further evaluated with a postchallenge independent validation data set. Results: The RA2-DREAM Challenge received a total of 173 submissions from 26 participants or teams in 7 countries for the leaderboard round, and 13 submissions were included in the final evaluation.
The weighted RMSE metric showed that the winning algorithms produced scores that were very close to the expert-curated SvH scores. Top teams included Team Shirin for subchallenge 1 (weighted RMSE, 0.44), HYL-YFG (Hongyang Li and Yuanfang Guan) for subchallenge 2 (weighted RMSE, 0.38), and Gold Therapy for subchallenge 3 (weighted RMSE, 0.43). The bootstrapping/Bayes factor approach and the postchallenge independent validation confirmed the reproducibility; the estimation concordance indices between the final evaluation and the postchallenge independent validation data set were 0.71 for subchallenge 1, 0.78 for subchallenge 2, and 0.82 for subchallenge 3. Conclusions and Relevance: The RA2-DREAM Challenge resulted in the development of algorithms that provide feasible, quick, and accurate methods to quantify joint damage in RA. Ultimately, these methods could help research studies on RA joint damage and may be integrated into electronic health records to help clinicians serve patients better by providing timely, reliable, and quantitative information for making treatment decisions to prevent further damage.
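
The ranking metric named in this abstract, weighted RMSE, can be sketched generically as below. The abstract does not describe how the challenge chose its weights, so the per-sample weights and the example scores here are hypothetical.

```python
import math


def weighted_rmse(expert_scores, predicted_scores, weights):
    """Weighted root mean square error between expert and predicted scores.

    Each squared error is multiplied by its sample weight, and the sum is
    normalized by the total weight before taking the square root.
    """
    if not (len(expert_scores) == len(predicted_scores) == len(weights)):
        raise ValueError("inputs must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sq_error = sum(
        w * (t - p) ** 2
        for t, p, w in zip(expert_scores, predicted_scores, weights)
    )
    return math.sqrt(weighted_sq_error / total_weight)


# Hypothetical scores: the second sample is wrong by 2 points.
print(weighted_rmse([0, 2], [0, 0], [1, 1]))  # sqrt(2), about 1.414
print(weighted_rmse([0, 2], [0, 0], [3, 1]))  # 1.0, the error is down-weighted
```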


Subjects
Arthritis, Rheumatoid , Crowdsourcing , Arthritis, Rheumatoid/diagnostic imaging , Arthritis, Rheumatoid/drug therapy , Bayes Theorem , Humans , Machine Learning , Reproducibility of Results
11.
iScience ; 25(6): 104414, 2022 Jun 17.
Article in English | MEDLINE | ID: mdl-35663013

ABSTRACT

Circulating extracellular vesicles (EVs) contain molecular footprints (lipids, proteins, RNA, and DNA) from their cell of origin. Consequently, EV-associated RNA and proteins have gained widespread interest as liquid-biopsy biomarkers. Yet, an integrative proteo-transcriptomic landscape of EVs and its comparison with their cell of origin remain obscure. Here, we report that EVs enrich a distinct proteo-transcriptome that does not linearly correlate with their cell of origin. We show that EVs enrich endosomal and extracellular proteins and small RNA (∼13-200 nucleotides) associated with cell differentiation, development, and Wnt signaling. EVs carry specific RNAs (RNY3, vtRNA, and MIRLET-7) and their complementary proteins (YBX1, IGF2BP2, and SRSF1/2). To ensure an unbiased and independent analysis, we studied 12 cancer cell lines, matching EVs (in-house and exRNA database), and serum EVs of patients with prostate cancer. Together, we show that EV-RNA-protein complexes may constitute a functional interaction network to protect and regulate molecular access until a function is achieved.

12.
Proc Natl Acad Sci U S A ; 118(34)2021 08 24.
Article in English | MEDLINE | ID: mdl-34413191

ABSTRACT

Binary classification is one of the central problems in machine-learning research and, as such, investigations of its general statistical properties are of interest. We studied the ranking statistics of items in binary classification problems and observed that there is a formal and surprising relationship between the probability of a sample belonging to one of the two classes and the Fermi-Dirac distribution determining the probability that a fermion occupies a given single-particle quantum state in a physical system of noninteracting fermions. Using this equivalence, it is possible to compute a calibrated probabilistic output for binary classifiers. We show that the area under the receiver operating characteristic curve (AUC) in a classification problem is related to the temperature of an equivalent physical system. In a similar manner, the optimal decision threshold between the two classes is associated with the chemical potential of an equivalent physical system. Using our framework, we also derive a closed-form expression to calculate the variance for the AUC of a classifier. Finally, we introduce FiDEL (Fermi-Dirac-based ensemble learning), an ensemble learning algorithm that uses the calibrated nature of the classifier's output probability to combine possibly very different classifiers.
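
To make the analogy concrete, the Fermi-Dirac occupation function can be evaluated with a sample's rank playing the role of energy. The sketch below assumes one plausible parameterization (rank 1 is the most confidently positive item, `beta` is an inverse temperature, `mu` is a chemical potential expressed in rank units); the abstract does not give the paper's exact formula or how the parameters are fitted, so treat this as an illustration only.

```python
import math


def fermi_dirac_probability(rank, beta, mu):
    """Fermi-Dirac form 1 / (1 + exp(beta * (rank - mu))).

    rank: position of the sample in the classifier's ranking (assumed
          here to play the role of energy)
    beta: inverse temperature; larger values give a sharper transition
    mu:   chemical potential; the rank at which the probability is 0.5
    """
    return 1.0 / (1.0 + math.exp(beta * (rank - mu)))


# Hypothetical parameters for a ranking of 10 samples.
for rank in (1, 5, 9):
    print(rank, round(fermi_dirac_probability(rank, beta=0.8, mu=5), 3))
```

In this picture a sharper transition around `mu` corresponds to a colder system and a better-separating classifier, which is consistent with the abstract's statement that the AUC relates to temperature and the decision threshold to the chemical potential.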

13.
Cell Rep Med ; 2(6): 100323, 2021 06 15.
Article in English | MEDLINE | ID: mdl-34195686

ABSTRACT

Identification of pregnancies at risk of preterm birth (PTB), the leading cause of newborn deaths, remains challenging given the syndromic nature of the disease. We report a longitudinal multi-omics study coupled with a DREAM challenge to develop predictive models of PTB. The findings indicate that whole-blood gene expression predicts ultrasound-based gestational ages in normal and complicated pregnancies (r = 0.83) and, using data collected before 37 weeks of gestation, also predicts the delivery date in both normal pregnancies (r = 0.86) and those with spontaneous preterm birth (r = 0.75). Based on samples collected before 33 weeks in asymptomatic women, our analysis suggests that expression changes preceding preterm prelabor rupture of the membranes are consistent across time points and cohorts and involve leukocyte-mediated immunity. Models built from plasma proteomic data predict spontaneous preterm delivery with intact membranes with higher accuracy and earlier in pregnancy than transcriptomic models (AUROC = 0.76 versus AUROC = 0.6 at 27-33 weeks of gestation).


Subjects
Blood Proteins/genetics , Cell-Free Nucleic Acids/genetics , Gestational Age , Pre-Eclampsia/genetics , Premature Birth/genetics , Transcriptome , Adult , Asymptomatic Diseases , Biomarkers/blood , Blood Proteins/classification , Blood Proteins/metabolism , Cell-Free Nucleic Acids/blood , Cell-Free Nucleic Acids/classification , Crowdsourcing/methods , Female , Humans , Infant, Newborn , Longitudinal Studies , Pre-Eclampsia/blood , Pre-Eclampsia/diagnosis , Pregnancy , Premature Birth/blood , Premature Birth/diagnosis , Proteomics/methods , ROC Curve
14.
Gut ; 2021 Jul 28.
Article in English | MEDLINE | ID: mdl-34321221

ABSTRACT

OBJECTIVE: Surveillance tools for early detection of cancers, including hepatocellular carcinoma (HCC), are suboptimal, and biomarkers are urgently needed. Extracellular vesicles (EVs) have gained increasing scientific interest due to their involvement in tumour initiation and metastasis; however, most extracellular RNA (exRNA) blood-based biomarker studies are limited to annotated genomic regions. DESIGN: EVs were isolated with differential ultracentrifugation and integrated nanoscale deterministic lateral displacement arrays (nanoDLD) and quality-assessed by electron microscopy, immunoblotting, nanoparticle tracking and deconvolution analysis. Genome-wide sequencing of the largely unexplored small exRNA landscape, including unannotated transcripts, identified and reproducibly quantified small RNA clusters (smRCs). Their key genomic features were delineated across biospecimens and EV isolation techniques in prostate cancer and HCC. Three independent exRNA cancer datasets with a total of 479 samples from 375 patients, including longitudinal samples, were used for this study. RESULTS: ExRNA smRCs were dominated by uncharacterised, unannotated small RNA with a consensus sequence of 20 nt. An unannotated 3-smRC signature was significantly overexpressed in plasma exRNA of patients with HCC (p<0.01, n=157). An independent validation in a phase 2 biomarker case-control study revealed 86% sensitivity and 91% specificity for the detection of early HCC from controls at risk (n=209) (area under the receiver operating characteristic curve (AUC): 0.87). The 3-smRC signature was independent of alpha-fetoprotein (p<0.0001) and a composite model yielded an increased AUC of 0.93. CONCLUSION: These findings directly lead to the prospect of a minimally invasive, blood-only, operator-independent clinical tool for HCC surveillance, thus highlighting the potential of unannotated smRCs for biomarker research in cancer.

15.
Cell Syst ; 12(8): 827-838.e5, 2021 08 18.
Article in English | MEDLINE | ID: mdl-34146471

ABSTRACT

The accurate identification and quantitation of RNA isoforms present in the cancer transcriptome is key for analyses ranging from the inference of the impacts of somatic variants to pathway analysis to biomarker development and subtype discovery. The ICGC-TCGA DREAM Somatic Mutation Calling in RNA (SMC-RNA) challenge was a crowd-sourced effort to benchmark methods for RNA isoform quantification and fusion detection from bulk cancer RNA sequencing (RNA-seq) data. It concluded in 2018 with a comparison of 77 fusion detection entries and 65 isoform quantification entries on 51 synthetic tumors and 32 cell lines with spiked-in fusion constructs. We report the entries used to build this benchmark, the leaderboard results, and the experimental features associated with the accurate prediction of RNA species. This challenge required submissions to be in the form of containerized workflows, meaning each of the entries described is easily reusable through CWL and Docker containers at https://github.com/SMC-RNA-challenge. A record of this paper's transparent peer review process is included in the supplemental information.


Subjects
Neoplasms , Humans , Neoplasms/genetics , Protein Isoforms/genetics , RNA/genetics , RNA-Seq , Sequence Analysis, RNA
16.
Nat Commun ; 12(1): 3307, 2021 06 03.
Article in English | MEDLINE | ID: mdl-34083538

ABSTRACT

Despite decades of intensive search for compounds that modulate the activity of particular protein targets, a large proportion of the human kinome remains as yet undrugged. Effective approaches are therefore required to map the massive space of unexplored compound-kinase interactions for novel and potent activities. Here, we carry out a crowdsourced benchmarking of predictive algorithms for kinase inhibitor potencies across multiple kinase families tested on unpublished bioactivity data. We find the top-performing predictions are based on various models, including kernel learning, gradient boosting and deep learning, and their ensemble leads to a predictive accuracy exceeding that of single-dose kinase activity assays. We design experiments based on the model predictions and identify unexpected activities even for under-studied kinases, thereby accelerating experimental mapping efforts. The open-source prediction algorithms together with the bioactivities between 95 compounds and 295 kinases provide a resource for benchmarking prediction algorithms and for extending the druggable kinome.


Subjects
Protein Kinase Inhibitors/pharmacology , Protein Kinases/metabolism , Algorithms , Benchmarking , Crowdsourcing , Databases, Pharmaceutical , Deep Learning , Drug Discovery , Drug Evaluation, Preclinical , Humans , Kinetics , Machine Learning , Models, Biological , Models, Chemical , Protein Kinase Inhibitors/chemistry , Protein Kinase Inhibitors/pharmacokinetics , Protein Kinases/chemistry , Proteomics , Regression Analysis
17.
EBioMedicine ; 66: 103275, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33745882

ABSTRACT

BACKGROUND: Assistive automatic seizure detection can empower human annotators to shorten patient monitoring data review times. We present a proof-of-concept for a seizure detection system that is sensitive, automated, patient-specific, and tunable to maximize sensitivity while minimizing human annotation times. The system uses custom data preparation methods, deep learning analytics and electroencephalography (EEG) data. METHODS: Scalp EEG data of 365 patients containing 171,745 s ictal and 2,185,864 s interictal samples obtained from clinical monitoring systems were analysed as part of a crowdsourced artificial intelligence (AI) challenge. Participants were tasked to develop an ictal/interictal classifier with high sensitivity and low false alarm rates. We built a challenge platform that prevented participants from downloading or directly accessing the data while allowing crowdsourced model development. FINDINGS: The automatic detection system achieved tunable sensitivities between 75.00% and 91.60%, allowing a reduction in the amount of raw EEG data to be reviewed by a human annotator by factors between 142x and 22x, respectively. The algorithm enables instantaneous reviewer-managed optimization of the balance between sensitivity and the amount of raw EEG data to be reviewed. INTERPRETATION: This study demonstrates the utility of deep learning for patient-specific seizure detection in EEG data. Furthermore, deep learning in combination with a human reviewer can provide the basis for an assistive data labelling system lowering the time of manual review while maintaining human expert annotation performance. FUNDING: IBM employed all IBM Research authors. Temple University employed all Temple University authors. The Icahn School of Medicine at Mount Sinai employed Eren Ahsen.
The corresponding authors Stefan Harrer and Gustavo Stolovitzky declare that they had full access to all the data in the study and that they had final responsibility for the decision to submit for publication.


Subjects
Artificial Intelligence , Brain/physiopathology , Electroencephalography , Neurologists , Seizures/diagnosis , Algorithms , Data Analysis , Deep Learning , Electroencephalography/methods , Electroencephalography/standards , Epilepsy/diagnosis , Humans , Reproducibility of Results
18.
Front Genet ; 12: 778416, 2021.
Article in English | MEDLINE | ID: mdl-35047007

ABSTRACT

We now know that RNA can survive the harsh environment of biofluids when encapsulated in vesicles or when associated with lipoproteins or RNA-binding proteins. These extracellular RNAs (exRNAs) play a role in intercellular signaling, serve as biomarkers of disease, and form the basis of new strategies for disease treatment. The Extracellular RNA Communication Consortium (ERCC) hosted a two-day online workshop (April 19-20, 2021) on the unique challenges of exRNA data analysis. The goal was to foster an open dialog about best practices and discuss open problems in the field, focusing initially on small exRNA sequencing data. Video recordings of workshop presentations and discussions are available (https://exRNA.org/exRNAdata2021-videos/). There were three target audiences: experimentalists who generate exRNA sequencing data, computational and data scientists who work with those groups to analyze their data, and experimental and data scientists new to the field. Here we summarize issues explored during the workshop, including progress on an effort to develop an exRNA data analysis challenge to engage the community in solving some of these open problems.

19.
Bioinformatics ; 37(14): 2070-2072, 2021 08 04.
Article in English | MEDLINE | ID: mdl-33241320

ABSTRACT

SUMMARY: The advent of high-throughput technologies has provided researchers with measurements of thousands of molecular entities and enabled the investigation of the internal regulatory apparatus of the cell. However, network inference from high-throughput data is far from being a solved problem. While a plethora of different inference methods have been proposed, they often lead to non-overlapping predictions, and many of them lack user-friendly implementations to enable their broad utilization. Here, we present Consensus Interaction Network Inference Service (COSIFER), a package and a companion web-based platform to infer molecular networks from expression data using state-of-the-art consensus approaches. COSIFER includes a selection of state-of-the-art methodologies for network inference and different consensus strategies to integrate the predictions of individual methods and generate robust networks. AVAILABILITY AND IMPLEMENTATION: COSIFER Python source code is available at https://github.com/PhosphorylatedRabbits/cosifer. The web service is accessible at https://ibm.biz/cosifer-aas. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
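
To make the idea of a consensus strategy concrete, the sketch below combines edge scores from several inference methods by averaging each edge's rank. It is an illustration only and does not use or reproduce the COSIFER API: the function names `rank_edges` and `consensus_by_mean_rank`, the three toy methods, and the choice of mean-rank aggregation are all assumptions made for this example.

```python
def rank_edges(scores):
    """Map each edge to its rank within one method (1 = strongest score)."""
    ordered = sorted(scores, key=lambda edge: scores[edge], reverse=True)
    return {edge: position + 1 for position, edge in enumerate(ordered)}


def consensus_by_mean_rank(method_scores):
    """Combine several methods' edge scores into one ranking by mean rank.

    method_scores: list of dicts mapping an edge (pair of node names) to a
    score, one dict per inference method. All methods must score the same
    set of edges. Returns (edge, mean_rank) pairs, best edge first.
    """
    if not method_scores:
        raise ValueError("at least one method is required")
    edges = set(method_scores[0])
    if any(set(scores) != edges for scores in method_scores):
        raise ValueError("all methods must score the same edges")
    ranks = [rank_edges(scores) for scores in method_scores]
    mean_rank = {e: sum(r[e] for r in ranks) / len(ranks) for e in edges}
    # Sort by mean rank, breaking ties by edge name for a stable result.
    return sorted(mean_rank.items(), key=lambda item: (item[1], item[0]))


# Toy scores from three hypothetical inference methods on three edges.
method_1 = {("A", "B"): 0.9, ("A", "C"): 0.2, ("B", "C"): 0.5}
method_2 = {("A", "B"): 0.7, ("A", "C"): 0.6, ("B", "C"): 0.1}
method_3 = {("A", "B"): 0.3, ("A", "C"): 0.1, ("B", "C"): 0.8}
consensus = consensus_by_mean_rank([method_1, method_2, method_3])
for edge, rank in consensus:
    print(edge, round(rank, 3))
```

Rank-based aggregation is used here because raw scores from different methods are generally not on a comparable scale; other consensus strategies are possible.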


Subjects
Software , Consensus
20.
Elife ; 9, 2020 09 18.
Article in English | MEDLINE | ID: mdl-32945258

ABSTRACT

Our ability to discover effective drug combinations is limited in part by insufficient understanding of how the transcriptional responses to two monotherapies give rise to the response to their combination. We analyzed matched time course RNAseq profiling of cells treated with single drugs and their combinations and found that the transcriptional signature of the synergistic combination was unique relative to that of either constituent monotherapy. The sequential activation of transcription factors over time in the gene regulatory network was implicated. The nature of this transcriptional cascade suggests that drug synergy may ensue when the transcriptional responses elicited by two unrelated individual drugs are correlated. We used these results as the basis of a simple prediction algorithm attaining an AUROC of 0.77 in the prediction of synergistic drug combinations in an independent dataset.
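
The abstract does not specify the prediction algorithm, so the following is only a hedged sketch of the underlying idea that correlated monotherapy responses may point to synergy. It ranks drug pairs by the Pearson correlation of their expression signatures. The function names, the drug names, and the toy signatures (imagined as per-gene log fold changes) are hypothetical and are not taken from the study.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rank_pairs_by_correlation(signatures):
    """Score every drug pair by the correlation of its two signatures.

    signatures: dict mapping a drug name to its per-gene response values,
    with genes in the same order for every drug. Returns a list of
    ((drug_1, drug_2), correlation), most correlated pair first.
    """
    scored = [
        ((d1, d2), pearson(signatures[d1], signatures[d2]))
        for d1, d2 in combinations(sorted(signatures), 2)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy signatures over four genes for three hypothetical drugs.
toy_signatures = {
    "drug_A": [1.0, 2.0, 3.0, 4.0],
    "drug_B": [2.0, 4.0, 6.0, 8.0],
    "drug_C": [4.0, 3.0, 2.0, 1.0],
}
ranked_pairs = rank_pairs_by_correlation(toy_signatures)
for pair, score in ranked_pairs:
    print(pair, round(score, 3))
```

In this toy case the pair with the most similar responses is ranked first; whether such a score predicts synergy in practice is an empirical question that the study addresses with its own algorithm and an independent dataset.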


Subjects
Drug Combinations , Drug Synergism , Gene Expression , Gene Regulatory Networks/physiology , Transcriptome , Algorithms , Computational Biology , Humans , MCF-7 Cells , RNA-Seq , Transcription Factors/metabolism
...