ABSTRACT
BACKGROUND: The rapid advancement of new genomic sequencing technology has enabled the development of multi-omic single-cell sequencing assays. These assays profile multiple modalities in the same cell and can often yield new insights not revealed with a single modality. For example, Cellular Indexing of Transcriptomes and Epitopes by Sequencing (CITE-Seq) simultaneously profiles the RNA transcriptome and the surface protein expression. The surface protein markers in CITE-Seq can be used to identify cell populations through a process similar to the iterative filtration in flow cytometry, also called "gating", which is an essential step for downstream analyses and data interpretation. While several packages allow users to interactively gate cells, they often do not process multi-omic sequencing datasets and may require writing redundant code to specify gate boundaries. To streamline the gating process, we developed CITEViz, which allows users to interactively gate cells in Seurat-processed CITE-Seq data. CITEViz can also visualize basic quality control (QC) metrics, allowing for a rapid and holistic evaluation of CITE-Seq data. RESULTS: We applied CITEViz to a peripheral blood mononuclear cell CITE-Seq dataset and gated for several major blood cell populations (CD14 monocytes, CD4 T cells, CD8 T cells, NK cells, B cells, and platelets) using canonical surface protein markers. The visualization features of CITEViz were used to investigate cellular heterogeneity in CD14 and CD16-expressing monocytes and to detect differences in the number of detected antibodies per patient donor. These results highlight the utility of CITEViz to enable the robust classification of single cell populations. CONCLUSIONS: CITEViz is an R-Shiny app that standardizes the gating workflow in CITE-Seq data for efficient classification of cell populations. Its secondary function is to generate basic feature plots and QC figures specific to multi-omic data. The user interface and internal workflow of CITEViz uniquely work together to produce an organized workflow and sensible data structures for easy data retrieval. This package leverages the strengths of biologists and computational scientists to assess and analyze multi-omic single-cell datasets. In conclusion, CITEViz streamlines the flow cytometry gating workflow in CITE-Seq data to help facilitate novel hypothesis generation.
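As a rough illustration of what a rectangular gate does, the sketch below keeps the cells whose two surface-protein values fall inside user-chosen bounds, then gates again on the survivors. It is plain Python under stated assumptions and is not CITEViz code: the function name, marker values, and thresholds are all hypothetical.

```python
from typing import Dict, List, Tuple

Cell = Dict[str, float]  # marker name -> normalized protein expression


def rectangular_gate(
    cells: Dict[str, Cell],
    x_marker: str,
    y_marker: str,
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
) -> List[str]:
    """Return barcodes of cells inside the rectangle on (x_marker, y_marker)."""
    x_lo, x_hi = x_range
    y_lo, y_hi = y_range
    return [
        barcode
        for barcode, markers in cells.items()
        if x_lo <= markers[x_marker] <= x_hi and y_lo <= markers[y_marker] <= y_hi
    ]


# Hypothetical cells and thresholds, for illustration only.
cells = {
    "cell_1": {"CD3": 2.1, "CD4": 1.8, "CD8": 0.1},
    "cell_2": {"CD3": 2.3, "CD4": 0.2, "CD8": 1.9},
    "cell_3": {"CD3": 0.1, "CD4": 0.3, "CD8": 0.2},
}

# First gate: CD3-positive cells (any CD4 value inside the plotted range).
t_cells = rectangular_gate(cells, "CD3", "CD4", (1.0, 5.0), (-1.0, 5.0))

# Gates are iterative: the second gate only sees cells kept by the first.
cd4_t_cells = rectangular_gate(
    {b: cells[b] for b in t_cells}, "CD4", "CD8", (1.0, 5.0), (-1.0, 1.0)
)
```

Storing each gate as a (markers, bounds, input population) record is what makes a gating hierarchy reproducible without rewriting boundary code by hand.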
Subjects
Mononuclear Leukocytes, Software, Humans, RNA Sequence Analysis/methods, Workflow, Flow Cytometry, Membrane Proteins, Single-Cell Analysis/methods, Gene Expression Profiling/methods
ABSTRACT
In plants, C-to-U RNA editing mainly occurs in plastid and mitochondrial transcripts and contributes to a complex transcriptional regulatory network. Growing evidence indicates that RNA editing plays critical roles in plant growth and development. However, accurate detection of RNA editing sites using transcriptome sequencing data alone is still challenging. In the present study, we developed PlantC2U, a convolutional neural network that predicts plastid C-to-U RNA editing based on the genomic sequence. PlantC2U achieves >95% sensitivity and 99% specificity, outperforming the PREPACT tool, random forests, and support vector machines. PlantC2U not only further checks RNA editing sites from transcriptome data to reduce possible false positives, but also assesses the effect of different mutations on C-to-U RNA editing based on the flanking sequences. Moreover, we found patterns of tissue-specific RNA editing in the mangrove plant Kandelia obovata and observed reduced C-to-U RNA editing rates in the cold stress response of K. obovata, suggesting potential regulatory roles in plant stress adaptation. In addition, we present RNAeditDB, available online at https://jasonxu.shinyapps.io/RNAeditDB/. Together, PlantC2U and RNAeditDB will help researchers explore RNA editing events in plants and thus will be of broad utility for the plant research community.
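A sequence-based convolutional network needs the flanking sequence around each candidate cytidine as a numeric input. The sketch below shows one common way to do this, one-hot encoding a fixed window. The window size, padding rule, and function names are assumptions for illustration; the abstract does not describe how PlantC2U encodes its input.

```python
from typing import List

BASES = "ACGT"


def flanking_window(sequence: str, position: int, flank: int) -> str:
    """Return `flank` bases on each side of `position` (0-based), inclusive.

    Positions that fall outside the sequence are padded with 'N'.
    """
    window = []
    for i in range(position - flank, position + flank + 1):
        window.append(sequence[i] if 0 <= i < len(sequence) else "N")
    return "".join(window)


def one_hot(window: str) -> List[List[int]]:
    """Encode each base as a 4-element indicator vector; 'N' becomes all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in window.upper()]


seq = "ATGCCGTA"  # hypothetical genomic sequence
site = 3          # 0-based index of a candidate cytidine
window = flanking_window(seq, site, flank=2)
encoded = one_hot(window)  # 5 positions x 4 channels
```

Substituting a base in `window` and re-encoding it is the kind of input change that would let a trained model score the effect of a flanking mutation.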
Subjects
Deep Learning, RNA Editing, RNA Editing/genetics, Plants/metabolism, Plastids/genetics, Plastids/metabolism, Transcriptome, Plant RNA/genetics, Plant RNA/metabolism
ABSTRACT
Sequential, multiple assignment, randomized trial (SMART) designs are appropriate for comparing adaptive treatment interventions, in which intermediate outcomes (called tailoring variables) guide subsequent treatment decisions for individual patients. Within a SMART design, patients may be re-randomized to subsequent treatments following the outcomes of their intermediate assessments. In this paper, we provide an overview of the statistical considerations necessary to design and implement a two-stage SMART design with a binary tailoring variable and a survival final endpoint. A chronic lymphocytic leukemia trial with a final endpoint of progression-free survival is used as an example in simulations that assess how design parameters, including the choice of randomization ratios at each stage and the response rate of the tailoring variable, affect statistical power. We assess how the choice of weights arising from restricted re-randomization affects data analyses, and we discuss appropriate hazard rate assumptions. Specifically, for a given first-stage therapy and prior to the tailoring variable assessment, we assume equal hazard rates for all patients randomized to a treatment arm. After the tailoring variable assessment, individual hazard rates are assumed for each intervention path. Simulation studies demonstrate that the response rate of the binary tailoring variable affects power because it directly determines the distribution of patients. We also confirm that when the first-stage randomization is 1:1, it is not necessary to consider the first-stage randomization ratio when applying the weights. We provide an R-Shiny application for obtaining power for a given sample size for SMART designs.
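The remark about 1:1 first-stage randomization can be checked with a small sketch. It assumes standard inverse-probability weights and a design in which only non-responders are re-randomized; the abstract does not spell out the weighting scheme, so the function below is illustrative rather than the authors' implementation.

```python
from typing import List


def ipw_weight(p_stage1: float, responder: bool, p_stage2: float) -> float:
    """Inverse-probability weight for a patient consistent with a regime.

    Responders are not re-randomized, so only the first-stage probability
    enters; non-responders are also divided by their second-stage probability.
    """
    return 1.0 / p_stage1 if responder else 1.0 / (p_stage1 * p_stage2)


def normalized(weights: List[float]) -> List[float]:
    """Rescale weights to sum to one, as in a weighted average."""
    total = sum(weights)
    return [w / total for w in weights]


responders = [True, False, False, True, False]  # hypothetical tailoring outcomes

# 1:1 first stage (p = 0.5) and 1:1 second stage for non-responders.
with_stage1 = [ipw_weight(0.5, r, 0.5) for r in responders]
# Same weights with the first-stage factor dropped (set to 1).
without_stage1 = [ipw_weight(1.0, r, 0.5) for r in responders]
```

Because every patient shares the same first-stage probability under 1:1 allocation, that factor is a constant and cancels once the weights are normalized, so `normalized(with_stage1)` and `normalized(without_stage1)` agree.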
ABSTRACT
The R package rbioacc supports the analysis of experimental data from bioaccumulation tests, in which organisms are exposed to a chemical (exposure phase) and then transferred to clean media (depuration phase). Internal concentrations are measured over time during the experiment. rbioacc provides turnkey functions to visualise and analyse such data. Under a Bayesian framework, rbioacc fits a generic one-compartment toxicokinetic (TK) model built from the data. It provides TK parameter estimates (uptake and elimination rates) and standard bioaccumulation metrics. All parameter estimates, bioaccumulation metrics and predictions of internal concentrations are delivered with their uncertainty. Bioaccumulation metrics are provided in support of environmental risk assessment, in full compliance with the regulatory requirements for approving market release of chemical substances. This paper provides worked examples of the use of rbioacc on data collected through standard bioaccumulation tests and publicly available in the scientific literature. These examples constitute step-by-step user guides for analysing any new data set uploaded in the right format.
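For orientation, the simplest form of a one-compartment TK model can be written in closed form. The sketch below assumes a single exposure route at constant concentration, a single first-order elimination process, and zero internal concentration at the start; the generic model fitted by rbioacc may include further routes and processes, and the function and parameter names here are illustrative only.

```python
import math


def internal_concentration(
    t: float, ku: float, ke: float, c_exposure: float, t_depuration: float
) -> float:
    """Internal concentration at time t for a one-compartment model.

    Exposure phase (t <= t_depuration):  dC/dt = ku * c_exposure - ke * C
    Depuration phase (t > t_depuration): dC/dt = -ke * C
    """
    if t <= t_depuration:
        return (ku / ke) * c_exposure * (1.0 - math.exp(-ke * t))
    # Concentration reached at the end of exposure, then first-order decay.
    c_end = (ku / ke) * c_exposure * (1.0 - math.exp(-ke * t_depuration))
    return c_end * math.exp(-ke * (t - t_depuration))


def kinetic_bcf(ku: float, ke: float) -> float:
    """Kinetic bioconcentration factor: uptake rate over elimination rate."""
    return ku / ke


# Hypothetical parameter values, for illustration only.
c_day_2 = internal_concentration(2.0, ku=10.0, ke=0.5, c_exposure=1.0, t_depuration=4.0)
c_day_6 = internal_concentration(6.0, ku=10.0, ke=0.5, c_exposure=1.0, t_depuration=4.0)
```

In a Bayesian fit, `ku` and `ke` would be sampled from their posterior, and each draw pushed through these functions to obtain credible intervals for both the curve and the bioaccumulation metric.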
Subjects
Chemical Water Pollutants, Bayes Theorem, Bioaccumulation, Toxicokinetics
ABSTRACT
BACKGROUND: Visual exploration of gene product behavior across multiple omic datasets can pinpoint technical limitations in data and reveal biological trends. Still, such exploration is challenging as there is a need for visualizations that are tailored for the purpose. RESULTS: The OmicLoupe software was developed to facilitate visual data exploration and provides more than 15 interactive cross-dataset visualizations for omics data. It expands visualizations to multiple datasets for quality control, statistical comparisons and overlap and correlation analyses, while allowing for rapid inspection and downloading of selected features. The usage of OmicLoupe is demonstrated in three different studies, where it allowed for detection of both technical data limitations and biological trends across different omic layers. An example is an analysis of SARS-CoV-2 infection based on two previously published studies, where OmicLoupe facilitated the identification of gene products with consistent expression changes across datasets at both the transcript and protein levels. CONCLUSIONS: OmicLoupe provides fast exploration of omics data with tailored visualizations for comparisons within and across data layers. The interactive visualizations are highly informative and are expected to be useful in various analyses of both newly generated and previously published data. OmicLoupe is available at quantitativeproteomics.org/omicloupe.
Subjects
Computational Biology/instrumentation, Knowledge Discovery, Software, COVID-19/genetics, Statistical Data Interpretation, Humans, Proteome, Transcriptome
ABSTRACT
BACKGROUND: The investigation of molecular alterations associated with the conservation and variation of DNA methylation in eukaryotes is gaining interest in the biomedical research community. Among the different determinants of methylation stability, the DNA composition of the regions surrounding CpG sites has been shown to have a crucial role in the maintenance and establishment of methylation statuses. This aspect has previously been characterized quantitatively by inspecting the nucleotide composition of the region. Research in this field still lacks a qualitative perspective, linked to the identification of specific sequences (or DNA motifs) related to particular DNA methylation phenomena. RESULTS: Here we present a novel computational strategy based on short DNA motif discovery to characterize sequence patterns related to aberrant CpG methylation events. We provide our framework as a user-friendly, Shiny-based application, CpGmotifs, to easily retrieve and characterize DNA patterns related to CpG methylation in the human genome. Our tool supports the functional interpretation of deregulated methylation events by predicting transcription factor binding sites (TFBS) encompassing the identified motifs. CONCLUSIONS: CpGmotifs is open-source software. Its source code is available on GitHub at https://github.com/Greco-Lab/CpGmotifs and a ready-to-use Docker image is provided on DockerHub at https://hub.docker.com/r/grecolab/cpgmotifs .
Subjects
DNA Methylation, Human Genome, CpG Islands, Humans, Nucleotide Motifs, Software
ABSTRACT
BACKGROUND: Prostate cancer (PCa) represents a significant healthcare problem. The critical clinical question is the need for a biopsy. Accurate risk assessment of patients before a biopsy allows for individualised risk stratification, thus improving clinical decision making. This study aims to build a risk calculator to inform the need for a prostate biopsy. METHODS: Using the clinical information of 4801 patients, an Irish Prostate Cancer Risk Calculator (IPRC) for diagnosis of PCa and high-grade PCa (Gleason ≥7) was created using a binary regression model including age, digital rectal examination, family history of PCa, negative prior biopsy and prostate-specific antigen (PSA) level as risk factors. The discrimination ability of the risk calculator was internally validated using cross-validation to reduce overfitting, and its performance was compared with PSA and with the American risk calculator (PCPT), the Prostate Biopsy Collaborative Group (PBCG) calculator and the European risk calculator (ERSPC) using various performance outcome summaries. In a subgroup of 2970 patients, prostate volume was included. Separate risk calculators including prostate volume (IPRCv) for the diagnosis of PCa (and high-grade PCa) were created. RESULTS: The IPRC area under the curve (AUC) for the prediction of PCa and high-grade PCa was 0.6741 (95% CI, 0.6591 to 0.6890) and 0.7214 (95% CI, 0.7018 to 0.7409), respectively. For cancer detection, this significantly outperforms PSA (0.5948) and the PCPT (0.6304), PBCG (0.6528) and ERSPC (0.6502) risk calculators; for detecting high-grade cancer, it significantly outperforms PSA (0.6623) and PCPT (0.6804), but there was no significant improvement over PBCG (0.7185) and ERSPC (0.7140). The inclusion of prostate volume in the risk calculator significantly improved the AUC for cancer detection (AUC = 0.7298; 95% CI, 0.7119 to 0.7478), but not for high-grade cancer (AUC = 0.7256; 95% CI, 0.7017 to 0.7495). The risk calculator also demonstrated an increased net benefit on decision curve analysis. CONCLUSION: The risk calculator developed has advantages over prior risk stratification of prostate cancer patients before biopsy. It will reduce the number of men requiring a biopsy and their exposure to its side effects. The interactive tools developed help translate the risk calculator into practice and allow for clarity in the clinical recommendations.
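The AUC values reported above measure discrimination: the probability that a randomly chosen patient with cancer receives a higher predicted risk than a randomly chosen patient without. The sketch below computes that quantity directly from pairwise comparisons. It is a generic illustration with made-up scores, not the study's validation code.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as 0.5).

    `labels` uses 1 for cases (e.g. biopsy-confirmed cancer) and 0 otherwise.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical predicted risks and outcomes, for illustration only.
example_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the calculators above are compared.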
Subjects
Prostatic Neoplasms, Aged, Biopsy, Cohort Studies, Humans, Male, Middle Aged, Prostate-Specific Antigen, Risk Assessment
ABSTRACT
BACKGROUND: Functional annotation of genes is an essential step in omics data analysis. Multiple databases and methods are currently available to summarize the functions of sets of genes into higher-level representations, such as ontologies and molecular pathways. Annotating results from omics experiments into functional categories is essential not only to understand the underlying regulatory dynamics but also to compare multiple experimental conditions at a higher level of abstraction. Several tools are already available to the community to represent and compare functional profiles of omics experiments. However, when the number of experiments and/or enriched functional terms is high, it becomes difficult to interpret the results even when they are graphically represented. Therefore, there is currently a need for interactive and user-friendly tools to graphically navigate and further summarize annotations, in order to facilitate the interpretation of results even when the dimensionality is high. RESULTS: We developed an approach that exploits the intrinsic hierarchical structure of several functional annotations to summarize the results obtained through enrichment analyses to higher levels of interpretation and to map gene-related information at each summarized level. We built a user-friendly graphical interface that allows users to visualize the functional annotations of one or multiple experiments at once. The tool is implemented as an R-Shiny application called FunMappOne and is available at https://github.com/grecolab/FunMappOne . CONCLUSION: FunMappOne is an R-Shiny graphical tool that takes as input multiple lists of human or mouse genes, optionally along with their related modification magnitudes, computes the enriched annotations from the Gene Ontology, Kyoto Encyclopedia of Genes and Genomes, or Reactome databases, and reports interactive maps of functional terms and pathways organized in rational groups. FunMappOne allows a fast and convenient comparison of multiple experiments and an easy way to interpret results.
Subjects
Computational Biology/methods, Computer Graphics, Factual Databases, Gene Ontology, Genes, Molecular Sequence Annotation, Software, Animals, Humans, Mice
ABSTRACT
BACKGROUND: Exploration of large data sets, such as shotgun metagenomic sequence or expression data, by biomedical experts and medical professionals remains a major bottleneck in the scientific discovery process. Although tools for this purpose exist for 16S ribosomal RNA sequencing analysis, there is a growing but still insufficient number of user-friendly interactive visualization workflows for easy data exploration and figure generation. The development of such platforms is necessary to accelerate and streamline microbiome laboratory research. RESULTS: We developed the Workflow Hub for Automated Metagenomic Exploration (WHAM!) as a web-based interactive tool capable of user-directed data visualization and statistical analysis of annotated shotgun metagenomic and metatranscriptomic data sets. WHAM! includes exploratory and hypothesis-based gene and taxa search modules for visualizing differences in microbial taxa and gene family expression across experimental groups, and for creating publication-quality figures without the need for a command-line interface or in-house bioinformatics. CONCLUSIONS: WHAM! is an interactive and customizable tool for downstream metagenomic and metatranscriptomic analysis. Its user-friendly interface allows easy data exploration by microbiome and ecological experts to facilitate discovery in multi-dimensional and large-scale data sets.
Subjects
Computational Biology/methods, Metagenomics/methods, Microbiota/genetics, High-Throughput Nucleotide Sequencing/methods
ABSTRACT
To streamline the analysis and visualization of bacterial growth and gene expression data obtained by microtitre plate readers, we developed BactEXTRACT, an intuitive, easy-to-use R Shiny application. BactEXTRACT simplifies the transition from raw optical density, fluorescence and luminescence measurements to publication-ready plots. This package offers a user-friendly interface that reduces the complexity involved in growth curve and gene expression analysis and is generally applicable. BactEXTRACT is available at https://veeninglab.com/bactextract.
ABSTRACT
Introduction: There is an urgent need to address pervasive inequities in health and healthcare in the USA. Many areas of health inequity are well known, but there remain important unexplored areas, and for many populations in the USA, accessing data to visualize and monitor health equity is difficult. Methods: We describe the development and evaluation of an open-source R-Shiny application, the "Health Equity Explorer (H2E)," designed to enable users to explore health equity data in a way that can be easily shared within and across common data models (CDMs). Results: We have developed a novel, scalable informatics tool to explore a wide variety of drivers of health, including patient-reported Social Determinants of Health (SDoH), using data in an OMOP CDM research data repository in a way that can be easily shared. We describe our development process, data schema, potential use cases, and pilot data for 705,686 people who attended our health system at least once since 2016. For this group, 996,382 unique observations for questions related to food and housing security were available for 324,630 patients (46% of all patients had at least one answer), with 65,152 (20.1% of patients with at least one visit and answer) reporting food or housing insecurity at least once. Conclusions: H2E can be used to support dynamic and interactive explorations that include rich social and environmental data. The tool can support multiple CDMs and has the potential to support distributed health equity research and intervention on a national scale.
ABSTRACT
We previously developed shinyCircos, an interactive web application for creating Circos diagrams, which has been widely recognized for its graphical user interface and ease of use. Here, we introduce shinyCircos-V2.0, an upgraded version of shinyCircos that includes a new user interface with enhanced usability and many new features for creating advanced Circos plots. To help users get started with shinyCircos-V2.0, we provide detailed tutorials and example input data sets. The application is available online at https://venyao.xyz/shinyCircos/ and https://asiawang.shinyapps.io/shinyCircos/, or can be installed locally using the source code deposited in GitHub (https://github.com/YaoLab-Bioinfo/shinyCircos-V2.0).
ABSTRACT
Background/aim: Single-cell transcriptomics (scRNA-Seq) explores cellular diversity at the gene expression level. Due to the inherent sparsity and noise in scRNA-Seq data and the uncertainty about the types of sequenced cells, effective clustering and cell type annotation are essential. The graph-based clustering of scRNA-Seq data is a simple yet powerful approach that presents data as a "shared nearest neighbour" graph and clusters the cells using graph clustering algorithms. These algorithms depend on several user-defined parameters. Here we present SUMA, a lightweight tool that uses a random forest model to predict the number of neighbours that yields the best clustering results. Moreover, we integrated our method with other commonly used methods in an RShiny application. SUMA can be used in a local environment (https://github.com/hkarakurt8742/SUMA) or as a browser tool (https://hkarakurt.shinyapps.io/suma/). Materials and methods: Publicly available scRNA-Seq datasets and 3 different graph-based clustering algorithms were used to develop SUMA, and a wide range of values for the number of neighbours and variant genes was taken into consideration. The quality of clustering was assessed using the adjusted Rand index (ARI) and the true labels of each dataset. The data were split into training and test datasets, and the model was built and optimised using the Scikit-learn (Python) and randomForest (R) libraries. Results: The accuracy of our machine learning model was 0.96, while the AUC of the ROC curve was 0.98. The model indicated that the number of cells in scRNA-Seq data is the most important feature when deciding the number of neighbours. Conclusion: We developed and evaluated the SUMA model and implemented the method in the SUMAShiny app, which integrates SUMA with different clustering methods and enables nonbioinformatician users to cluster and visualise their scRNA data easily. The SUMAShiny app is available both for desktop and browser use.
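The adjusted Rand index used above to score clusterings against true labels can be computed from pair counts alone. The following is a minimal standard-library sketch of the usual ARI formula, included for illustration; it is not taken from the SUMA code base.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(
    labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]
) -> float:
    """Chance-corrected agreement between two labelings of the same cells."""
    n = len(labels_true)
    if n != len(labels_pred) or n < 2:
        raise ValueError("need two labelings of the same length (at least 2)")
    # Pairs of cells placed together in both labelings, in truth, in prediction.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = together_true * together_pred / comb(n, 2)
    maximum = (together_true + together_pred) / 2
    if maximum == expected:  # degenerate case, e.g. a single cluster in both
        return 1.0
    return (together_both - expected) / (maximum - expected)


# Cluster names do not matter, only the grouping does.
same_grouping = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

Scoring each candidate number of neighbours this way gives the per-dataset target that a model such as a random forest can then be trained to predict.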
ABSTRACT
BACKGROUND: Managing and investigating all available genetic resources is challenging. As an alternative, breeders and researchers use a core collection, a representative subset of the entire collection. A good core is characterized by high genetic diversity and low repetitiveness. Among the several available software tools, GenoCore uses a coverage criterion that does not require computationally expensive distance-based metrics. RESULTS: ShinyCore is a new method that selects a core collection in two phases. The first phase uses the coverage criterion to quickly attain a fixed coverage, and the second phase uses a newly devised score (referred to as the rarity score) to further enhance diversity. It can attain a fixed coverage faster than a currently available algorithm devised for the coverage criterion, so it will benefit users who have big data. ShinyCore attains the minimum coverage specified by a user faster than GenoCore, and it then seeks to add entries with the rarest allele for each marker. Therefore, measures of genetic diversity and distance can be improved. CONCLUSION: Although GenoCore is a fast algorithm, its implementation is difficult for those unfamiliar with R. ShinyCore can be easily implemented in Shiny with RStudio, and an interactive web applet is available for those who are not familiar with programming languages.
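To make the idea of a coverage criterion concrete, the sketch below greedily adds accessions until a target share of the observed alleles is represented in the core. It is one plausible reading under stated assumptions (coverage averaged over markers, one allele call per accession and marker) and is not the published GenoCore or ShinyCore algorithm; the rarity score is not shown because the abstract does not define it.

```python
from typing import Dict, List

Genotypes = Dict[str, List[str]]  # accession -> one allele call per marker


def coverage(core: List[str], genotypes: Genotypes) -> float:
    """Mean over markers of the fraction of observed alleles present in `core`."""
    n_markers = len(next(iter(genotypes.values())))
    total = 0.0
    for m in range(n_markers):
        all_alleles = {calls[m] for calls in genotypes.values()}
        core_alleles = {genotypes[a][m] for a in core}
        total += len(core_alleles) / len(all_alleles)
    return total / n_markers


def greedy_core(genotypes: Genotypes, target: float) -> List[str]:
    """Add the accession that raises coverage most until `target` is reached."""
    core: List[str] = []
    remaining = sorted(genotypes)  # sorted for deterministic tie-breaking
    while remaining and coverage(core, genotypes) < target:
        best = max(remaining, key=lambda a: coverage(core + [a], genotypes))
        core.append(best)
        remaining.remove(best)
    return core


# Hypothetical accessions typed at two biallelic markers.
genotypes = {
    "acc1": ["A", "C"],
    "acc2": ["A", "T"],
    "acc3": ["G", "C"],
    "acc4": ["G", "T"],
}
core = greedy_core(genotypes, target=1.0)
```

No pairwise distance matrix is built at any point, which is why coverage-based selection scales to large marker sets more easily than distance-based criteria.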
ABSTRACT
Cotton leaf curl virus (CLCuV) causes devastating losses to fiber production in Central Asia. Viral spread across Asia in the last decade is causing concern that the virus will spread further before resistant varieties can be bred. Current development depends on screening each generation under disease pressure in a country where the disease is endemic. We utilized quantitative trait loci (QTL) mapping in four crosses with different sources of resistance to identify single nucleotide polymorphism (SNP) markers associated with the resistance trait to allow development of varieties without the need for field screening every generation. To assist in the analysis of multiple populations, a new publicly available R/Shiny App was developed to streamline genetic mapping using SNP arrays and to also provide an easy method to convert and deposit genetic data into the CottonGen database. Results identified several QTL from each cross, indicating possible multiple modes of resistance. Multiple sources of resistance would provide several genetic routes to combat the virus as it evolves over time. Kompetitive allele specific PCR (KASP) markers were developed and validated for a subset of QTL, which can be used in further development of CLCuV-resistant cotton lines.
ABSTRACT
SEQUIN is a web-based application (app) that allows fast and intuitive analysis of RNA sequencing data derived for model organisms, tissues, and single cells. Integrated app functions enable uploading datasets, quality control, gene set enrichment, data visualization, and differential gene expression analysis. We also developed the iPSC Profiler, a practical gene module scoring tool that helps measure and compare pluripotent and differentiated cell types. Benchmarking to other commercial and non-commercial products underscored several advantages of SEQUIN. Freely available to the public, SEQUIN empowers scientists using interdisciplinary methods to investigate and present transcriptome data firsthand with state-of-the-art statistical methods. Hence, SEQUIN helps democratize and increase the throughput of interrogating biological questions using next-generation sequencing data with single-cell resolution.
Subjects
Software, Transcriptome, RNA-Seq, Transcriptome/genetics, RNA Sequence Analysis/methods, Gene Regulatory Networks
ABSTRACT
Exploratory analysis of cancer consortia data curated by the cBioPortal repository typically requires advanced programming skills and expertise to identify novel genomic prognostic markers that have the potential for both diagnostic and therapeutic exploitation. We developed GNOSIS (GeNomics explOrer using StatistIcal and Survival analysis in R), an R Shiny App incorporating a range of R packages enabling users to efficiently explore and visualise such clinical and genomic data. GNOSIS provides an intuitive graphical user interface and multiple tab panels supporting a range of functionalities, including data upload and initial exploration, data recoding and subsetting, data visualisations, statistical analysis, mutation analysis and, in particular, survival analysis to identify prognostic markers. GNOSIS also facilitates reproducible research by providing downloadable input logs and R scripts from each session, and so offers an excellent means of supporting clinician-researchers in developing their statistical computing skills.
ABSTRACT
Online experiments allow for fast, massive, cost-efficient data collection. However, uncontrolled conditions in online experiments can be problematic, particularly when inferences hinge on response times (RTs) in the millisecond range. To address this challenge, we developed a mobile-friendly open-source application using R-Shiny, a popular R package. In particular, we aimed to replicate the numerical distance effect, a well-established cognitive phenomenon. In the task, 169 participants (109 with a mobile device, 60 on a desktop computer) completed 116 trials displaying two-digit target numbers and decided whether they were larger or smaller than a fixed standard number. Sessions lasted about 7 minutes. Using generalized linear mixed models estimated with Bayesian inference methods, we observed a numerical distance effect: RTs decreased with the logarithm of the absolute difference between the target and the standard. Our results support the use of R-Shiny for RT-data collection. Furthermore, our method allowed us to measure systematic shifts in recorded RTs related to different operating systems, web browsers, and devices, with mobile devices inducing longer shifts than desktop devices. Our work shows that precise RT measures can be reliably obtained online across mobile and desktop devices. It further paves the way for the design of simple experimental tasks using R, a widely popular programming framework among cognitive scientists.
ABSTRACT
BACKGROUND AND OBJECTIVE: Optimal experimental design theory proposes choosing specific settings in experimental trials in order to maximize the precision of the resulting parameter estimates. In dose-response experiments, this corresponds to choosing the optimal dose levels for every available observation, and can be applied both to single-substance dose-response relationships and to interaction experiments where two substances are given simultaneously at several different mixture ratios ("ray designs"). While the theory of experimental design for this situation is well developed, its mathematical complexity prevents widespread use in practical applications. A simple-to-use application that makes the theory accessible to practitioners is thus very desirable. METHODS: Results from established optimal experimental design theory are applied to dose-response applications, focusing on log-logistic and Weibull class dose-response functions. Suitable optimal design algorithms to solve these problems are implemented in an R-Shiny-based online application. RESULTS: The application provides an interface to easily calculate D-optimal designs not only for single-substance experiments, but also for interaction trials with several combination rays of substances. Furthermore, the app allows users to evaluate the efficiency of existing candidate designs and to construct designs that perform robustly under different assumptions about the true parameters.
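The D-criterion and the efficiency of a candidate design can be illustrated for the simplest case. The sketch below assumes a two-parameter log-logistic mean function with homoscedastic normal errors and fixed parameter guesses (a locally optimal setting); the model choice, names, and doses are assumptions for illustration, and the application itself covers more general models and ray designs.

```python
import math
from typing import List, Tuple

Design = List[Tuple[float, float]]  # (dose > 0, weight) pairs; weights sum to 1


def gradient(dose: float, b: float, e: float) -> Tuple[float, float]:
    """Partial derivatives of eta(d) = 1 / (1 + (d / e) ** b) w.r.t. b and e."""
    log_ratio = math.log(dose) - math.log(e)
    eta = 1.0 / (1.0 + math.exp(b * log_ratio))
    slope = eta * (1.0 - eta)
    return (-slope * log_ratio, slope * b / e)


def d_criterion(design: Design, b: float, e: float) -> float:
    """Determinant of the 2x2 information matrix sum_i w_i * g_i * g_i^T."""
    m11 = m12 = m22 = 0.0
    for dose, weight in design:
        g_b, g_e = gradient(dose, b, e)
        m11 += weight * g_b * g_b
        m12 += weight * g_b * g_e
        m22 += weight * g_e * g_e
    return m11 * m22 - m12 * m12


def d_efficiency(candidate: Design, reference: Design, b: float, e: float) -> float:
    """Ratio of determinants raised to 1/p, with p = 2 parameters."""
    return (d_criterion(candidate, b, e) / d_criterion(reference, b, e)) ** 0.5


# Hypothetical designs evaluated at parameter guesses b = 1, e = 1.
two_point = [(0.5, 0.5), (2.0, 0.5)]
four_point = [(0.25, 0.25), (0.5, 0.25), (2.0, 0.25), (4.0, 0.25)]
relative = d_efficiency(four_point, two_point, b=1.0, e=1.0)
```

A D-optimal design maximizes `d_criterion` over doses and weights; because the value depends on the guessed `b` and `e`, robust designs are built to perform well across a range of such guesses.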
Subjects
Algorithms, Drug Dose-Response Relationship, Drug Design, Drug Interactions, Antineoplastic Agents/pharmacology, Cisplatin/pharmacology, Computer Simulation, Preclinical Drug Evaluation, Humans, Research Design
ABSTRACT
We present MultiEditR (Multiple Edit Deconvolution by Inference of Traces in R), the first algorithm specifically designed to detect and quantify RNA editing from Sanger sequencing (z.umn.edu/multieditr). Although RNA editing is routinely evaluated by measuring the heights of peaks from Sanger sequencing traces, the accuracy and precision of this approach have yet to be evaluated against gold-standard next-generation sequencing methods. Through a comprehensive comparison to RNA sequencing (RNA-seq) and amplicon-based deep sequencing, we show that MultiEditR is accurate, precise, and reliable for detecting endogenous and programmable RNA editing.