1.
Nat Protoc ; 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38565959

ABSTRACT

Methods for analyzing the full complement of a biomolecule type, e.g., proteomics or metabolomics, generate large amounts of complex data. The software tools used to analyze omics data have reshaped the landscape of modern biology and become an essential component of biomedical research. These tools are themselves quite complex and often require the installation of other supporting software, libraries and/or databases. A researcher may also be using multiple different tools that require different versions of the same supporting materials. The increasing dependence of biomedical scientists on these powerful tools creates a need for easier installation and greater usability. Packaging and containerization are different approaches to satisfy this need by delivering omics tools already wrapped in additional software that makes the tools easier to install and use. In this systematic review, we describe and compare the features of prominent packaging and containerization platforms. We outline the challenges, advantages and limitations of each approach and some of the most widely used platforms from the perspectives of users, software developers and system administrators. We also propose principles to make the distribution of omics software more sustainable and robust to increase the reproducibility of biomedical and life science research.

2.
Brief Bioinform ; 24(4), 2023 07 20.
Article in English | MEDLINE | ID: mdl-37291798

ABSTRACT

The ability to identify and track T-cell receptor (TCR) sequences from patient samples is becoming central to the field of cancer research and immunotherapy. Tracking genetically engineered T cells expressing TCRs that target specific tumor antigens is important to determine the persistence of these cells and quantify tumor responses. The available high-throughput method to profile TCR repertoires is generally referred to as TCR sequencing (TCR-Seq). However, the available TCR-Seq data are limited compared with RNA sequencing (RNA-Seq). In this paper, we have benchmarked the ability of RNA-Seq-based methods to profile TCR repertoires by examining 19 bulk RNA-Seq samples across 4 cancer cohorts including both T-cell-rich and T-cell-poor tissue types. We have performed a comprehensive evaluation of the existing RNA-Seq-based repertoire profiling methods using targeted TCR-Seq as the gold standard. We also highlight scenarios under which the RNA-Seq approach is suitable and can provide comparable accuracy to the TCR-Seq approach. Our results show that RNA-Seq-based methods are able to effectively capture the clonotypes and estimate the diversity of TCR repertoires, as well as provide relative frequencies of clonotypes in T-cell-rich tissues and low-diversity repertoires. However, RNA-Seq-based TCR profiling methods have limited power in T-cell-poor tissues, especially in highly diverse repertoires of T-cell-poor tissues. The results of our benchmarking provide an additional appealing argument to incorporate RNA-Seq into the immune repertoire screening of cancer patients, as it offers broader insight into transcriptomic changes beyond the limited information provided by TCR-Seq.
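As a rough illustration of the two quantities named in this abstract (relative clonotype frequencies and repertoire diversity), the sketch below computes both from a list of CDR3 sequences. It is an assumption-laden toy: the abstract does not say which diversity measure was used, so Shannon entropy is chosen here only as a common example, and the sequences and function names are invented.

```python
import math
from collections import Counter

def clonotype_frequencies(cdr3_sequences):
    """Return the relative frequency of each clonotype (keyed by CDR3 string)."""
    counts = Counter(cdr3_sequences)
    total = sum(counts.values())
    return {clone: n / total for clone, n in counts.items()}

def shannon_diversity(frequencies):
    """Shannon entropy (natural log) of a clonotype frequency distribution."""
    return -sum(p * math.log(p) for p in frequencies.values() if p > 0)

# Invented CDR3 amino-acid sequences, one per read; not real data.
reads = ["CASSLGQ", "CASSLGQ", "CASSPDR", "CASSLGQ", "CASRTGE", "CASSPDR"]

freqs = clonotype_frequencies(reads)
diversity = shannon_diversity(freqs)
```

A repertoire dominated by one clone gives a diversity near zero, while evenly distributed clones approach the logarithm of the number of clonotypes.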


Subject(s)
Benchmarking , Neoplasms , Humans , Receptors, Antigen, T-Cell/genetics , T-Lymphocytes , Neoplasms/genetics , Sequence Analysis, RNA
3.
Front Immunol ; 13: 954078, 2022.
Article in English | MEDLINE | ID: mdl-36451811

ABSTRACT

T cell receptor (TCR) studies have grown substantially with advances in T cell receptor repertoire sequencing (TCR-Seq) techniques. Analyzing TCR-Seq data requires the computational skills to run TCR repertoire analysis tools. However, biomedical researchers with limited computational backgrounds face numerous obstacles to using bioinformatics tools properly and efficiently for TCR-Seq data analysis. Here we report pyTCR, a computational notebook-based solution for comprehensive and scalable TCR-Seq data analysis. Computational notebooks, which combine code, calculations, and visualization, are able to provide users with a high level of flexibility and transparency for the analysis. Additionally, computational notebooks are shown to be user-friendly and suitable for researchers with limited computational skills. Our tool has a rich set of functionalities including various TCR metrics, statistical analysis, and customizable visualizations. Applying pyTCR to large and diverse TCR-Seq datasets will enable effective, flexible analysis of large-scale TCR-Seq data and eventually facilitate new discoveries.


Subject(s)
Data Analysis , Receptors, Antigen, T-Cell , Reproducibility of Results , Receptors, Antigen, T-Cell/genetics , Benchmarking , Computational Biology
4.
Nat Methods ; 19(4): 429-440, 2022 04.
Article in English | MEDLINE | ID: mdl-35396482

ABSTRACT

Evaluating metagenomic software is key for optimizing metagenome interpretation and is the focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains were still challenging for assembly and genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks, but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including top performers with other metrics. The results identify challenges and guide researchers in selecting methods for analyses.


Subject(s)
Metagenome , Metagenomics , Archaea/genetics , Metagenomics/methods , Reproducibility of Results , Sequence Analysis, DNA , Software
5.
Gigascience ; 12, 2022 12 28.
Article in English | MEDLINE | ID: mdl-36852763

ABSTRACT

BACKGROUND: Metagenomic taxonomic profiling aims to predict the identity and relative abundance of taxa in a given whole-genome sequencing metagenomic sample. A recent surge in computational methods that aim to accurately estimate taxonomic profiles, called taxonomic profilers, has motivated community-driven efforts to create standardized benchmarking datasets, standardized taxonomic profile formats, and a benchmarking platform to assess tool performance. While this standardization is essential, there is currently a lack of tools to visualize the standardized output of the many existing taxonomic profilers. Thus, benchmarking studies rely on single-value metrics to compare the performance of tools and to compare their output with benchmarking datasets. This is one of the major problems in analyzing metagenomic profiling data, since single metrics, such as the F1 score, fail to capture the biological differences between the datasets. FINDINGS: Here we report the development of TAMPA (Taxonomic metagenome profiling evaluation), a robust and easy-to-use method that allows scientists to easily interpret and interact with taxonomic profiles produced by the many different taxonomic profiler methods beyond the standard metrics used by the scientific community. We demonstrate the unique ability of TAMPA to generate a novel biological hypothesis by highlighting the taxonomic differences between samples otherwise missed by commonly utilized metrics. CONCLUSION: In this study, we show that TAMPA can help visualize the output of taxonomic profilers, enabling biologists to effectively choose the most appropriate profiling method to use on their metagenomics data. TAMPA is available on GitHub, Bioconda, and Galaxy Toolshed at https://github.com/dkoslicki/TAMPA and is released under the MIT license.
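The point that a single metric such as the F1 score can hide biological differences can be seen in a small sketch. This is not TAMPA's code or output format; the taxon names, the presence/absence simplification, and the function name `f1_score` are assumptions made purely for illustration.

```python
def f1_score(predicted, truth):
    """F1 over sets of taxon names (presence/absence only, abundances ignored)."""
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)

# Invented ground truth and two invented profiler outputs.
truth = {"Escherichia", "Bacteroides", "Lactobacillus", "Clostridium"}
profile_a = {"Escherichia", "Bacteroides", "Lactobacillus", "Salmonella"}
profile_b = {"Bacteroides", "Lactobacillus", "Clostridium", "Vibrio"}

score_a = f1_score(profile_a, truth)
score_b = f1_score(profile_b, truth)

# The scores are identical, yet each profile misses a different taxon.
missed_a = truth - profile_a
missed_b = truth - profile_b
```

Both profiles receive the same score while disagreeing about which genus is absent, which is the kind of difference a per-taxon view exposes and a single number does not.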


Subject(s)
Benchmarking , Metagenomics , Metagenome , Whole Genome Sequencing
6.
Cogit. Enferm. (Online) ; 26: e75169, 2021. tab
Article in Portuguese | LILACS-Express | LILACS, BDENF - Nursing | ID: biblio-1345890

ABSTRACT

Objective: to identify the prevalence of, and factors associated with, participation of the pregnant woman's partner in prenatal care. Method: cross-sectional study conducted between March and July 2018 through interviews with 655 postpartum women from a regional health office in Northeastern Brazil. Associations were estimated using the chi-square test and the prevalence ratio (PR). Results: among women with a partner who received prenatal care (85.6%; n=561), partner participation was 44.2% (n=248), being higher among those who planned the pregnancy (PR: 1.25; 95% CI: 1.07-2.10), wished to become pregnant (PR: 1.22; 95% CI: 1.01-1.98), started prenatal follow-up early (PR: 1.31; 95% CI: 1.01-2.46), and had six or more consultations (PR: 1.49; 95% CI: 1.32-1.81). Participation was lower among women with low education (PR: 0.72; 95% CI: 0.39-0.77) and among those who used public services (PR: 0.65; 95% CI: 0.24-0.85). Conclusion: the low prevalence of partner participation in prenatal care highlights the need to further encourage partners' inclusion in this process.

8.
Gigascience ; 9(6), 2020 06 01.
Article in English | MEDLINE | ID: mdl-32479592

ABSTRACT

Biomedical research depends increasingly on computational tools, but mechanisms ensuring open data, open software, and reproducibility are variably enforced by academic institutions, funders, and publishers. Publications may present software for which source code or documentation is or becomes unavailable; this compromises the role of peer review in evaluating technical strength and scientific contribution. Incomplete ancillary information for an academic software package may bias or limit subsequent work. We provide 8 recommendations to improve reproducibility, transparency, and rigor in computational biology, precisely the values that should be emphasized in life science curricula. Our recommendations for improving software availability, usability, and archival stability aim to foster a sustainable data science ecosystem in life science research.


Subject(s)
Biomedical Research/standards , Computational Biology , Data Accuracy , Humans , Reproducibility of Results , Software
9.
Genome Biol ; 21(1): 71, 2020 03 17.
Article in English | MEDLINE | ID: mdl-32183840

ABSTRACT

BACKGROUND: Recent advancements in next-generation sequencing have rapidly improved our ability to study genomic material at an unprecedented scale. Despite substantial improvements in sequencing technologies, errors present in the data still risk confounding downstream analysis and limiting the applicability of sequencing technologies in clinical tools. Computational error correction promises to eliminate sequencing errors, but the relative accuracy of error correction algorithms remains unknown. RESULTS: In this paper, we evaluate the ability of error correction algorithms to fix errors across different types of datasets that contain various levels of heterogeneity. We highlight the advantages and limitations of computational error correction techniques across different domains of biology, including immunogenomics and virology. To demonstrate the efficacy of our technique, we apply the UMI-based high-fidelity sequencing protocol to eliminate sequencing errors from both simulated data and the raw reads. We then perform a realistic evaluation of error-correction methods. CONCLUSIONS: In terms of accuracy, we find that method performance varies substantially across different types of datasets with no single method performing best on all types of examined data. Finally, we also identify the techniques that offer a good balance between precision and sensitivity.


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing , Benchmarking , Computational Biology/methods , Humans , Receptors, Antigen, T-Cell/genetics , Viruses/genetics , Whole Genome Sequencing
10.
Heliyon ; 6(2): e03342, 2020 Feb.
Article in English | MEDLINE | ID: mdl-32099915

ABSTRACT

Indices improve the performance of relational databases, especially on queries that return a small portion of the data (i.e., low-selectivity queries). Star joins are particularly expensive operations that commonly rely on indices for improved performance at scale. The development and support of index-based solutions for star joins are still at very early stages. To address this gap, we propose a distributed Bitmap Join Index (dBJI) and a framework-agnostic strategy to solve join predicates in linear time. For empirical analysis, we used common Hadoop technologies (e.g., HBase and Spark) to show that dBJI outperforms full scan approaches by between 59% and 88% in queries with low selectivity from the Star Schema Benchmark (SSB). Thus, distributed indices may significantly enhance low-selectivity query performance even in very large databases.
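The general idea behind a bitmap join index can be sketched on a single machine as follows. This is only a toy illustration of the classic, non-distributed technique and not the dBJI described in the abstract; the table contents, column names, and helper functions are all invented, and Python integers stand in for bitmaps.

```python
from collections import defaultdict

# Toy star schema (all values invented): two dimension tables and one fact table.
customers = {1: {"region": "ASIA"}, 2: {"region": "EUROPE"}, 3: {"region": "ASIA"}}
dates = {10: {"year": 1997}, 11: {"year": 1998}}
# Fact rows: (customer_key, date_key, revenue)
lineorder = [(1, 10, 100), (2, 10, 250), (3, 11, 75), (1, 11, 40), (3, 10, 60)]

def build_bitmap_join_index(facts, fk_position, dimension, attribute):
    """Map each dimension attribute value to a bitmap over fact-row positions."""
    index = defaultdict(int)
    for row_id, row in enumerate(facts):
        value = dimension[row[fk_position]][attribute]
        index[value] |= 1 << row_id  # set the bit for this fact row
    return dict(index)

def matching_rows(bitmap):
    """Yield the positions of the set bits, lowest first."""
    row_id = 0
    while bitmap:
        if bitmap & 1:
            yield row_id
        bitmap >>= 1
        row_id += 1

region_index = build_bitmap_join_index(lineorder, 0, customers, "region")
year_index = build_bitmap_join_index(lineorder, 1, dates, "year")

# Query: total revenue where region = 'ASIA' AND year = 1997.
# The join predicates reduce to one bitwise AND; only matching fact rows are read.
selected = region_index["ASIA"] & year_index[1997]
rows = list(matching_rows(selected))
total_revenue = sum(lineorder[r][2] for r in rows)
```

Because the index is built once, a low-selectivity query touches only the few fact rows whose bits survive the AND, rather than scanning the whole fact table.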

11.
Gigascience ; 9(1), 2020 01 01.
Article in English | MEDLINE | ID: mdl-31972019

ABSTRACT

BACKGROUND: In today's world of big data, computational analysis has become a key driver of biomedical research. High-performance computational facilities are capable of processing considerable volumes of data, yet often lack an easy-to-use interface to guide the user in supervising and adjusting bioinformatics analysis via a tablet or smartphone. RESULTS: To address this gap we propose Telescope, a novel tool that interfaces with high-performance computational clusters to deliver an intuitive user interface for controlling and monitoring bioinformatics analyses in real time. By leveraging latest-generation technology now ubiquitous among researchers (such as smartphones), Telescope delivers a friendly user experience and manages connectivity and encryption under the hood. CONCLUSIONS: Telescope helps to mitigate the digital divide between wet and computational laboratories in contemporary biology. By delivering convenience and ease of use through a user experience not relying on expertise with computational clusters, Telescope can help researchers close the feedback loop between bioinformatics and experimental work with minimal impact on the performance of computational tools. Telescope is freely available at https://github.com/Mangul-Lab-USC/telescope.


Subject(s)
Computational Biology/methods , Data Mining/methods , Software , Big Data , User-Computer Interface
12.
PLoS Biol ; 17(6): e3000333, 2019 06.
Article in English | MEDLINE | ID: mdl-31220077

ABSTRACT

Developing new software tools for analysis of large-scale biological data is a key component of advancing modern biomedical research. Scientific reproduction of published findings requires running computational tools on data generated by such studies, yet little attention is presently allocated to the installability and archival stability of computational software tools. Scientific journals require data and code sharing, but none currently require authors to guarantee the continuing functionality of newly published tools. We have estimated the archival stability of computational biology software tools by performing an empirical analysis of the internet presence for 36,702 omics software resources published from 2005 to 2017. We found that almost 28% of all resources are currently not accessible through uniform resource locators (URLs) published in the paper they first appeared in. Among the 98 software tools selected for our installability test, 51% were deemed "easy to install," and 28% of the tools failed to be installed at all because of problems in the implementation. Moreover, for papers introducing new software, we found that the number of citations significantly increased when authors provided an easy installation process. We propose for incorporation into journal policy several practical solutions for increasing the widespread installability and archival stability of published bioinformatics software.


Subject(s)
Computational Biology/methods , Information Dissemination/methods , Information Storage and Retrieval/methods , Biomedical Research , Databases, Factual , Humans , Internet , Software/trends