ABSTRACT
Data-driven computational analysis is becoming increasingly important in biomedical research as the amount of data being generated continues to grow. However, the lack of sharing of research outputs, such as data, source code, and methods, undermines the transparency and reproducibility of studies, which are critical to the advancement of science. Many published studies are not reproducible because insufficient documentation, code, and data are shared. We conducted a comprehensive analysis of 453 manuscripts published between 2016 and 2021 and found that 50.1% of them failed to share the analytical code. Even among those that did disclose their code, a vast majority failed to offer additional research outputs, such as data. Furthermore, only one in ten papers organized their code in a structured and reproducible manner. We found a significant association between the presence of code availability statements and increased code availability (p = 2.71 × 10⁻⁹). Additionally, a greater proportion of studies conducting secondary analyses were inclined to share their code compared with those conducting primary analyses (p = 1.15 × 10⁻⁷). In light of our findings, we propose raising awareness of code sharing practices and taking immediate steps to enhance code availability to improve reproducibility in biomedical research. By increasing transparency and reproducibility, we can promote scientific rigor, encourage collaboration, and accelerate scientific discoveries. We must prioritize open science practices, including sharing code, data, and other research products, to ensure that biomedical research can be replicated and built upon by others in the scientific community.
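The abstract reports p-values for the association between code availability statements and code availability but does not name the statistical test. As a hedged illustration only, the sketch below assumes a Pearson chi-square test of independence on a 2×2 table (statement present/absent × code shared/not shared). The function name and table layout are hypothetical, and the toy counts are not the study's data.

```python
import math


def chi_square_2x2(table):
    """Pearson's chi-square test of independence for a 2x2 contingency table.

    table: [[a, b], [c, d]] of non-negative counts, for example rows =
    availability statement present/absent, columns = code shared/not shared
    (a hypothetical layout). All row and column totals must be non-zero.
    Returns (chi2, p_value); no continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom chi2 = Z**2, so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy counts for illustration only (not the study's data).
print(chi_square_2x2([[30, 10], [10, 30]]))
```

Fisher's exact test would be a common alternative when expected counts are small; which test the authors used is not stated here.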
ABSTRACT
The ability to identify and track T-cell receptor (TCR) sequences from patient samples is becoming central to cancer research and immunotherapy. Tracking genetically engineered T cells expressing TCRs that target specific tumor antigens is important for determining the persistence of these cells and quantifying tumor responses. The available high-throughput method for profiling TCR repertoires is generally referred to as TCR sequencing (TCR-Seq). However, available TCR-Seq data are limited compared with RNA sequencing (RNA-Seq) data. In this paper, we benchmarked the ability of RNA-Seq-based methods to profile TCR repertoires by examining 19 bulk RNA-Seq samples across 4 cancer cohorts, including both T-cell-rich and T-cell-poor tissue types. We performed a comprehensive evaluation of existing RNA-Seq-based repertoire profiling methods, using targeted TCR-Seq as the gold standard. We also highlight scenarios in which the RNA-Seq approach is suitable and can provide accuracy comparable to that of TCR-Seq. Our results show that RNA-Seq-based methods can effectively capture clonotypes and estimate the diversity of TCR repertoires, as well as provide relative frequencies of clonotypes in T-cell-rich tissues and low-diversity repertoires. However, RNA-Seq-based TCR profiling methods have limited power in T-cell-poor tissues, especially in the highly diverse repertoires of T-cell-poor tissues. The results of our benchmarking provide an additional appealing argument for incorporating RNA-Seq into the immune repertoire screening of cancer patients, as it offers broader insight into transcriptomic changes beyond the limited information provided by TCR-Seq.
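Repertoire diversity, one of the quantities benchmarked above, is often summarized with Shannon entropy and clonality (one minus the entropy normalized by the log of the number of observed clonotypes). The sketch below is a minimal, assumed formulation for illustration; the function names and toy counts are hypothetical and are not taken from the paper or from any of the tools it evaluates.

```python
import math


def shannon_entropy(counts):
    """Shannon entropy (natural log) of clonotype read or cell counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def clonality(counts):
    """1 - normalized Shannon entropy: 0 = perfectly even, 1 = monoclonal."""
    observed = [c for c in counts if c > 0]
    if len(observed) <= 1:
        # A single clonotype is maximally clonal; log(1) = 0 would divide by zero.
        return 1.0
    return 1.0 - shannon_entropy(observed) / math.log(len(observed))


# Hypothetical clonotype counts for one sample (illustrative, not study data).
clone_counts = {"clonotype_1": 120, "clonotype_2": 30, "clonotype_3": 10}
counts = list(clone_counts.values())
print(f"entropy={shannon_entropy(counts):.3f} clonality={clonality(counts):.3f}")
```

Comparing such summaries between RNA-Seq-derived and TCR-Seq-derived clonotype tables for the same sample is one simple way to assess agreement in estimated diversity.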
Subjects
Benchmarking , Neoplasms , Humans , Receptors, Antigen, T-Cell/genetics , T-Lymphocytes , Neoplasms/genetics , Sequence Analysis, RNA
ABSTRACT
As the outbreak of novel coronavirus disease (COVID-19) continues to spread throughout the world, steps are being taken to limit its impact on public health. For infectious diseases like COVID-19, social distancing is one of the effective measures for avoiding exposure to the virus and reducing its spread. Traveling on public transport can substantially facilitate the transmission of infectious diseases. Accordingly, responsive actions taken by public transit agencies against risk factors can effectively limit the risk and make transit systems safe. Among the many risk factors that can affect infection spread on public transport, the likelihood of exposure is a major one; it depends on the number of people riding public transport and can be reduced by socially distanced settings. Because many individuals may not act in a socially optimal manner, it is vital that public transit agencies implement measures and restrictions. In this study, we present T-Ridership, a novel web-based application based on a hybrid optimized dynamic programming algorithm inspired by neural networks, to optimize public transit for safety with respect to COVID-19. The analysis, applied to the Metropolitan Transportation Authority (MTA), takes two main steps: first, high-density stations are detected by normalizing the input data; then, using these results, the T-Ridership tool automatically determines the optimal station order to avoid overcrowded transit vehicles. Our web tool thus helps make public transit safe to ride under risk of infection by reducing both the density of riders on public transit vehicles and trip duration.
These results can be used to expand and improve public transit policy and to better plan train and bus schedules in a way that prevents high-volume human contact, increases social distance, and reduces the possibility of disease transmission (available at: http://t-ridership.com and GitHub at: https://github.com/Imani-Saba/TRidership).
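The first analysis step, detecting high-density stations by normalizing the input data, could look roughly like the sketch below. It assumes min-max normalization and a hypothetical cutoff of 0.8; neither the normalization method nor any threshold is stated in the abstract, and the function names are illustrative, not part of T-Ridership.

```python
def min_max_normalize(values):
    """Scale values linearly to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def high_density_stations(ridership, threshold=0.8):
    """Return station names whose normalized ridership is >= threshold.

    ridership: mapping of station name -> ridership count (hypothetical input).
    The default threshold of 0.8 is an illustrative assumption, not a value
    taken from T-Ridership.
    """
    names = list(ridership)
    scores = min_max_normalize([ridership[name] for name in names])
    return [name for name, score in zip(names, scores) if score >= threshold]


# Hypothetical ridership counts per station.
print(high_density_stations({"Station A": 100, "Station B": 20, "Station C": 95}))
```

Normalizing first makes the cutoff independent of the absolute ridership scale, so the same threshold could be reused across days or lines with very different volumes.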
ABSTRACT
T cell receptor (TCR) studies have grown substantially with advances in T cell receptor repertoire sequencing (TCR-Seq) techniques. Analyzing TCR-Seq data requires the computational skills needed to run TCR repertoire analysis tools. However, biomedical researchers with limited computational backgrounds face numerous obstacles to using bioinformatics tools properly and efficiently for analyzing TCR-Seq data. Here we report pyTCR, a computational notebook-based solution for comprehensive and scalable TCR-Seq data analysis. Computational notebooks, which combine code, calculations, and visualization, provide users with a high level of flexibility and transparency for the analysis. Additionally, computational notebooks have been shown to be user-friendly and suitable for researchers with limited computational skills. Our tool has a rich set of functionalities, including various TCR metrics, statistical analysis, and customizable visualizations. Applying pyTCR to large and diverse TCR-Seq datasets will enable effective, flexible analysis of large-scale TCR-Seq data and eventually facilitate new discoveries.
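As a sketch of the kind of repertoire metrics such a notebook might compute, the functions below implement the Gini-Simpson diversity index and the Jaccard overlap of clonotype sets between two repertoires. They are plain Python written for illustration and do not use or reproduce pyTCR's own API; the function names and example CDR3 labels are hypothetical.

```python
def gini_simpson(counts):
    """Gini-Simpson diversity, 1 - sum(p_i^2).

    Equals the probability that two reads drawn with replacement belong to
    different clonotypes.
    """
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def jaccard_overlap(repertoire_a, repertoire_b):
    """Fraction of clonotypes shared between two repertoires (iterables of CDR3 strings)."""
    a, b = set(repertoire_a), set(repertoire_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical clonotype counts and CDR3 sets (illustrative only).
print(gini_simpson([50, 30, 20]))
print(jaccard_overlap({"cdr3_1", "cdr3_2", "cdr3_3"}, {"cdr3_2", "cdr3_3", "cdr3_4"}))
```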
Subjects
Data Analysis , Receptors, Antigen, T-Cell , Reproducibility of Results , Receptors, Antigen, T-Cell/genetics , Benchmarking , Computational Biology
ABSTRACT
Motif discovery and characterization are important for gene regulation analysis. The lack of intuitive and integrative web servers impedes the effective use of motifs. Most motif discovery web tools are either not designed for non-expert users or lack optimization steps when default settings are used. Here we describe bipartite motifs learning (BML), a parameter-free web server that provides a user-friendly portal for online discovery and analysis of sequence motifs, using high-throughput sequencing data as input. BML uses both the position weight matrix and the dinucleotide weight matrix, the latter of which can express the interdependencies of neighboring bases. When input parameters describing the motifs are given, BML achieves significantly higher accuracy than other available motif-finding tools. When no parameters are given by non-expert users, BML, unlike other tools, employs a learning method to identify motifs automatically and achieves accuracy comparable to that obtained when the parameters are set. The BML web server is freely available at http://motif.t-ridership.com/ (https://github.com/Mohammad-Vahed/BML).
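The difference between a position weight matrix (PWM) and a dinucleotide weight matrix (DWM) can be illustrated with a minimal sketch: the PWM scores each position independently, while the DWM scores each pair of neighboring positions, so it can capture dependencies between adjacent bases. This assumes log-odds scores against a uniform background with a pseudocount; it is not BML's implementation, and all function names are hypothetical.

```python
import math
from itertools import product

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Log-odds PWM (uniform 0.25 background) from equal-length aligned sites."""
    pwm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        total = len(sites) + 4 * pseudocount
        # log(observed frequency / 0.25) == log(4 * frequency)
        pwm.append({b: math.log(4 * (column.count(b) + pseudocount) / total) for b in BASES})
    return pwm


def build_dwm(sites, pseudocount=1.0):
    """Log-odds DWM: one 16-entry table per pair of neighboring positions (i, i+1)."""
    dinucleotides = ["".join(p) for p in product(BASES, repeat=2)]
    dwm = []
    for i in range(len(sites[0]) - 1):
        pairs = [s[i:i + 2] for s in sites]
        total = len(sites) + 16 * pseudocount
        # Uniform dinucleotide background is 1/16, hence the factor of 16.
        dwm.append({d: math.log(16 * (pairs.count(d) + pseudocount) / total) for d in dinucleotides})
    return dwm


def score_pwm(pwm, seq):
    """Sum of per-position log-odds scores."""
    return sum(column[base] for column, base in zip(pwm, seq))


def score_dwm(dwm, seq):
    """Sum of log-odds scores for each neighboring dinucleotide."""
    return sum(table[seq[i:i + 2]] for i, table in enumerate(dwm))


# Toy sites in which the two neighboring bases are always identical.
sites = ["AA", "TT"]
pwm, dwm = build_pwm(sites), build_dwm(sites)
print(score_pwm(pwm, "AA"), score_pwm(pwm, "AT"))
print(score_dwm(dwm, "AA"), score_dwm(dwm, "AT"))
```

With these toy sites the PWM gives "AA" and "AT" identical scores, because each position on its own is half A and half T, whereas the DWM prefers "AA" since the dinucleotide "AT" never occurs in the training sites.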
Subjects
Nucleotide Motifs , Software , Transcription Factors/metabolism , Web Browser , Algorithms , Arabidopsis , Binding Sites , High-Throughput Nucleotide Sequencing , Humans , Position-Specific Scoring Matrices , Sequence Analysis, DNA
ABSTRACT
The early stage of the secondary structural conversion of amyloid beta (Aβ) into misfolded aggregates is a key feature of Alzheimer's disease (AD). Under normal physiological conditions, Aβ peptides can protect neurons from the toxicity of highly concentrated metals; under certain conditions, however, they become toxic. Under conditions of excess iron, amyloid precursor protein (APP) becomes overexpressed, which in turn increases Aβ production. Experimental studies suggest that Aβ fibrillation (main pathway) and amorphous aggregate formation (off-pathway) are two competing pathways driven by factors such as metal binding, pH, and temperature. In this study, we performed molecular dynamics (MD) simulations to examine the initial stage of conformational transformations of human Aβ (hAβ) and rat Aβ (rAβ) peptides in the presence of Fe²⁺ and Fe³⁺ ions. Our results demonstrate that Fe²⁺ and Fe³⁺ play key roles in Aβ folding and aggregation. Fe³⁺ had a greater effect than Fe²⁺ on Aβ folding during intermolecular interactions and, consequently, a greater effect in decreasing structural diversity. Fe²⁺ was more likely than Fe³⁺ to interact with nitrogen atoms of the imidazole rings of His residues. rAβ peptides are energetically more favorable than hAβ for intermolecular interactions and amorphous aggregation. We concluded that most hAβ structures were energetically unfavorable; however, hAβ structures with intermolecular β-sheets in the C-terminal region were energetically favorable. Notably, Fe²⁺ can change the surface charge of hAβ. Furthermore, Fe³⁺ can promote C-terminal folding by binding to Glu22 and Ala42 and by forming stable β-sheets in the C-terminal region. Fe³⁺ can also pause the main pathway by inducing random aggregation.
Subjects
Alzheimer Disease/metabolism , Amyloid beta-Peptides/metabolism , Ferric Compounds/metabolism , Ferrous Compounds/metabolism , Molecular Dynamics Simulation , Amyloid beta-Peptides/chemistry , Animals , Ferric Compounds/chemistry , Ferrous Compounds/chemistry , Humans , Protein Aggregates , Protein Conformation , Protein Folding , Rats
ABSTRACT
Identifying transcription factor binding sites (TFBSs) is extremely important. Some TFBSs are proposed to be bipartite motifs, that is, two-block motifs separated by gap sequences of variable length. While the position weight matrix (PWM) is commonly used to represent and predict TFBSs, the dinucleotide weight matrix (DWM) can express the interdependencies of neighboring bases. By incorporating DWM into the detection of bipartite motifs, we developed DIpartite (bipartite motif detection tool based on dinucleotide weight matrix), a novel tool for ab initio motif detection that uses a Gibbs sampling strategy and the minimization of Shannon's entropy. DIpartite predicts bipartite motifs by considering the interdependencies of neighboring positions, that is, the DWM. We developed DIpartite for the detection of TFBSs, particularly bipartite motifs; it enables ab initio prediction of conserved motifs based not only on PWM but also on DWM. We evaluated the performance of DIpartite by comparing it with freely available tools, such as MEME, BioProspector, BiPad, and AMD, using test datasets of CRP in E. coli, sigma factors in B. subtilis, and promoter sequences in humans. Taken together, these findings show that DIpartite performs equivalently to or better than these other tools, especially for detecting bipartite motifs with variable gaps. DIpartite requires users to specify the motif lengths, the gap length, and whether to use PWM or DWM. DIpartite is available for use at https://github.com/Mohammad-Vahed/DIpartite.
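To illustrate what a bipartite motif with a variable gap means in practice, the sketch below scores every placement of two PWM blocks separated by a gap within a user-specified range and returns the best hit. It covers only this scoring step, under a uniform-background log-odds assumption; it is not DIpartite's Gibbs sampling or entropy-minimization procedure, and the function names are hypothetical. The example blocks resemble the E. coli sigma70 -35 (TTGACA) and -10 (TATAAT) consensus sequences, and the training sites are toy data.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5):
    """Log-odds PWM against a uniform background from equal-length aligned sites."""
    pwm = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        total = len(sites) + 4 * pseudocount
        pwm.append({b: math.log(4 * (column.count(b) + pseudocount) / total) for b in BASES})
    return pwm


def block_score(pwm, seq, start):
    """Score of one PWM block placed at seq[start:start + len(pwm)]."""
    return sum(column[seq[start + k]] for k, column in enumerate(pwm))


def best_bipartite_hit(seq, left, right, min_gap, max_gap):
    """Return (score, start, gap) for the best left block + gap + right block placement."""
    best = None
    for start in range(len(seq) - len(left) + 1):
        for gap in range(min_gap, max_gap + 1):
            right_start = start + len(left) + gap
            if right_start + len(right) > len(seq):
                break  # larger gaps would also run past the end of seq
            total = block_score(left, seq, start) + block_score(right, seq, right_start)
            if best is None or total > best[0]:
                best = (total, start, gap)
    return best


left = build_pwm(["TTGACA", "TTGACA", "TTGACT"])
right = build_pwm(["TATAAT", "TATAAT", "TATACT"])
seq = "GGGG" + "TTGACA" + "C" * 17 + "TATAAT" + "GGGG"
print(best_bipartite_hit(seq, left, right, min_gap=15, max_gap=19))
```

Because the gap is searched over a range rather than fixed, the same pair of blocks can match sites whose spacers differ by a few bases, which is the property that distinguishes bipartite motifs from a single contiguous PWM.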
Subjects
Computational Biology/methods , Nucleotide Motifs , Base Pairing , Clostridium/genetics , Cyclic AMP Receptor Protein/genetics , Escherichia coli/genetics , Escherichia coli Proteins/genetics , Humans , Position-Specific Scoring Matrices , Promoter Regions, Genetic/genetics , Sigma Factor/genetics
ABSTRACT
PREMISE: In crop breeding programs, breeders use yield performance in both optimal and stressful environments as a key indicator for screening the most tolerant genotypes. During the past four decades, several yield-based indices have been proposed for evaluating stress tolerance in crops. Despite the well-established use of these indices in agronomy and plant breeding, user-friendly software that provides access to these methods is still lacking. METHODS AND RESULTS: The Plant Abiotic Stress Index Calculator (iPASTIC) is an online program based on JavaScript and R that calculates common stress tolerance and susceptibility indices for various crop traits, including the tolerance index (TOL), relative stress index (RSI), mean productivity (MP), harmonic mean (HM), yield stability index (YSI), geometric mean productivity (GMP), stress susceptibility index (SSI), stress tolerance index (STI), and yield index (YI). Along with these indices, this easily accessible tool can calculate their ranking patterns, estimate the relative frequency of each index, and create heat maps based on Pearson's and Spearman's rank-order correlation analyses. In addition, it can render three-dimensional plots based on both yield performances and each index to separate entry genotypes into Fernandez's groups (A, B, C, and D), and it can perform principal component analysis. The accuracy of the results calculated by our software was tested using two data sets from previous experiments testing salinity and drought stress, respectively, in wheat genotypes. CONCLUSIONS: iPASTIC can be widely used in agronomy and plant breeding programs as a user-friendly interface for agronomists and breeders dealing with large volumes of data. The software is available at https://mohsenyousefian.com/ipastic/.
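Several of the indices listed above have simple closed-form definitions in terms of each genotype's yield under non-stress (Yp) and stress (Ys) conditions and the corresponding population means. The sketch below computes them with the standard formulas from the stress-index literature; it is an illustrative assumption and may differ in detail (for example, in rounding or in how means are taken) from iPASTIC's own JavaScript/R implementation. The function name and example yields are hypothetical.

```python
import math
from statistics import mean


def stress_indices(yp, ys):
    """Common yield-based stress indices, one dict per genotype.

    yp, ys: yields under non-stress (potential) and stress conditions, one entry
    per genotype, in the same order. SSI and RSI are undefined when the mean
    stress yield equals the mean non-stress yield (zero denominator in SSI).
    """
    yp_bar, ys_bar = mean(yp), mean(ys)  # population means across genotypes
    results = []
    for p, s in zip(yp, ys):
        results.append({
            "TOL": p - s,                          # tolerance index
            "MP": (p + s) / 2,                     # mean productivity
            "GMP": math.sqrt(p * s),               # geometric mean productivity
            "HM": 2 * p * s / (p + s),             # harmonic mean
            "SSI": (1 - s / p) / (1 - ys_bar / yp_bar),  # stress susceptibility index
            "STI": p * s / yp_bar ** 2,            # stress tolerance index
            "YI": s / ys_bar,                      # yield index
            "YSI": s / p,                          # yield stability index
            "RSI": (s / p) / (ys_bar / yp_bar),    # relative stress index
        })
    return results


# Hypothetical yields for three genotypes (not data from the iPASTIC validation sets).
for row in stress_indices([6.0, 5.0, 4.0], [3.0, 4.0, 1.0]):
    print({name: round(value, 3) for name, value in row.items()})
```

Ranking genotypes by one of these values (for example, descending STI) is then a straightforward sort over the returned dictionaries.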
ABSTRACT
The alarm has been raised for friendly fire: Saccharomyces cerevisiae (S. cerevisiae) has newly been recognized as a fungal pathogen with distinctive features. S. cerevisiae is considered safe for use in food and is normally not capable of causing infection, but when host defenses are weakened, opportunistic S. cerevisiae strains can cause health problems. Fungal diseases are challenging to treat because, unlike bacteria, fungi are eukaryotes. Antibiotics target prokaryotic cells, whereas compounds that kill fungi also harm the mammalian host. The small differences between mammalian and fungal cells in gene and protein sequence and function make finding a drug target more challenging. Recently, chitin synthase has been considered a promising target for antifungal drug development because it is absent in mammals. In S. cerevisiae, CHS3, a class IV chitin synthase, produces 90% of the chitin and is essential for cell growth. Transport of CHS3 from the trans-Golgi network to the plasma membrane requires assembly of the exomer complex (including proteins such as CHS5, CHS6, Bach1, and Arf1). In this work, we performed SELEX (Systematic Evolution of Ligands by EXponential enrichment) as a high-throughput virtual screen of the RCSB data bank to find an aptamer that could potentially inhibit the class IV chitin synthase of S. cerevisiae. Among all candidates, the G-rich VEGF (GVEGF) aptamer (PDB code: 2M53), which contains locked sugar moieties, was identified as a potential inhibitor of CHS5-CHS6 exomer complex assembly that could subsequently block the chitin biosynthesis pathway, acting as an effective antifungal. The simulations suggested that exomer core assembly begins with CHS5-CHS6, not with CHS5-Bach1. Notably, the secondary structures of CHS6 and Bach1 were very similar, yet they share only 25% amino acid sequence identity and exhibited different features in exomer assembly.
Subjects
Adaptor Proteins, Vesicular Transport/metabolism , Aptamers, Nucleotide/metabolism , Chitin Synthase/metabolism , Intracellular Signaling Peptides and Proteins/metabolism , Membrane Proteins/metabolism , Protein Multimerization/drug effects , Saccharomyces cerevisiae Proteins/metabolism , Vascular Endothelial Growth Factor A/chemistry , Adaptor Proteins, Vesicular Transport/chemistry , Amino Acid Sequence , Antifungal Agents/metabolism , Aptamers, Nucleotide/genetics , Binding Sites , Chitin Synthase/chemistry , G-Quadruplexes , Intracellular Signaling Peptides and Proteins/chemistry , Membrane Proteins/chemistry , Molecular Docking Simulation , Protein Binding , SELEX Aptamer Technique , Saccharomyces cerevisiae/enzymology , Saccharomyces cerevisiae Proteins/chemistry , Sequence Alignment
ABSTRACT
Mites of the genus Neotarsonemoides Kaliszewski, 1984 (Acariformes: Tarsonemoidea: Tarsonemidae) were collected in East Azerbaijan province, northwestern Iran. Neotarsonemoides (N.) marandicus sp. nov. is described and illustrated. Other species collected include Neotarsonemoides (N.) evae Magowski, 2002, N. (N.) multiplex (Kaliszewski, 1983), and N. (N.) occultus (Kaliszewski, 1983), which represent the first records of these species outside their type localities. Neotarsonemoides (N.) polonicus (Willmann, 1949) and N. (Ototarsonemus) alatus (Livshits, Mitrofanov and Sharonov, 1979) are new records for the fauna of Asia, and the latter is also the first record of the subgenus Ototarsonemus in Western Asia. A redescription and illustrations of N. (O.) alatus are provided. An identification key to females of the genus Neotarsonemoides in Iran is also provided.