Results 1 - 20 of 121,891
1.
Braz. j. biol ; 84: e245592, 2024. tab., graph.
Article in English | MEDLINE, LILACS, VETINDEX | ID: biblio-1355866

ABSTRACT

In recent years, the development of high-throughput technologies for obtaining sequence data has expanded the possibility of analyzing protein data in silico. However, when it comes to viral polyprotein interaction studies, there is a gap in the representation of those proteins, given their size and length. To prepare for studies using state-of-the-art techniques such as machine learning, a good representation of such proteins is a must. We present an alternative to this problem, implementing a fragmentation and modeling protocol to prepare those polyproteins in the form of peptide fragments. This procedure is carried out by several scripts, implemented together in a workflow we call PolyPRep, a tool written in Python and available on GitHub. This software is freely available only for noncommercial users.
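The fragmentation step described in this abstract can be illustrated with a minimal sketch. It assumes a simple fixed-length sliding window with optional overlap; the abstract does not say how PolyPRep chooses fragment boundaries, so the function name `fragment_sequence`, its parameters, and the toy sequence are all hypothetical.

```python
def fragment_sequence(sequence: str, length: int, overlap: int = 0) -> list[str]:
    """Split an amino-acid sequence into windows of `length` residues.

    Consecutive windows share `overlap` residues; the final window may be
    shorter than `length`. This is an illustrative scheme only, not the
    PolyPRep protocol.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if not 0 <= overlap < length:
        raise ValueError("overlap must satisfy 0 <= overlap < length")
    step = length - overlap
    fragments = []
    for start in range(0, len(sequence), step):
        fragments.append(sequence[start:start + length])
        # Stop once the current window reaches the end of the sequence.
        if start + length >= len(sequence):
            break
    return fragments


# Hypothetical toy sequence, not a real polyprotein.
toy = "MGARASVLSGGELDRWEKIR"
for fragment in fragment_sequence(toy, length=8, overlap=2):
    print(fragment)
```

Overlapping windows are one way to avoid losing interactions that span a cut point; whether that matters depends on the downstream modeling step.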



Subjects
HIV Protease , Polyproteins , Software , Molecular Docking Simulation
2.
Braz. j. oral sci ; 21: e227903, Jan.-Dec. 2022. illus.
Article in English | LILACS, BBO - Odontologia | ID: biblio-1355005

ABSTRACT

Aim: To evaluate the accuracy and validity of orthodontic diagnostic measurements, as well as virtual tooth transformations, using a generic open-access 3D software compared to OrthoAnalyzer (3Shape) software, which was previously tested and proven for accuracy. Methods: Forty maxillary and mandibular single-arch study models were duplicated and scanned using a 3Shape laser scanner. The files were imported into the generic and OrthoAnalyzer software programs, where linear measurements were taken twice to investigate the accuracy of the program. To test the accuracy of the program format, they were printed, rescanned and imported into OrthoAnalyzer. Finally, to investigate the accuracy of editing capabilities, linear and angular transformation procedures were performed, superimposed and printed to be rescanned and imported into OrthoAnalyzer for comparison. Results: There was no statistically significant difference between the two groups using the two software programs regarding the accuracy of the linear measurements (p>0.05). There was no statistically significant difference between the different formats among all the measurements (p>0.05). The editing capabilities also showed no statistically significant difference (p>0.05). Conclusion: The generic 3D software (Meshmixer) was valid and accurate in cast measurements and linear and angular editing procedures. It can be used for orthodontic diagnosis and treatment planning without added costs.


Subjects
Software , Surgical Casts , Three-Dimensional Imaging , Dental Models
3.
Syst Rev ; 11(1): 113, 2022 06 03.
Article in English | MEDLINE | ID: mdl-35659294

ABSTRACT

Rigorous evidence is vital in all disciplines to ensure efficient, appropriate, and fit-for-purpose decision-making with minimised risk of unintended harm. To date, however, disciplines have been slow to share evidence synthesis frameworks, best practices, and tools amongst one another. Recent progress in collaborative digital and programmatic frameworks, such as the free and open-source software R, has significantly expanded the opportunities for development of free-to-use, incrementally improvable, community-driven tools to support evidence synthesis (e.g. EviAtlas, robvis, PRISMA2020 flow diagrams and metadat). Despite this, evidence synthesis (and meta-analysis) practitioners and methodologists who make use of R remain relatively disconnected from one another. Here, we report on a new virtual conference for evidence synthesis and meta-analysis in the R programming environment (ESMARConf) that aims to connect these communities. By designing an entirely free and online conference from scratch, we have been able to focus efforts on maximising accessibility and equity, making these core missions for our new community of practice. As a community of practice, ESMARConf builds on the success and groundwork of the broader R community and systematic review coordinating bodies (e.g. Cochrane), but fills an important niche. ESMARConf aims to maximise accessibility and equity of participants across regions, contexts, and social backgrounds, forging a level playing field in a digital, connected, and online future of evidence synthesis. We believe that everyone should have the same access to participation and involvement, and we believe ESMARConf provides a vital opportunity to push for equitability across disciplines, regions, and personal situations.


Subjects
Software , Humans
4.
PLoS One ; 17(6): e0268401, 2022.
Article in English | MEDLINE | ID: mdl-35709137

ABSTRACT

The study of artifacts is fundamental to archaeological research. The features of individual artifacts are recorded, analyzed, and compared within and between contextual assemblages. Here we present and make available for academic use Artifact3-D, a new software package comprising a suite of analysis and documentation procedures for archaeological artifacts. We introduce it here, alongside real archaeological case studies to demonstrate its utility. Artifact3-D equips its users with a range of computational functions for accurate measurements, including orthogonal distances, surface area, volume, center of mass (CoM), edge angles, asymmetry, and scar attributes. Metrics and figures for each of these measurements are easily exported for the purposes of further analysis and illustration. We test these functions on a range of real archaeological case studies pertaining to tool functionality, technological organization, manufacturing traditions, knapping techniques, and knapper skill. Here we focus on lithic artifacts, but the Artifact3-D software can be used on any artifact type to address the needs of modern archaeology. Computational methods are increasingly becoming entwined in the excavation, documentation, analysis, database creation, and publication of archaeological research. Artifact3-D offers functions to address every stage of this workflow. It equips the user with the requisite toolkit for archaeological research that is accurate, objective, repeatable and efficient. This program will help archaeological research deal with the abundant material found during excavations and will open new horizons in research trajectories.


Subjects
Archaeology , Software , Archaeology/methods , Artifacts , Documentation , Technology
5.
PLoS Comput Biol ; 18(6): e1009783, 2022 06.
Article in English | MEDLINE | ID: mdl-35653385

ABSTRACT

Computational methods play a pivotal role in drug discovery and are widely applied in virtual screening, structure optimization, and compound activity profiling. Over the last decades, almost all the attention in medicinal chemistry has been directed to protein-ligand binding, and computational tools have been created with this target in mind. With novel discoveries of functional RNAs and their possible applications, RNAs have gained considerable attention as potential drug targets. However, the availability of bioinformatics tools for nucleic acids is limited. Here, we introduce fingeRNAt, a software tool for detecting non-covalent interactions formed in complexes of nucleic acids with ligands. The program detects nine types of interactions: (i) hydrogen and (ii) halogen bonds, (iii) cation-anion, (iv) pi-cation, (v) pi-anion, (vi) pi-stacking, (vii) inorganic ion-mediated, (viii) water-mediated, and (ix) lipophilic interactions. However, the scope of detected interactions can be easily expanded using a simple plugin system. In addition, detected interactions can be visualized using the associated PyMOL plugin, which facilitates the analysis of medium-throughput molecular complexes. Interactions are also encoded and stored as a bioinformatics-friendly Structural Interaction Fingerprint (SIFt), a binary string where the respective bit in the fingerprint is set to 1 if a particular interaction is present and to 0 otherwise. This output format, in turn, enables high-throughput analysis of interaction data using data analysis techniques. We present applications of fingeRNAt-generated interaction fingerprints for visual and computational analysis of RNA-ligand complexes, including analysis of interactions formed in experimentally determined RNA-small molecule ligand complexes deposited in the Protein Data Bank. We propose interaction fingerprint-based similarity as an alternative measure to RMSD to recapitulate complexes with similar interactions but different folding.
We present an application of interaction fingerprints for the clustering of molecular complexes. This approach can be used to group ligands that form similar binding networks and thus have similar biological properties. The fingeRNAt software is freely available at https://github.com/n-szulc/fingeRNAt.
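To make the idea of fingerprint-based similarity concrete, the sketch below compares two binary interaction fingerprints with the Tanimoto (Jaccard) coefficient. The choice of Tanimoto is an assumption made for illustration; the abstract does not name the similarity measure, and the example bit strings are hypothetical rather than fingeRNAt output.

```python
def tanimoto(fp_a: str, fp_b: str) -> float:
    """Tanimoto similarity of two equal-length binary strings such as '0110'.

    Counts the bits set in both fingerprints and divides by the bits set in
    at least one of them. Two all-zero fingerprints are treated as identical.
    """
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    if set(fp_a + fp_b) - {"0", "1"}:
        raise ValueError("fingerprints must contain only '0' and '1'")
    both = sum(a == "1" and b == "1" for a, b in zip(fp_a, fp_b))
    either = sum(a == "1" or b == "1" for a, b in zip(fp_a, fp_b))
    return both / either if either else 1.0


# Hypothetical fingerprints: each position stands for one interaction type
# at one residue (1 = interaction present, 0 = absent).
complex_1 = "110010000"
complex_2 = "100010010"
print(f"similarity = {tanimoto(complex_1, complex_2):.2f}")
```

A pairwise matrix of such scores is the usual input for clustering complexes by their interaction patterns.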


Subjects
Nucleic Acids , Ligands , Protein Binding , Proteins/chemistry , RNA , Software
6.
PLoS Comput Biol ; 18(6): e1010097, 2022 06.
Article in English | MEDLINE | ID: mdl-35658001

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful technique to decipher tissue composition at the single-cell level and to inform on disease mechanisms, tumor heterogeneity, and the state of the immune microenvironment. Although multiple methods for the computational analysis of scRNA-seq data exist, their application in a clinical setting demands standardized and reproducible workflows, targeted to extract, condense, and display the clinically relevant information. To this end, we designed scAmpi (Single Cell Analysis mRNA pipeline), a workflow that facilitates scRNA-seq analysis from raw read processing to informing on sample composition, clinically relevant gene and pathway alterations, and in silico identification of personalized candidate drug treatments. We demonstrate the value of this workflow for clinical decision making in a molecular tumor board as part of a clinical study.


Subjects
Single-Cell Analysis , Software , Gene Expression Profiling/methods , RNA Sequence Analysis/methods , Single-Cell Analysis/methods , Whole Exome Sequencing , Workflow
7.
Phys Med ; 99: 73-84, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35660792

ABSTRACT

The aim of this study is to compare effective dose (E) estimations based on different methods for patients with recurrent computed tomography (CT) examinations. Seventeen methods were used to determine the E of each phase as well as the total E of the CT examination. These included three groups of estimations: based on the use of published E, calculated from typical or patient-specific values of volume computed tomography dose index (CTDIvol) and dose-length product (DLP) multiplied by conversion coefficients, and based on patient-specific calculations with use of software. The E from a single phase of the examination varied with a ratio from 1.3 to 6.8 for small size patients, from 1.2 to 6.5 for normal size patients, and from 1.7 up to 18.1 for large size patients, depending on the calculation method used. The cumulative effective dose (CED) ratio per patient for the different size groups varied as follows: from 1.4 to 2.5 (small), from 1.7 to 4.3 (normal), and from 2.2 up to 6.3 (large). The minimum CED across patients varied from 38 up to 200 mSv, while the variation of maximum CED was from 122 up to 538 mSv. Although E is recommended for population estimations, it is sometimes needed and used for individual patients in clinical practice. Its value is highly dependent on the method applied. Individual estimations of E can vary up to 18.1 times and CED estimations can differ up to 6 times. The related large uncertainties should always be taken into account.
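One of the estimation groups mentioned in this abstract multiplies the dose-length product (DLP) by a conversion coefficient. The sketch below shows that arithmetic and the max/min ratio used to express how far methods disagree. The DLP value and the coefficients are illustrative placeholders, not values from the study, and the function names are hypothetical.

```python
def effective_dose_msv(dlp_mgy_cm: float, k_msv_per_mgy_cm: float) -> float:
    """Effective dose E (mSv) as DLP (mGy*cm) times a conversion coefficient k."""
    return dlp_mgy_cm * k_msv_per_mgy_cm


def spread_ratio(estimates: list[float]) -> float:
    """Ratio of the largest to the smallest estimate for the same examination."""
    if not estimates or min(estimates) <= 0:
        raise ValueError("estimates must be non-empty and positive")
    return max(estimates) / min(estimates)


# Placeholder inputs: one DLP, three candidate coefficients from
# different (hypothetical) reference tables.
dlp = 500.0
coefficients = [0.010, 0.015, 0.020]
estimates = [effective_dose_msv(dlp, k) for k in coefficients]
print(estimates, spread_ratio(estimates))
```

A ratio of 18.1, as reported for large patients, would mean the highest method-specific estimate is 18.1 times the lowest for the same scan phase.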


Subjects
Software , X-Ray Computed Tomography , Humans , Radiation Dosage , X-Ray Computed Tomography/methods
8.
J Pharm Biomed Anal ; 218: 114854, 2022 Sep 05.
Article in English | MEDLINE | ID: mdl-35660874

ABSTRACT

Volatile oil, an important bioactive fraction of medicinal herbs, comprises a diversity of compounds. At present, gas chromatography-mass spectrometry (GC-MS) is one of the mainstream approaches to profiling these complex components. However, GC-MS faces major bottlenecks in data analysis, such as co-elution of more than one compound and interference caused by high background noise, which usually force an operator to spend a lot of time and effort optimizing experimental conditions. Taking Chuanxiong Rhizoma (the dry rhizome of Ligusticum chuanxiong Hort., abbreviated as "CR") as an example, this study is intended to provide a feasible, quick and cost-effective solution for compound identification based on the chemometric entropy minimization (EM) algorithm. Ten batches of geo-authentic CR and eight batches of adulterants including Fuxiong (FX), Shanchuanxiong (SCX) and Cnidii Rhizoma (CNR) were analyzed by headspace GC-MS. FX and SCX were rhizomes of L. chuanxiong harvested at an improper time. CNR was the dried rhizome of Cnidium officinale Makino. The co-eluting and overlapping peaks and low-concentration peaks with high background were precisely reconstructed by the EM algorithm, and then the reconstructed pure mass spectra of each component were compared with the ion fragment information in the NIST library for qualitative identification. The EM algorithm proved capable of delivering results with increased accuracy and high confidence. Moreover, with the GC-MS approach established in this work, the volatile chemical profiles of FX, SCX, and CNR were quite distinct from those of geo-authentic CR, suggesting that the adulterants should not be confused with CR in clinical practice and the pharmaceutical industry. In brief, the advanced EM algorithm is envisioned to be applied to a variety of medicinal herbs, enabling rapid and accurate identification of volatile phytochemicals.


Subjects
Chinese Herbal Drugs , Ligusticum , Medicinal Plants , Chinese Herbal Drugs/analysis , Entropy , Gas Chromatography-Mass Spectrometry , Ligusticum/chemistry , Rhizome/chemistry , Software
9.
Mol Biol Evol ; 39(6), 2022 Jun 02.
Article in English | MEDLINE | ID: mdl-35647675

ABSTRACT

Commonly used methods for inferring phylogenies were designed before the emergence of high-throughput sequencing and generally cannot accommodate the challenges associated with noisy, diploid sequencing data. In many applications, diploid genomes are still treated as haploid through the use of ambiguity characters, while the uncertainty in genotype calling, which arises as a consequence of the sequencing technology, is ignored. In order to address this problem, we describe two new probabilistic approaches for estimating genetic distances: distAngsd-geno and distAngsd-nuc, both implemented in a software suite named distAngsd. These methods are specifically designed for next-generation sequencing data, utilize the full information from the data, and take uncertainty in genotype calling into account. Through extensive simulations, we show that these new methods are markedly more accurate and have more stable statistical behaviors than other currently available methods for estimating genetic distances, even for very low depth data with high error rates.


Subjects
Genome , High-Throughput Nucleotide Sequencing , Algorithms , Diploidy , Genotype , High-Throughput Nucleotide Sequencing/methods , Single Nucleotide Polymorphism , DNA Sequence Analysis/methods , Software
10.
Nucleic Acids Res ; 50(11): 6067-6083, 2022 Jun 24.
Article in English | MEDLINE | ID: mdl-35657102

ABSTRACT

Box C/D small nucleolar RNAs (snoRNAs) are a conserved class of RNA known for their role in guiding ribosomal RNA 2'-O-ribose methylation. Recently, C/D snoRNAs were also implicated in regulating the expression of non-ribosomal genes through different modes of binding. Large scale RNA-RNA interaction datasets detect many snoRNAs binding messenger RNA, but are limited by specific experimental conditions. To enable a more comprehensive study of C/D snoRNA interactions, we created snoGloBe, a human C/D snoRNA interaction predictor based on a gradient boosting classifier. SnoGloBe considers the target type, position and sequence of the interactions, enabling it to outperform existing predictors. Interestingly, for specific snoRNAs, snoGloBe identifies strong enrichment of interactions near gene expression regulatory elements including splice sites. Abundance and splicing of predicted targets were altered upon the knockdown of their associated snoRNA. Strikingly, the predicted snoRNA interactions often overlap with the binding sites of functionally related RNA binding proteins, reinforcing their role in gene expression regulation. SnoGloBe is also an excellent tool for discovering viral RNA targets, as shown by its capacity to identify snoRNAs targeting the heavily methylated SARS-CoV-2 RNA. Overall, snoGloBe is capable of identifying experimentally validated binding sites and predicting novel sites with shared regulatory function.


Subjects
Small Nucleolar RNA , Software , Base Sequence , Humans , Ribosomal RNA/metabolism , Small Nucleolar RNA/metabolism , Viral RNA , SARS-CoV-2
11.
Biomolecules ; 12(6), 2022 Jun 02.
Article in English | MEDLINE | ID: mdl-35740898

ABSTRACT

Single-particle electron cryomicroscopy (cryoEM) has become an indispensable tool for studying structure and function in macromolecular assemblies. As an integral part of the cryoEM structure determination process, computational tools have been developed to build atomic models directly from a density map without structural templates. Nearly a decade ago, we created Pathwalking, a tool for de novo modeling of protein structure in near-atomic resolution cryoEM density maps. Here, we present the latest developments in Pathwalking, including the addition of probabilistic models, as well as a companion tool for modeling waters and ligands. This software was evaluated on the 2021 CryoEM Ligand Challenge density maps, in addition to identifying ligands in three IP3R1 density maps at ~3 Å to 4.1 Å resolution. The results clearly demonstrate that the Pathwalking de novo modeling pipeline can construct accurate protein structures and reliably localize and identify ligand density directly from a near-atomic resolution map.


Subjects
Proteins , Software , Cryoelectron Microscopy/methods , Ligands , Molecular Models , Protein Conformation , Proteins/chemistry
12.
BMC Bioinformatics ; 23(1): 254, 2022 Jun 24.
Article in English | MEDLINE | ID: mdl-35751014

ABSTRACT

BACKGROUND: Estimating relatedness is an important step for many genetic study designs. A variety of methods for estimating coefficients of pairwise relatedness from genotype data have been proposed. Both the kinship coefficient [Formula: see text] and the fraternity coefficient [Formula: see text] for all pairs of individuals are of interest. However, when dealing with low-depth sequencing or imputation data, individual level genotypes cannot be confidently called. To ignore such uncertainty is known to result in biased estimates. Accordingly, methods have recently been developed to estimate kinship from uncertain genotypes. RESULTS: We present new method-of-moment estimators of both the coefficients [Formula: see text] and [Formula: see text] calculated directly from genotype likelihoods. We have simulated low-depth genetic data for a sample of individuals with extensive relatedness by using the complex pedigree of the known genetic isolates of Cilento in South Italy. Through this simulation, we explore the behaviour of our estimators, demonstrate their properties, and show advantages over alternative methods. A demonstration of our method is given for a sample of 150 French individuals with down-sampled sequencing data. CONCLUSIONS: We find that our method can provide accurate relatedness estimates whilst holding advantages over existing methods in terms of robustness, independence from external software, and required computation time. The method presented in this paper is referred to as LowKi (Low-depth Kinship) and has been made available in an R package ( https://github.com/genostats/LowKi ).


Subjects
Genetic Models , Software , Computer Simulation , Genotype , Humans , Pedigree , Whole Genome Sequencing
13.
BMC Bioinformatics ; 23(1): 250, 2022 Jun 24.
Article in English | MEDLINE | ID: mdl-35751026

ABSTRACT

BACKGROUND: Alternative splicing can increase the diversity of gene functions by generating multiple isoforms with different sequences and functions. However, the extent to which splicing events have functional consequences remains unclear and predicting the impact of splicing events on protein activity is limited to gene-specific analysis. RESULTS: To accelerate the identification of functionally relevant alternative splicing events we created SAPFIR, a predictor of protein features associated with alternative splicing events. This webserver tool uses InterProScan to predict protein features such as functional domains, motifs and sites in the human and mouse genomes and link them to alternative splicing events. Alternative protein features are displayed as functions of the transcripts and splice sites. SAPFIR can be used to analyze proteins generated from a single gene or a group of genes and can directly identify alternative protein features in large sequence data sets. The accuracy and utility of SAPFIR were validated by its ability to rediscover previously validated alternative protein domains. In addition, our de novo analysis of public datasets using SAPFIR indicated that only a small portion of alternative protein domains was conserved between human and mouse, and that in human, genes involved in nervous system process, regulation of DNA-templated transcription and aging are more likely to produce isoforms missing functional domains due to alternative splicing. CONCLUSION: Overall, SAPFIR represents a new tool for the rapid identification of functional alternative splicing events and enables the identification of cellular functions affected by a defined splicing program. SAPFIR is freely available at https://bioinfo-scottgroup.med.usherbrooke.ca/sapfir/, a website implemented in Python, with all major browsers supported. The source code is available at https://github.com/DelongZHOU/SAPFIR.


Subjects
Alternative Splicing , RNA Splicing , Animals , Genome , Mice , Protein Isoforms/genetics , Software
14.
Comput Biol Med ; 146: 105658, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35751187

ABSTRACT

BACKGROUND: Single-cell RNA sequencing makes it possible to investigate cell heterogeneity, discover new types of cells and perform transcriptomic reconstruction at single-cell resolution. Due to technical limitations, the presence of dropout events hinders downstream and differential expression analysis. An efficient and accurate approach is therefore needed to recover the true gene expression. To fill the gap, we present a novel single-cell RNA dropout imputation method to retrieve the original gene expression of genes with excessive zero and near-zero counts. RESULT: Here we have developed CDSImpute (Correlation Distance Similarity Imputation) to identify dropouts induced in scRNA-seq data, as opposed to biological zeros, and recover true gene expression. By taking into consideration correlation and negative distance between cells, a list of similar cells is created, and by borrowing gene expression from similar cells, dropouts are detected and corrected simultaneously. The improvement is consistent across simulation data and several publicly available scRNA-seq datasets. The clustering accuracy of CDSImpute, evaluated by the adjusted Rand index on the Kolod, Pollen and Usoskin datasets, is 1.00, 0.79 and 0.34, respectively. CDSImpute achieves improved performance compared to the three existing methods, as evaluated by precise cell-type identification and differentially expressed gene detection from scRNA-seq data. CONCLUSION: CDSImpute is a novel and effective method to impute the dropout events of a scRNA-seq expression matrix. The package is implemented in the R language and is available at https://github.com/riasatazim/CDSImpute.
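The adjusted Rand index used above to score clustering can be computed from the contingency table of true and predicted labels. The sketch below is a plain-Python version of the standard formula, written for illustration; it is not taken from the CDSImpute package (which is in R), and the label vectors are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true: list, labels_pred: list) -> float:
    """Adjusted Rand index between two labelings of the same items.

    Returns 1.0 for identical partitions (up to label renaming) and values
    near 0.0 for agreement expected by chance.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0
    # Pairs of items that share a cluster in both labelings.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs that share a cluster within each labeling separately.
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / total_pairs
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical cell-type labels versus cluster assignments.
cell_types = ["A", "A", "B", "B", "C", "C"]
clusters = [0, 0, 1, 1, 2, 2]
print(adjusted_rand_index(cell_types, clusters))
```

Because the index is invariant to label names, a score of 1.00 means every cell was grouped exactly as in the reference annotation.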


Subjects
Gene Expression Profiling , Single-Cell Analysis , Base Sequence , Cluster Analysis , RNA/genetics , RNA Sequence Analysis/methods , Single-Cell Analysis/methods , Software
15.
Comput Biol Med ; 146: 105538, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35751192

ABSTRACT

PURPOSE: To explore the application of computer-aided detection (CAD) software for automatically detecting nodules on standard-dose CT (SDCT) and low-dose CT (LDCT) scans with different parameters, including definition modes and blending levels of adaptive statistical iterative reconstruction (ASIR), whose influence is important for optimizing the radiology workflow in clinical practice. MATERIALS AND METHODS: 117 patients underwent SDCT and LDCT scans. The performance of CAD in detecting pulmonary nodules was assessed under different ASIR blending levels (0%, 60%, and 80%) and in high-definition (HD) and non-HD modes. The true positive (TP) rate, false positive (FP) rate and sensitivity were recorded. RESULTS: The stand-alone sensitivity of the CAD system was 78.03% (515/660) on SDCT images and 70.15% (456/650) on LDCT images (p < 0.05). The sensitivity of the CAD system to pulmonary nodules under non-HD mode was higher than that under HD mode. The detectability of nodules in images reconstructed with 60% and 80% ASIR was significantly superior to that with 0% ASIR (p < 0.001). The overall sensitivity of the CAD system on LDCT images reconstructed with 60% ASIR under HD mode was greater than that with 0% ASIR (p < 0.05), but lower than that with 80% ASIR. However, under non-HD mode, CAD demonstrated comparable performance on LDCT images reconstructed with 60% ASIR and those reconstructed with 80% ASIR. CONCLUSION: Using the CAD system to detect pulmonary nodules on LDCT images with appropriate levels of ASIR could maintain high diagnostic sensitivity while reducing the radiation dose, which is useful for optimizing the radiology workflow.
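The sensitivity figures quoted in the results follow directly from the reported counts (detected nodules over all reference nodules). The short sketch below reproduces that arithmetic; the function name is hypothetical, and only the counts 515/660 and 456/650 come from the abstract.

```python
def sensitivity_percent(true_positives: int, total_nodules: int) -> float:
    """Sensitivity as a percentage: detected nodules / all reference nodules."""
    if total_nodules <= 0:
        raise ValueError("total_nodules must be positive")
    if not 0 <= true_positives <= total_nodules:
        raise ValueError("true_positives must lie between 0 and total_nodules")
    return 100.0 * true_positives / total_nodules


# Counts reported in the abstract for standard-dose and low-dose CT.
print(round(sensitivity_percent(515, 660), 2))  # 78.03
print(round(sensitivity_percent(456, 650), 2))  # 70.15
```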


Subjects
Computer-Assisted Radiographic Image Interpretation , X-Ray Computed Tomography , Algorithms , Humans , Radiation Dosage , Computer-Assisted Radiographic Image Interpretation/methods , Radionuclide Imaging , Software , X-Ray Computed Tomography/methods
16.
Nat Commun ; 13(1): 3555, 2022 Jun 21.
Article in English | MEDLINE | ID: mdl-35729113

ABSTRACT

Mechanistic models of how single cells respond to different perturbations can help integrate disparate big data sets or predict response to varied drug combinations. However, the construction and simulation of such models have proved challenging. Here, we developed a python-based model creation and simulation pipeline that converts a few structured text files into an SBML standard and is high-performance- and cloud-computing ready. We applied this pipeline to our large-scale, mechanistic pan-cancer signaling model (named SPARCED) and demonstrate it by adding an IFNγ pathway submodel. We then investigated whether a putative crosstalk mechanism could be consistent with experimental observations from the LINCS MCF10A Data Cube that IFNγ acts as an anti-proliferative factor. The analyses suggested this observation can be explained by IFNγ-induced SOCS1 sequestering activated EGF receptors. This work forms a foundational recipe for increased mechanistic model-based data integration on a single-cell level, an important building block for clinically-predictive mechanistic models.


Subjects
Cloud Computing , Software , Cell Proliferation , Computer Simulation , Signal Transduction
17.
Database (Oxford) ; 2022, 2022 Jun 03.
Article in English | MEDLINE | ID: mdl-35657113

ABSTRACT

The Gene Expression Omnibus (GEO) is a public archive containing >4 million digital samples from functional genomics experiments collected over almost two decades. The accompanying metadata describing the experiments suffer from redundancy, inconsistency and incompleteness due to the prevalence of free text and the lack of well-defined data formats and their validation. To remedy this situation, we created Genomic Metadata Integration (GeMI; http://gmql.eu/gemi/), a web application that learns to automatically extract structured metadata (in the form of key-value pairs) from the plain text descriptions of GEO experiments. The extracted information can then be indexed for structured search and used for various downstream data mining activities. GeMI works in continuous interaction with its users. The natural language processing transformer-based model at the core of our system is a fine-tuned version of the Generative Pre-trained Transformer 2 (GPT2) model that is able to learn continuously from the feedback of the users thanks to an active learning framework designed for the purpose. As a part of such a framework, a machine learning interpretation mechanism (that exploits saliency maps) allows the users to understand easily and quickly whether the predictions of the model are correct and improves the overall usability. GeMI's ability to extract attributes not explicitly mentioned (such as sex, tissue type, cell type, ethnicity and disease) allows researchers to perform specific queries and classification of experiments, which was previously possible only after spending time and resources with tedious manual annotation. The usefulness of GeMI is demonstrated on practical research use cases. Database URL http://gmql.eu/gemi/.


Subjects
Genomics , Metadata , Data Mining , Machine Learning , Software
18.
J Mol Biol ; 434(11): 167452, 2022 Jun 15.
Article in English | MEDLINE | ID: mdl-35662453

ABSTRACT

3D structures of RNAs are the basis for understanding their biological functions. However, experimentally solved RNA 3D structures are very limited. Therefore, many computational methods have been proposed to solve this problem, including our 3dRNA. 3dRNA is an automated template-based method of building RNA 3D structures from sequences and secondary structures by using the smallest secondary elements (SSEs) (http://biophy.hust.edu.cn/new/3dRNA). The first version of 3dRNA simply predicted an assembled structure for a target RNA. Later, it was improved to generate a set of assembled models and to further optimize them using experimental or theoretical restraints. In particular, pseudoknot base pairings are treated as restraints to solve the problem of missing 3D templates for pseudoknots. Here 3dRNA is further extended to predict the 3D structures of circular RNAs, since thousands of circular RNAs have been found recently but none of their 3D structures have been determined so far. We show that circular RNAs can be divided into four types: two types have 3D structures similar to those of their linear counterparts, while the other two are very different. We also show that the predicted structures of circular RNAs can bind to their ligands more stably than those of their linear counterparts, consistent with experimental results.


Subjects
Three-Dimensional Imaging , Circular RNA , Software , Algorithms , Molecular Models , Nucleic Acid Conformation , Circular RNA/chemistry
19.
J Mol Biol ; 434(11): 167532, 2022 Jun 15.
Article in English | MEDLINE | ID: mdl-35662455

ABSTRACT

Tissue contexts are extremely valuable when studying protein functions and their associated phenotypes. Recently, the study of proteins in tissue contexts was greatly facilitated by the availability of thousands of tissue transcriptomes. To provide access to these data we developed the TissueNet integrative database that displays protein-protein interactions (PPIs) in tissue contexts. Through TissueNet, users can create tissue-sensitive network views of the PPI landscape of query proteins. Unlike other tools, TissueNet output networks highlight tissue-specific and broadly expressed proteins, as well as over- and under-expressed proteins per tissue. The TissueNet v.3 upgrade has a much larger dataset of proteins and PPIs, and represents 125 adult tissues and seven embryonic tissues. Thus, TissueNet provides an extensive, quantitative, and user-friendly interface to study the roles of human proteins in adulthood and embryonic stages. TissueNet v3 is freely available at https://netbio.bgu.ac.il/tissuenet3.


Subjects
Mammalian Embryo , Protein Interaction Mapping , Protein Interaction Maps , Proteins , Adult , Protein Databases , Mammalian Embryo/metabolism , Humans , Proteins/chemistry , Software
20.
J Mol Biol ; 434(11): 167560, 2022 Jun 15.
Article in English | MEDLINE | ID: mdl-35662457

ABSTRACT

The advent of single-cell sequencing is providing unprecedented opportunities to disentangle tissue complexity and investigate cell identities and functions. However, the analysis of single-cell data is a challenging, multi-step process that requires both advanced computational skills and biological sensibility. When dealing with single-cell RNA-seq (scRNA-seq) data, the presence of technical artifacts, noise, and biological biases makes it necessary to first identify, and eventually remove, unreliable signals from low-quality cells and unwanted sources of variation that might affect the efficacy of subsequent downstream modules. Pre-processing and quality control (QC) of scRNA-seq data is a laborious process consisting of the manual combination of different computational strategies to quantify QC metrics and define optimal sets of pre-processing parameters. Here we present popsicleR, an R package to interactively guide skilled and unskilled command-line users in the pre-processing and QC analysis of scRNA-seq data. The package integrates, into several main wrapper functions, methods derived from widely used pipelines for the estimation of quality-control metrics, filtering of low-quality cells, data normalization, removal of technical and biological biases, and cell clustering and annotation. popsicleR starts from either the output files of the Cell Ranger pipeline from 10X Genomics or a feature-barcode matrix of raw counts generated from any scRNA-seq technology. Open-source code, installation instructions, and a case study tutorial are freely available at https://github.com/bicciatolab/popsicleR.


Subjects
RNA-Seq , Single-Cell Analysis , Software , Gene Expression Profiling/methods , Quality Control , RNA-Seq/methods , Single-Cell Analysis/methods