1.
Genome Res ; 34(7): 1027-1035, 2024 Aug 20.
Article in English | MEDLINE | ID: mdl-38951026

ABSTRACT

mRNA-based vaccines and therapeutics are gaining popularity and usage across a wide range of conditions. One of the critical issues when designing such mRNAs is sequence optimization. Even small proteins or peptides can be encoded by an enormously large number of mRNAs. The actual mRNA sequence can have a large impact on several properties, including expression, stability, immunogenicity, and more. To enable the selection of an optimal sequence, we developed CodonBERT, a large language model (LLM) for mRNAs. Unlike prior models, CodonBERT uses codons as inputs, which enables it to learn better representations. CodonBERT was trained using more than 10 million mRNA sequences from a diverse set of organisms. The resulting model captures important biological concepts. CodonBERT can also be extended to perform prediction tasks for various mRNA properties. CodonBERT outperforms previous mRNA prediction methods, including on a new flu vaccine data set.
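
The two ideas in this abstract, codon-level inputs and the very large number of mRNAs that can encode one peptide, can be illustrated with a minimal sketch. This is not CodonBERT's actual tokenizer: the function names `codon_tokens` and `count_encodings` are hypothetical, and the model's vocabulary and special tokens are not described in the abstract. The degeneracy table is the standard genetic code.

```python
from math import prod

# Number of synonymous codons per amino acid in the standard genetic code
# (61 sense codons in total; stop codons are not included).
DEGENERACY = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "M": 1, "W": 1,
}


def codon_tokens(mrna: str) -> list[str]:
    """Split a coding sequence into non-overlapping 3-nucleotide tokens.

    Hypothetical helper: DNA-style input (T) is converted to RNA (U).
    """
    seq = mrna.upper().replace("T", "U")
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def count_encodings(peptide: str) -> int:
    """Count the distinct coding sequences for a peptide (stop codon excluded)."""
    return prod(DEGENERACY[aa] for aa in peptide.upper())


print(codon_tokens("AUGGCCUAA"))   # ['AUG', 'GCC', 'UAA']
print(count_encodings("MLSR"))     # 1 * 6 * 6 * 6 = 216
```

Because the count is a product over residues, it grows exponentially with peptide length, which is why exhaustive search over candidate sequences is impractical.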


Subjects
Messenger RNA, mRNA Vaccines, Humans, Messenger RNA/genetics, Codon, Algorithms
2.
Bioinformatics ; 40(Supplement_1): i151-i159, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940139

ABSTRACT

MOTIVATION: Analysis of time series transcriptomics data from clinical trials is challenging. Such studies usually profile very few time points from several individuals with varying response patterns and dynamics. Current methods for these datasets are mainly based on linear, global orderings using visit times which do not account for the varying response rates and subgroups within a patient cohort. RESULTS: We developed a new method that utilizes multi-commodity flow algorithms for trajectory inference in large scale clinical studies. Recovered trajectories satisfy individual-based timing restrictions while integrating data from multiple patients. Testing the method on multiple drug datasets demonstrated an improved performance compared to prior approaches suggested for this task, while identifying novel disease subtypes that correspond to heterogeneous patient response patterns. AVAILABILITY AND IMPLEMENTATION: The source code and instructions to download the data have been deposited on GitHub at https://github.com/euxhenh/Truffle.


Subjects
Algorithms, Transcriptome, Humans, Transcriptome/genetics, Gene Expression Profiling/methods, Software
3.
Bioinformatics ; 2024 Aug 17.
Article in English | MEDLINE | ID: mdl-39152991

ABSTRACT

MOTIVATION: Spatial transcriptomics allows mRNA expression to be quantified within its spatial context. Nonetheless, in-depth analysis of spatial transcriptomics data remains challenging and difficult to scale because of the number of methods and libraries it requires. RESULTS: Here we present SpatialOne, an end-to-end pipeline designed to simplify the analysis of 10x Visium data by combining multiple state-of-the-art computational methods to segment, deconvolve, and quantify spatial information; this approach streamlines reproducible analysis of spatial data at scale. AVAILABILITY AND IMPLEMENTATION: SpatialOne source code and execution examples are available at https://github.com/Sanofi-Public/spatialone-pipeline. SpatialOne is distributed as a Docker container image. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.
Bioinformatics ; 40(7), 2024 Jul 01.
Article in English | MEDLINE | ID: mdl-38810107

ABSTRACT

MOTIVATION: Lipid nanoparticles (LNPs) are the most widely used vehicles for mRNA vaccine delivery. The structure of the lipids composing the LNPs can have a major impact on the effectiveness of the mRNA payload. Several properties should be optimized to improve delivery and expression including biodegradability, synthetic accessibility, and transfection efficiency. RESULTS: To optimize LNPs, we developed and tested models that enable the virtual screening of LNPs with high transfection efficiency. Our best method uses the lipid Simplified Molecular-Input Line-Entry System (SMILES) as inputs to a large language model. Large language model-generated embeddings are then used by a downstream gradient-boosting classifier. As we show, our method can more accurately predict lipid properties, which could lead to higher efficiency and reduced experimental time and costs. AVAILABILITY AND IMPLEMENTATION: Code and data links available at: https://github.com/Sanofi-Public/LipoBART.


Subjects
Lipids, Nanoparticles, Transfection, Nanoparticles/chemistry, Lipids/chemistry, Transfection/methods, Messenger RNA/metabolism, Liposomes
5.
Article in English | MEDLINE | ID: mdl-39137087

ABSTRACT

Time series RNASeq studies can enable understanding of the dynamics of disease progression and treatment response in patients. They also provide information on biomarkers, activated and repressed pathways, and more. While useful, data from multiple patients are challenging to integrate because of the heterogeneity in treatment response among patients and the small number of time points that are usually profiled. Because of this heterogeneity, relying on the sampled time points to integrate data across individuals does not lead to correct reconstruction of the response patterns. To address these challenges, we developed a new constraint-based pseudotime ordering method for analyzing transcriptomics data in clinical and response studies. Our method allows the assignment of samples to their correct placement on the response curve while respecting the individual patient order. We use polynomials to represent gene expression over the duration of the study and an EM algorithm to determine parameters and locations. Application to three treatment response datasets shows that our method improves on prior methods and leads to accurate orderings that provide new biological insight on the disease and response. Code for the method is available at https://github.com/Sanofi-Public/ RDCS-bulkRNASeq-pseudo ordering.
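
The idea of placing samples on a polynomial response curve while respecting each patient's visit order can be sketched as an order-preserving assignment. This is only an illustration under strong assumptions, not the authors' algorithm: it handles one gene and one patient, takes the polynomial coefficients as already known (in the described method they would be re-estimated inside the EM loop), restricts pseudotimes to a fixed candidate grid, and solves the assignment by dynamic programming, which is a technique chosen here for the sketch. All names are hypothetical.

```python
def polyval(coeffs, t):
    """Evaluate a polynomial; coefficients are in increasing order of power."""
    result = 0.0
    for c in reversed(coeffs):  # Horner's scheme
        result = result * t + c
    return result


def order_preserving_assignment(samples, coeffs, grid):
    """Assign each sample (in visit order) to a pseudotime on `grid`.

    Minimizes the summed squared error to the polynomial curve subject to
    the pseudotimes being non-decreasing along the patient's visit order.
    `grid` must be sorted in increasing order.
    """
    curve = [polyval(coeffs, t) for t in grid]
    err = [[(y - c) ** 2 for c in curve] for y in samples]
    m = len(grid)

    cost = err[0][:]          # best cost for sample 0 ending at each grid point
    back = []                 # back-pointers, one list per later sample
    for i in range(1, len(samples)):
        new_cost, ptr = [], []
        best_k = 0            # index of the cheapest predecessor at or before j
        for j in range(m):
            if cost[j] < cost[best_k]:
                best_k = j
            new_cost.append(cost[best_k] + err[i][j])
            ptr.append(best_k)
        cost = new_cost
        back.append(ptr)

    # Trace back from the cheapest final grid point.
    j = min(range(m), key=lambda k: cost[k])
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [grid[k] for k in path]


# Toy curve y = t; the second visit looks "earlier" than the first.
print(order_preserving_assignment([2.0, 0.6, 3.0], [0.0, 1.0], [0, 1, 2, 3]))
# [1, 1, 3]
```

In the toy example, the nearest grid points taken independently would be 2, 1, 3, which violates the visit order; the constrained solution moves the first sample back instead.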

6.
bioRxiv ; 2024 Jan 07.
Article in English | MEDLINE | ID: mdl-38260359

ABSTRACT

Direct nanopore-based RNA sequencing can be used to detect post-transcriptional base modifications, such as m6A methylation, based on the electric current signals produced by the distinct chemical structures of modified bases. A key challenge is the scarcity of adequate training data with known methylation modifications. We present Xron, a hybrid encoder-decoder framework that delivers a direct methylation-distinguishing basecaller by training on synthetic RNA data and immunoprecipitation-based experimental data in two steps. First, we generate data with more diverse modification combinations through in silico cross-linking. Second, we use this dataset to train an end-to-end neural network basecaller followed by fine-tuning on immunoprecipitation-based experimental data with label-smoothing. The trained neural network basecaller outperforms existing methylation detection methods on both read-level and site-level prediction scores. Xron is a standalone, end-to-end m6A-distinguishing basecaller capable of detecting methylated bases directly from raw sequencing signals, enabling de novo methylome assembly.
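
Label smoothing, mentioned for the fine-tuning step, can be illustrated with a minimal sketch. The abstract does not specify Xron's loss function or output alphabet, so the five-class alphabet (A, C, G, U, m6A), the smoothing weight `epsilon=0.1`, and the function names below are assumptions. The variant shown mixes the one-hot target with a uniform distribution.

```python
import math

# Assumed output alphabet for illustration only.
CLASSES = ["A", "C", "G", "U", "m6A"]


def smooth_labels(true_index, num_classes, epsilon=0.1):
    """Return (1 - epsilon) * one_hot + epsilon * uniform as a list of floats."""
    off_value = epsilon / num_classes
    target = [off_value] * num_classes
    target[true_index] += 1.0 - epsilon
    return target


def cross_entropy(target, probs):
    """Cross-entropy between a soft target and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


target = smooth_labels(CLASSES.index("m6A"), len(CLASSES))
confident = [0.001, 0.001, 0.001, 0.001, 0.996]
hedged = [0.02, 0.02, 0.02, 0.02, 0.92]
print(cross_entropy(target, confident) > cross_entropy(target, hedged))  # True
```

With a smoothed target, the loss is minimized when the prediction matches the soft target rather than a probability of 1, which discourages overconfidence when labels are noisy, as immunoprecipitation-derived labels can be.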

7.
bioRxiv ; 2023 Dec 19.
Article in English | MEDLINE | ID: mdl-38187629

ABSTRACT

Many popular spatial transcriptomics techniques lack single-cell resolution. Instead, these methods measure the collective gene expression for each location from a mixture of cells, potentially containing multiple cell types. Here, we developed scResolve, a method for recovering single-cell expression profiles from spatial transcriptomics measurements at multi-cellular resolution. scResolve accurately restores expression profiles of individual cells at their locations, which is unattainable from cell type deconvolution. Applications of scResolve on human breast cancer data and human lung disease data demonstrate that scResolve enables cell type-specific differential gene expression analysis between different tissue contexts and accurate identification of rare cell populations. The spatially resolved cellular-level expression profiles obtained through scResolve facilitate more flexible and precise spatial analysis that complements raw multi-cellular level analysis.
