1.
BMC Bioinformatics ; 23(1): 446, 2022 Oct 26.
Article in English | MEDLINE | ID: mdl-36289480

ABSTRACT

BACKGROUND: In the CRISPR-Cas9 system, the efficiency of genetic modifications has been found to vary depending on the single guide RNA (sgRNA) used. A variety of sgRNA properties have been found to be predictive of CRISPR cleavage efficiency, including the position-specific sequence composition of sgRNAs, global sgRNA sequence properties, and thermodynamic features. While prevalent existing deep learning-based approaches provide competitive prediction accuracy, a more interpretable model is desirable to help understand how different features may contribute to CRISPR-Cas9 cleavage efficiency. RESULTS: We propose a gradient boosting approach, utilizing LightGBM to develop an integrated tool, BoostMEC (Boosting Model for Efficient CRISPR), for the prediction of wild-type CRISPR-Cas9 editing efficiency. We benchmark BoostMEC against 10 popular models on 13 external datasets and show its competitive performance. CONCLUSIONS: BoostMEC can provide state-of-the-art predictions of CRISPR-Cas9 cleavage efficiency for sgRNA design and selection. Relying on direct and derived sequence features of sgRNA sequences and based on conventional machine learning, BoostMEC maintains an advantage over other state-of-the-art CRISPR efficiency prediction models that are based on deep learning through its ability to produce more interpretable feature insights and predictions.
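
The feature families named in the abstract can be illustrated with a minimal sketch. It encodes position-specific sequence composition as one-hot indicators and adds GC content as a single global sequence property. The function names and the choice of GC content are assumptions made for illustration; they are not taken from BoostMEC, whose actual feature set (including thermodynamic features) is not described here.

```python
# Illustrative sgRNA featurization; NOT BoostMEC's actual feature set.
# All names below are hypothetical and chosen for this sketch only.
BASES = "ACGT"


def one_hot_positions(sgrna):
    """Position-specific composition: one 0/1 indicator per (position, base)."""
    features = []
    for base in sgrna.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        features.extend(1 if base == b else 0 for b in BASES)
    return features


def gc_content(sgrna):
    """A global sequence property: fraction of G or C bases (assumed example)."""
    seq = sgrna.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def featurize(sgrna):
    """Concatenate position-specific and global features into one flat vector."""
    return one_hot_positions(sgrna) + [gc_content(sgrna)]
```

A vector built this way could be passed to any tabular learner, including a gradient boosting model; because each column maps to a named position and base, feature importances can be read back in those terms.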


Subject(s)
CRISPR-Cas Systems , Small Untranslated RNA , Gene Editing , Machine Learning , Small Untranslated RNA/genetics
2.
Bioinformatics ; 35(8): 1395-1403, 2019 04 15.
Article in English | MEDLINE | ID: mdl-30239588

ABSTRACT

MOTIVATION: Hypertension is a heterogeneous syndrome in need of improved subtyping using phenotypic and genetic measurements with the goal of identifying subtypes of patients who share similar pathophysiologic mechanisms and may respond more uniformly to targeted treatments. Existing machine learning approaches often face challenges in integrating phenotype and genotype information and presenting to clinicians an interpretable model. We aim to provide informed patient stratification based on phenotype and genotype features. RESULTS: In this article, we present a hybrid non-negative matrix factorization (HNMF) method to integrate phenotype and genotype information for patient stratification. HNMF simultaneously approximates the phenotypic and genetic feature matrices using different appropriate loss functions, and generates patient subtypes, phenotypic groups and genetic groups. Unlike previous methods, HNMF approximates the phenotypic matrix under Frobenius loss, and the genetic matrix under Kullback-Leibler (KL) loss. We propose an alternating projected gradient method to solve the approximation problem. Simulation shows HNMF converges fast and accurately to the true factor matrices. On a real-world clinical dataset, we used the patient factor matrix as features and examined the association of these features with indices of cardiac mechanics. We compared HNMF with six different models using phenotype or genotype features alone, with or without NMF, or using joint NMF with only one type of loss. We also compared HNMF with 3 recently published methods for integrative clustering analysis, including iClusterBayes, Bayesian joint analysis and JIVE. HNMF significantly outperforms all comparison models. HNMF also reveals intuitive phenotype-genotype interactions that characterize cardiac abnormalities. AVAILABILITY AND IMPLEMENTATION: Our code is publicly available on GitHub at https://github.com/yuanluo/hnmf.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
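
The hybrid objective described in the abstract can be sketched as a sum of two reconstruction losses: Frobenius loss for the phenotypic matrix and generalized KL loss for the genetic matrix. The sketch below assumes a single patient factor matrix `w` shared by both reconstructions and a scalar `weight` balancing the two terms; both are assumptions for illustration, not details confirmed by the abstract. It only evaluates the objective and does not implement the alternating projected gradient solver.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def frobenius_loss(x, approx):
    """Squared Frobenius norm of the residual x - approx."""
    return sum((xv - av) ** 2
               for xr, ar in zip(x, approx)
               for xv, av in zip(xr, ar))


def kl_loss(x, approx, eps=1e-12):
    """Generalized KL divergence: sum of x*log(x/approx) - x + approx."""
    total = 0.0
    for xr, ar in zip(x, approx):
        for xv, av in zip(xr, ar):
            if xv > 0:  # the x*log(x/approx) term is taken as 0 when x == 0
                total += xv * math.log(xv / (av + eps))
            total += av - xv
    return total


def hybrid_loss(x_pheno, x_geno, w, h_pheno, h_geno, weight=1.0):
    """Frobenius loss on phenotypes plus weighted KL loss on genotypes.

    w is the (assumed shared) patient factor matrix; h_pheno and h_geno
    hold the phenotypic and genetic group factors respectively.
    """
    return (frobenius_loss(x_pheno, matmul(w, h_pheno))
            + weight * kl_loss(x_geno, matmul(w, h_geno)))
```

When both matrices are reproduced exactly by the factors, the objective evaluates to approximately zero, which is a quick sanity check before plugging it into any optimizer.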


Subject(s)
Algorithms , Hypertension , Bayes Theorem , Genotype , Humans , Phenotype
4.
Genome Biol ; 20(1): 75, 2019 04 16.
Article in English | MEDLINE | ID: mdl-30992037

ABSTRACT

RNA degradation affects RNA-seq quality when profiling transcriptional activities in cells. Here, we show that transcript degradation is both gene- and sample-specific and is a common and significant factor that may bias the results in RNA-seq analysis. Most existing global normalization approaches are ineffective at correcting for degradation bias. We propose a novel pipeline named DegNorm to adjust the read counts for transcript degradation heterogeneity on a gene-by-gene basis while simultaneously controlling for the sequencing depth. The robust and effective performance of this method is demonstrated in an extensive set of simulated and real RNA-seq data.


Subject(s)
Algorithms , RNA Stability , RNA Sequence Analysis , Cell Line , Humans , Software