1.
Bioinformatics ; 34(2): 289-291, 2018 Jan 15.
Article in English | MEDLINE | ID: mdl-28968739

ABSTRACT

SUMMARY: Addressing deleterious effects of noncoding mutations is an essential step towards the identification of disease-causal mutations of gene regulatory elements. Several methods for quantifying the deleteriousness of noncoding mutations using artificial intelligence, deep learning and other approaches have been recently proposed. Although the majority of the proposed methods have demonstrated excellent accuracy on different test sets, there is rarely a consensus among their predictions. In addition, the advanced statistical and artificial learning approaches used by these methods make it difficult to port them outside of the labs that developed them. To address these challenges and to transform the methodological advances in predicting deleterious noncoding mutations into a practical resource available for the broader functional genomics and population genetics communities, we developed SNPDelScore, which uses a panel of proposed methods for quantifying deleterious effects of noncoding mutations to precompute and compare the deleteriousness scores of all common SNPs in the human genome in 44 cell lines. The panel of deleteriousness scores of a SNP computed using different methods is supplemented by functional information from the GWAS Catalog, libraries of transcription factor-binding sites, and genic characteristics of mutations. SNPDelScore comes with a genome browser capable of displaying and comparing large sets of SNPs in a genomic locus and rapidly identifying consensus SNPs with the highest deleteriousness scores, making those prime candidates for phenotype-causal polymorphisms. AVAILABILITY AND IMPLEMENTATION: https://www.ncbi.nlm.nih.gov/research/snpdelscore/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
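
The idea of picking "consensus SNPs" from a panel of scoring methods can be illustrated with a simple rank-averaging sketch. This is not SNPDelScore's actual procedure, which the abstract does not specify; the method names, SNP identifiers, scores, and the function `consensus_rank` are all hypothetical, and the sketch assumes that a higher score always means "more deleterious".

```python
from statistics import mean


def consensus_rank(scores):
    """Rank SNPs by their average rank across scoring methods.

    scores: {method_name: {snp_id: score}}, where a higher score is
    assumed to mean "more deleterious" for every method.
    Returns a list of (snp_id, average_rank), best consensus first.
    """
    # Only SNPs scored by every method can be compared fairly.
    snps = set.intersection(*(set(table) for table in scores.values()))

    per_method_rank = {}
    for method, table in scores.items():
        # Rank 1 = highest score; ties are broken by SNP id for determinism.
        ordered = sorted(snps, key=lambda snp: (-table[snp], snp))
        per_method_rank[method] = {snp: i + 1 for i, snp in enumerate(ordered)}

    average = {
        snp: mean(ranks[snp] for ranks in per_method_rank.values())
        for snp in snps
    }
    return sorted(average.items(), key=lambda item: (item[1], item[0]))


# Hypothetical scores on three differently scaled methods.
example_scores = {
    "method_a": {"snp1": 0.2, "snp2": 0.9, "snp3": 0.5},
    "method_b": {"snp1": 1.0, "snp2": 7.5, "snp3": 8.0},
    "method_c": {"snp1": 0.1, "snp2": 0.8, "snp3": 0.3},
}
ranking = consensus_rank(example_scores)
```

Averaging ranks rather than raw scores sidesteps the fact that different methods report scores on different scales; here `snp2` comes out first because two of the three hypothetical methods rank it highest.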

2.
Nucleic Acids Res ; 45(5): 2307-2317, 2017 03 17.
Article in English | MEDLINE | ID: mdl-27980060

ABSTRACT

The majority of genome-wide association study (GWAS) risk variants reside in non-coding DNA sequences. Understanding how these sequence modifications lead to transcriptional alterations and cell-to-cell variability can help unravel genotype-phenotype relationships. Here, we describe a computational method, dubbed CAPE, which calculates the likelihood of a genetic variant deactivating enhancers by disrupting the binding of transcription factors (TFs) in a given cellular context. CAPE learns sequence signatures associated with putative enhancers originating from large-scale sequencing experiments (such as ChIP-seq or DNase-seq) and models the change in enhancer signature upon a single nucleotide substitution. CAPE accurately identifies causative cis-regulatory variation, including expression quantitative trait loci (eQTLs) and DNase I sensitivity quantitative trait loci (dsQTLs), in a tissue-specific manner with precision superior to several currently available methods. The presented method can be trained on any tissue-specific dataset of enhancers and known functional variants and applied to prioritize disease-associated variants in the corresponding tissue.
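
The general notion of "modeling the change in a sequence signature upon a single nucleotide substitution" can be sketched with a toy k-mer scoring scheme. This is an illustrative stand-in, not CAPE's model: the k-mer weights, the sequence, and the functions `kmer_score` and `substitution_delta` are hypothetical, and a real signature would be learned from ChIP-seq or DNase-seq data.

```python
def kmer_score(seq, weights, k):
    """Sum the weights of all overlapping k-mers in seq (unknown k-mers count as 0)."""
    return sum(weights.get(seq[i:i + k], 0.0) for i in range(len(seq) - k + 1))


def substitution_delta(seq, pos, alt, weights, k):
    """Change in k-mer score when seq[pos] is replaced by alt.

    A negative value means the substitution weakens the (toy) signature.
    """
    if seq[pos] == alt:
        raise ValueError("alt allele equals the reference base")
    mutated = seq[:pos] + alt + seq[pos + 1:]
    # Only k-mers overlapping the substituted position can change,
    # so it is enough to score the window around it.
    lo = max(0, pos - k + 1)
    hi = min(len(seq), pos + k)
    return kmer_score(mutated[lo:hi], weights, k) - kmer_score(seq[lo:hi], weights, k)


# Hypothetical weights: positive = enhancer-like, negative = background-like.
toy_weights = {"GATA": 2.0, "TATC": 1.5, "GACA": -0.5}
reference = "CCGATAAG"
# Substituting T -> C at index 4 turns the GATA k-mer into GACA.
delta = substitution_delta(reference, pos=4, alt="C", weights=toy_weights, k=4)
```

In this toy case the substitution destroys the one positively weighted k-mer and creates a negatively weighted one, so `delta` is negative, which is the kind of signal a variant-prioritization method would treat as potentially enhancer-disrupting.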


Subject(s)
Enhancer Elements, Genetic; Genetic Association Studies; Genome, Human; Polymorphism, Single Nucleotide; Quantitative Trait Loci; Transcription Factors/metabolism; B-Lymphocytes/cytology; B-Lymphocytes/metabolism; Base Sequence; Deoxyribonuclease I/metabolism; Genome-Wide Association Study; High-Throughput Nucleotide Sequencing; Humans; Likelihood Functions; Machine Learning; Organ Specificity; Protein Binding; Transcription Factors/genetics; Transcription, Genetic
3.
Genome Biol ; 25(1): 12, 2024 01 08.
Article in English | MEDLINE | ID: mdl-38191464

ABSTRACT

The cost and complexity of generating a complete reference genome means that many organisms lack an annotated reference. An alternative is to use a de novo reference transcriptome. This technology is cost-effective but is susceptible to off-target RNA contamination. In this manuscript, we present GTax, a taxonomy-structured database of genomic sequences that can be used with BLAST to detect and remove foreign contamination in RNA sequencing samples before assembly. In addition, we use a de novo transcriptome assembly of Solanum lycopersicum (tomato) to demonstrate that removing foreign contamination in sequencing samples reduces the number of assembled chimeric transcripts.
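
The screening step described above (use BLAST hits to detect foreign reads before assembly) can be sketched as a filter over tabular BLAST output. This is not GTax's implementation. It assumes the hits were produced with a custom tabular format such as `-outfmt "6 qseqid staxids bitscore"`, and that a read counts as foreign when its best-scoring hit has no taxonomy ID in an allowed set; the read names, scores, and the function `flag_foreign_reads` are hypothetical.

```python
def flag_foreign_reads(blast_lines, allowed_taxids):
    """Return the query IDs whose best hit falls outside allowed_taxids.

    blast_lines: iterable of tab-separated lines "qseqid<TAB>staxids<TAB>bitscore",
    where staxids may hold several IDs separated by ';'.
    allowed_taxids: set of taxonomy IDs (as strings) considered on-target.
    """
    best = {}  # qseqid -> (bitscore, set of taxids of the best hit)
    for line in blast_lines:
        if not line.strip():
            continue
        qseqid, staxids, bitscore = line.rstrip("\n").split("\t")
        score = float(bitscore)
        if qseqid not in best or score > best[qseqid][0]:
            best[qseqid] = (score, set(staxids.split(";")))

    return {q for q, (_, taxids) in best.items() if not (taxids & allowed_taxids)}


# Hypothetical hits; 4081 is the NCBI taxonomy ID of Solanum lycopersicum,
# 9606 is Homo sapiens and 562 is Escherichia coli.
hits = [
    "read1\t4081\t180.0",
    "read1\t9606\t60.0",
    "read2\t9606\t200.0",
    "read3\t562;4081\t90.0",
]
foreign = flag_foreign_reads(hits, allowed_taxids={"4081"})
```

A taxonomy-structured database would normally accept a whole lineage (for example, any taxon under the target's family) rather than a single ID; the flat set used here keeps the sketch short.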


Subject(s)
Solanum lycopersicum; Transcriptome; Databases, Factual; Genomics; RNA; Sequence Analysis, RNA; Solanum lycopersicum/genetics
4.
bioRxiv ; 2023 Jan 04.
Article in English | MEDLINE | ID: mdl-36789435

ABSTRACT

Background: Biomedical researchers use alignments produced by BLAST (Basic Local Alignment Search Tool) to categorize their query sequences. Producing such alignments is an essential bioinformatics task that is well suited for the cloud. The cloud can perform many calculations quickly as well as store and access large volumes of data. Bioinformaticians can also use it to collaborate with other researchers, sharing their results, datasets and even their pipelines on a common platform. Results: We present ElasticBLAST, a cloud-native application to perform BLAST alignments in the cloud. ElasticBLAST can handle anywhere from a few to many thousands of queries and run the searches on thousands of virtual CPUs (if desired), deleting resources when it is done. It uses cloud-native tools for orchestration and can request discounted instances, lowering cloud costs for users. It is supported on Amazon Web Services and Google Cloud Platform. It can search BLAST databases that are user-provided or from the National Center for Biotechnology Information. Conclusion: We show, with two examples, that ElasticBLAST is a useful application that can efficiently perform BLAST searches for the user in the cloud. At the same time, it hides much of the complexity of working in the cloud, lowering the threshold to move work to the cloud.

5.
Gigascience ; 10(2)2021 01 29.
Article in English | MEDLINE | ID: mdl-33511996

ABSTRACT

BACKGROUND: The NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES) initiative provides NIH-funded researchers cost-effective access to commercial cloud providers, such as Amazon Web Services (AWS) and Google Cloud Platform (GCP). These cloud providers represent an alternative for the execution of large computational biology experiments like transcriptome annotation, which is a complex analytical process that requires the interrogation of multiple biological databases with several advanced computational tools. The core components of annotation pipelines published since 2012 are BLAST sequence alignments against annotated databases of nucleotide or protein sequences, run almost exclusively on networked on-premises compute systems. FINDINGS: We compare multiple BLAST sequence alignments using AWS and GCP. We prepared several Jupyter Notebooks with all the code required to submit computing jobs to the batch system on each cloud provider. We examine how the number of query transcripts in the input files affects cost and processing time. We tested compute instances with 16, 32, and 64 vCPUs on each cloud provider. Four classes of timing results were collected: the total run time, the time for transferring the BLAST databases to the instance local solid-state disk drive, the time to execute the CWL script, and the time for the creation, set-up, and release of an instance. This study aims to establish an estimate of the cost and compute time needed for the execution of multiple BLAST runs in a cloud environment. CONCLUSIONS: We demonstrate that public cloud providers are a practical alternative for the execution of advanced computational biology experiments at low cost. Using our cloud recipes, the BLAST alignments required to annotate a transcriptome with ∼500,000 transcripts can be processed in <2 hours with a compute cost of ∼$200-$250.
In our opinion, for BLAST-based workflows, the choice of cloud platform is not dependent on the workflow but, rather, on the specific details and requirements of the cloud provider. These choices include the accessibility for institutional use, the technical knowledge required for effective use of the platform services, and the availability of open source frameworks such as APIs to deploy the workflow.


Subject(s)
Software; Transcriptome; Cloud Computing; Computational Biology; Databases, Factual; Workflow
6.
iScience ; 23(3): 100939, 2020 Mar 27.
Article in English | MEDLINE | ID: mdl-32169820

ABSTRACT

Missense mutations may affect proteostasis by destabilizing or over-stabilizing protein complexes and changing the pathway flux. Predicting the effects of stabilizing mutations on protein-protein interactions is notoriously difficult because existing experimental sets are skewed toward mutations reducing protein-protein binding affinity, and many computational methods fail to correctly evaluate their effects. To address this issue, we developed a method, MutaBind2, which estimates the impacts of single as well as multiple mutations on protein-protein interactions. MutaBind2 employs only seven features, and the most important of them describe interactions of proteins with the solvent, evolutionary conservation of the site, and thermodynamic stability of the complex and each monomer. This approach shows a distinct improvement, especially in evaluating the effects of mutations increasing binding affinity. MutaBind2 can be used for finding disease driver mutations, designing stable protein complexes, and discovering new protein-protein interaction inhibitors.
