Search | BVS Regional Portal

scHLAcount: allele-specific HLA expression from single-cell gene expression data.

Darby, Charlotte A; Stubbington, Michael J T; Marks, Patrick J; Martínez Barrio, Álvaro; Fiddes, Ian T.

Bioinformatics ; 36(12): 3905-3906, 2020 06 01.

Article in English | MEDLINE | ID: mdl-32330223

ABSTRACT

SUMMARY: Bulk RNA sequencing studies have demonstrated that human leukocyte antigen (HLA) genes may be expressed in a cell type-specific and allele-specific fashion. Single-cell gene expression assays have the potential to further resolve these expression patterns, but currently available methods do not perform allele-specific quantification at the molecule level. Here, we present scHLAcount, a post-processing workflow for single-cell RNA-seq data that computes allele-specific molecule counts of the HLA genes based on a personalized reference constructed from the sample's HLA genotypes. AVAILABILITY AND IMPLEMENTATION: scHLAcount is available under the MIT license at https://github.com/10XGenomics/scHLAcount. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Single-Cell Analysis; Software; Alleles; Gene Expression; Humans; Sequence Analysis, RNA; Workflow
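The scHLAcount abstract above describes allele-specific quantification "at the molecule level". The following is a minimal sketch of that idea only, not scHLAcount's actual implementation: it assumes each read has already been assigned a cell barcode, a UMI, a gene, and (where the read is informative) an allele, and it collapses reads sharing a cell, gene, and UMI into one molecule. The function name, the input tuple layout, and the `ambiguous` label are hypothetical.

```python
from collections import defaultdict


def count_allele_molecules(read_assignments):
    """Collapse reads to molecules and count them per cell and allele.

    read_assignments: iterable of (cell_barcode, umi, gene, allele) tuples,
    one per read; allele is None when the read cannot distinguish alleles.
    Returns {cell_barcode: {label: molecule_count}}.
    (Hypothetical input format, chosen for illustration.)
    """
    # Collect the set of alleles supported by the reads of each molecule,
    # where a molecule is identified by (cell, gene, UMI).
    alleles_per_molecule = defaultdict(set)
    for cell, umi, gene, allele in read_assignments:
        key = (cell, gene, umi)
        alleles_per_molecule[key]  # register molecules with no informative read
        if allele is not None:
            alleles_per_molecule[key].add(allele)

    counts = defaultdict(lambda: defaultdict(int))
    for (cell, gene, _umi), alleles in alleles_per_molecule.items():
        # Count toward an allele only when all informative reads agree;
        # otherwise fall back to a gene-level "ambiguous" count.
        if len(alleles) == 1:
            label = next(iter(alleles))
        else:
            label = f"{gene}:ambiguous"
        counts[cell][label] += 1
    return {cell: dict(per_cell) for cell, per_cell in counts.items()}
```

Counting molecules rather than reads keeps PCR duplicates from inflating one allele's apparent expression, which is the reason for collapsing on the UMI before assigning a label.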

Vargas: heuristic-free alignment for assessing linear and graph read aligners.

Darby, Charlotte A; Gaddipati, Ravi; Schatz, Michael C; Langmead, Ben.

Bioinformatics ; 36(12): 3712-3718, 2020 06 01.

Article in English | MEDLINE | ID: mdl-32321164

ABSTRACT

MOTIVATION: Read alignment is central to many aspects of modern genomics. Most aligners use heuristics to accelerate processing, but these heuristics can fail to find the optimal alignments of reads. Alignment accuracy is typically measured through simulated reads; however, the simulated location may not be the (only) location with the optimal alignment score. RESULTS: Vargas implements a heuristic-free algorithm guaranteed to find the highest-scoring alignment for real sequencing reads to a linear or graph genome. With semiglobal and local alignment modes and affine gap and quality-scaled mismatch penalties, it can implement the scoring functions of commonly used aligners to calculate optimal alignments. While this is computationally intensive, Vargas uses multi-core parallelization and vectorized (SIMD) instructions to make it practical to optimally align large numbers of reads, achieving a maximum speed of 456 billion cell updates per second. We demonstrate how these 'gold standard' Vargas alignments can be used to improve heuristic alignment accuracy by optimizing command-line parameters in Bowtie 2, BWA-MEM and vg to align more reads correctly. AVAILABILITY AND IMPLEMENTATION: Source code implemented in C++ and compiled binary releases are available at https://github.com/langmead-lab/vargas under the MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Heuristics; High-Throughput Nucleotide Sequencing; Algorithms; Genomics; Sequence Analysis, DNA; Software
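To make the Vargas abstract's "heuristic-free" semiglobal alignment with affine gap penalties concrete, here is a minimal scalar sketch for a linear reference only. It is the textbook three-matrix dynamic program (Gotoh-style), not Vargas's vectorized C++ implementation, and it omits graph genomes and quality-scaled mismatches. The function name and the default penalty values are illustrative assumptions. The second return value is the number of matrix cells filled, the unit behind the "cell updates per second" figure.

```python
NEG_INF = float("-inf")


def semiglobal_affine_score(read, ref, match=0, mismatch=6,
                            gap_open=5, gap_extend=3):
    """Best score aligning the whole read to any substring of ref.

    A gap of length k costs gap_open + k * gap_extend.
    Returns (best_score, cell_updates). Parameter defaults are illustrative.
    """
    m, n = len(read), len(ref)
    # Row 0: the alignment may start at any reference position for free.
    prev_m = [0] * (n + 1)          # last column is a match/mismatch
    prev_x = [NEG_INF] * (n + 1)    # last column is a read base against a gap
    prev_y = [NEG_INF] * (n + 1)    # last column is a ref base against a gap
    for i in range(1, m + 1):
        cur_m = [NEG_INF] * (n + 1)
        cur_x = [NEG_INF] * (n + 1)
        cur_y = [NEG_INF] * (n + 1)
        # Column 0: the first i read bases are all inserted.
        cur_x[0] = -(gap_open + i * gap_extend)
        for j in range(1, n + 1):
            s = match if read[i - 1] == ref[j - 1] else -mismatch
            cur_m[j] = max(prev_m[j - 1], prev_x[j - 1], prev_y[j - 1]) + s
            # Open a new gap (from M or the other gap state) or extend one.
            cur_x[j] = max(prev_m[j] - gap_open - gap_extend,
                           prev_y[j] - gap_open - gap_extend,
                           prev_x[j] - gap_extend)
            cur_y[j] = max(cur_m[j - 1] - gap_open - gap_extend,
                           cur_x[j - 1] - gap_open - gap_extend,
                           cur_y[j - 1] - gap_extend)
        prev_m, prev_x, prev_y = cur_m, cur_x, cur_y
    # The read is fully consumed in the last row; the reference end is free.
    best = max(max(prev_m), max(prev_x), max(prev_y))
    return best, m * n
```

Because every cell of the matrix is filled, the returned score is optimal under the chosen scoring function, which is what allows such alignments to serve as a reference point for heuristic aligners; the cost is time proportional to read length times reference length.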

Samovar: Single-Sample Mosaic Single-Nucleotide Variant Calling with Linked Reads.

Darby, Charlotte A; Fitch, James R; Brennan, Patrick J; Kelly, Benjamin J; Bir, Natalie; Magrini, Vincent; Leonard, Jeffrey; Cottrell, Catherine E; Gastier-Foster, Julie M; Wilson, Richard K; Mardis, Elaine R; White, Peter; Langmead, Ben; Schatz, Michael C.

iScience ; 18: 1-10, 2019 Aug 30.

Article in English | MEDLINE | ID: mdl-31271967

ABSTRACT

Linked-read sequencing enables greatly improved haplotype assembly over standard paired-end analysis. The detection of mosaic single-nucleotide variants benefits from haplotype assembly when the model is informed by the mapping between constituent reads and linked reads. Samovar evaluates haplotype-discordant reads identified through linked-read sequencing, thus enabling phasing and mosaic variant detection across the entire genome. Samovar trains a random forest model to score candidate sites using a dataset that considers read quality, phasing, and linked-read characteristics. Samovar calls mosaic single-nucleotide variants (SNVs) within a single sample with accuracy comparable with what previously required trios or matched tumor/normal pairs and outperforms single-sample mosaic variant callers at minor allele frequency 5%-50% with at least 30X coverage. Samovar finds somatic variants in both tumor and normal whole-genome sequencing from 13 pediatric cancer cases that can be corroborated with high recall with whole exome sequencing. Samovar is available open-source at https://github.com/cdarby/samovar under the MIT license.

Piercing the dark matter: bioinformatics of long-range sequencing and mapping.

Sedlazeck, Fritz J; Lee, Hayan; Darby, Charlotte A; Schatz, Michael C.

Nat Rev Genet ; 19(6): 329-346, 2018 06.

Article in English | MEDLINE | ID: mdl-29599501

ABSTRACT

Several new genomics technologies have become available that offer long-read sequencing or long-range mapping with higher throughput and higher resolution analysis than ever before. These long-range technologies are rapidly advancing the field with improved reference genomes, more comprehensive variant identification and more complete views of transcriptomes and epigenomes. However, they also require new bioinformatics approaches to take full advantage of their unique characteristics while overcoming their complex errors and modalities. Here, we discuss several of the most important applications of the new technologies, focusing on both the currently available bioinformatics tools and opportunities for future research.

Subjects

Chromosome Mapping/methods; Computational Biology/methods; Gene Expression Profiling/methods; Genomics/methods; High-Throughput Nucleotide Sequencing/methods; Transcriptome; Animals; Humans

LRSim: A Linked-Reads Simulator Generating Insights for Better Genome Partitioning.

Luo, Ruibang; Sedlazeck, Fritz J; Darby, Charlotte A; Kelly, Stephen M; Schatz, Michael C.

Comput Struct Biotechnol J ; 15: 478-484, 2017.

Article in English | MEDLINE | ID: mdl-29213995

ABSTRACT

Linked-read sequencing, using highly-multiplexed genome partitioning and barcoding, can span hundreds of kilobases to improve de novo assembly, haplotype phasing, and other applications. Based on our analysis of 14 datasets, we introduce LRSim, which simulates linked reads by emulating the library preparation and sequencing process with fine control over variants, linked-read characteristics, and the short-read profile. From the phasing and assembly of multiple datasets, we derive recommendations on coverage, fragment length, and partitioning when sequencing genomes of different sizes and complexities. These optimizations improve results by orders of magnitude, and enable the development of novel methods. LRSim is available at https://github.com/aquaskyline/LRSIM.

Xenolog classification.

Darby, Charlotte A; Stolzer, Maureen; Ropp, Patrick J; Barker, Daniel; Durand, Dannie.

Bioinformatics ; 33(5): 640-649, 2017 03 01.

Article in English | MEDLINE | ID: mdl-27998934

ABSTRACT

Motivation: Orthology analysis is a fundamental tool in comparative genomics. Sophisticated methods have been developed to distinguish between orthologs and paralogs and to classify paralogs into subtypes depending on the duplication mechanism and timing, relative to speciation. However, no comparable framework exists for xenologs: gene pairs whose history, since their divergence, includes a horizontal transfer. Further, the diversity of gene pairs that meet this broad definition calls for classification of xenologs with similar properties into subtypes. Results: We present a xenolog classification that uses phylogenetic reconciliation to assign each pair of genes to a class based on the event responsible for their divergence and the historical association between genes and species. Our classes distinguish between genes related through transfer alone and genes related through duplication and transfer. Further, they separate closely-related genes in distantly-related species from distantly-related genes in closely-related species. We present formal rules that assign gene pairs to specific xenolog classes, given a reconciled gene tree with an arbitrary number of duplications and transfers. These xenology classification rules have been implemented in software and tested on a collection of ~13 000 prokaryotic gene families. In addition, we present a case study demonstrating the connection between xenolog classification and gene function prediction. Availability and Implementation: The xenolog classification rules have been implemented in Notung 2.9, a freely available phylogenetic reconciliation software package: http://www.cs.cmu.edu/~durand/Notung. Gene trees are available at http://dx.doi.org/10.7488/ds/1503. Contact: durand@cmu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Genes, Bacterial; Genomics/methods; Phylogeny; Software; Algorithms; Bacteria/genetics; Evolution, Molecular; Sequence Homology, Nucleic Acid
