1.
ACS Comb Sci ; 22(8): 410-421, 2020 08 10.
Article in English | MEDLINE | ID: mdl-32531158

ABSTRACT

DNA-encoded libraries (DELs) are large, pooled collections of compounds in which every library member is attached to a stretch of DNA encoding its complete synthetic history. DEL-based hit discovery involves affinity selection of the library against a protein of interest, whereby compounds retained by the target are subsequently identified by next-generation sequencing of the corresponding DNA tags. When analyzing the resulting data, one typically assumes that sequencing output (i.e., read counts) is proportional to the binding affinity of a given compound, thus enabling hit prioritization and elucidation of any underlying structure-activity relationships (SAR). This assumption, though, tends to be severely confounded by a number of factors, including variable reaction yields, presence of incomplete products masquerading as their intended counterparts, and sequencing noise. In practice, these confounders are often ignored, potentially contributing to low hit validation rates, and universally leading to loss of valuable information. To address this issue, we have developed a method for comprehensively denoising DEL selection outputs. Our method, dubbed "deldenoiser", is based on sparse learning and leverages inputs that are commonly available within a DEL generation and screening workflow. Using simulated and publicly available DEL affinity selection data, we show that "deldenoiser" not only recovers and ranks true binders much more robustly than read count-based approaches but also yields scores that accurately capture the underlying SAR. The proposed method can, thus, be of significant utility in hit prioritization following DEL screens.


Subject(s)
DNA/chemistry , Gene Library , Machine Learning
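
The confounding effect of variable reaction yields described in the abstract above can be illustrated with a toy calculation. This is a minimal sketch under a stated assumption, namely that expected reads scale with the product of a compound's synthesis yield and a binding-strength factor. It is not the deldenoiser method, and the compound names, yields, and binding values are hypothetical.

```python
# Toy model (an assumption for illustration, not the deldenoiser algorithm):
# expected reads are proportional to synthesis yield times binding strength.
compounds = {
    "A": {"yield": 0.9, "binding": 1.0},  # well-synthesized weak binder
    "B": {"yield": 0.2, "binding": 3.0},  # poorly synthesized strong binder
}


def expected_reads(compound, depth=1000):
    """Expected read count under the toy proportionality assumption."""
    return depth * compound["yield"] * compound["binding"]


reads = {name: expected_reads(c) for name, c in compounds.items()}

# Ranking by raw read counts versus ranking by the true binding strength.
rank_by_reads = sorted(reads, key=reads.get, reverse=True)
rank_by_binding = sorted(
    compounds, key=lambda name: compounds[name]["binding"], reverse=True
)

print(rank_by_reads)    # order implied by read counts
print(rank_by_binding)  # order implied by true binding strength
```

In this toy setting the weaker binder outranks the stronger one on read counts alone, which is the kind of distortion a denoising step would need to correct.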
2.
Nat Genet ; 51(2): 354-362, 2019 02.
Article in English | MEDLINE | ID: mdl-30643257

ABSTRACT

The human reference genome serves as the foundation for genomics by providing a scaffold for alignment of sequencing reads, but currently only reflects a single consensus haplotype, thus impairing analysis accuracy. Here we present a graph reference genome implementation that enables read alignment across 2,800 diploid genomes encompassing 12.6 million SNPs and 4.0 million insertions and deletions (indels). The pipeline processes one whole-genome sequencing sample in 6.5 h using a system with 36 CPU cores. We show that using a graph genome reference improves read mapping sensitivity and produces a 0.5% increase in variant calling recall, with unaffected specificity. Structural variations incorporated into a graph genome can be genotyped accurately under a unified framework. Finally, we show that iterative augmentation of graph genomes yields incremental gains in variant calling accuracy. Our implementation is an important advance toward fulfilling the promise of graph genomes to radically enhance the scalability and accuracy of genomic analyses.


Subject(s)
Genome, Human/genetics , Genomics/methods , Humans , Polymorphism, Single Nucleotide/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Sequence Deletion/genetics , Whole Genome Sequencing/methods
3.
Bioinformatics ; 34(20): 3488-3495, 2018 10 15.
Article in English | MEDLINE | ID: mdl-29850774

ABSTRACT

Motivation: Classical methods of comparing the accuracies of variant calling pipelines are based on truth sets of variants whose genotypes are previously determined with high confidence. An alternative way of performing benchmarking is based on Mendelian constraints between related individuals. Statistical analysis of Mendelian violations can provide truth set-independent benchmarking information, and enable benchmarking less-studied variants and diverse populations. Results: We introduce a statistical mixture model for comparing two variant calling pipelines from genotype data they produce after running on individual members of a trio. We determine the accuracy of our model by comparing the precision and recall of GATK Unified Genotyper and Haplotype Caller on the high-confidence SNPs of the NIST Ashkenazim trio and the two independent Platinum Genome trios. We show that our method is able to estimate differential precision and recall between the two pipelines with 10⁻³ uncertainty. Availability and implementation: The Python library geck and usage examples are available at the following URL: https://github.com/sbg/geck, under the GNU General Public License v3. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Benchmarking , Genome , Genotype , Polymorphism, Single Nucleotide
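
For context, the classical truth-set comparison that the abstract above contrasts with Mendelian benchmarking can be sketched as follows. This is a minimal illustration, not the geck API: the function name, the variant tuples, and the two pipeline call sets are hypothetical, and variants are assumed to be directly comparable as (chromosome, position, ref, alt) tuples.

```python
def precision_recall(called, truth):
    """Precision and recall of a call set against a truth set.

    Variants are assumed to be hashable tuples such as
    (chromosome, position, ref, alt); this representation is an assumption.
    """
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical truth set and call sets from two pipelines.
truth = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
pipeline_a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 75, "T", "C")}
pipeline_b = {("chr1", 100, "A", "G")}

prec_a, rec_a = precision_recall(pipeline_a, truth)
prec_b, rec_b = precision_recall(pipeline_b, truth)

# Differential precision and recall between the two pipelines.
print(prec_a - prec_b, rec_a - rec_b)
```

The Mendelian approach described in the abstract aims to estimate these differences without requiring the truth set at all.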
4.
Bioinformatics ; 34(24): 4241-4247, 2018 12 15.
Article in English | MEDLINE | ID: mdl-29868720

ABSTRACT

Motivation: Several tools exist to count Mendelian violations in family trios by comparing variants at the same genomic positions. This naive variant comparison, however, fails to assess regions where multiple variants need to be examined together, resulting in reduced accuracy of existing Mendelian violation checking tools. Results: We introduce VBT, a trio concordance analysis tool, which identifies Mendelian violations by approximately solving the 3-way variant matching problem to resolve variant representation differences in family trios. We show that VBT outperforms previous trio comparison methods in accuracy. Availability and implementation: VBT is implemented in C++ and source code is available under GNU GPLv3 license at the following URL: https://github.com/sbg/VBT-TrioAnalysis.git. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome , Genomics , Software , Genome/genetics , Genomics/methods
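
The naive position-by-position check that the abstract above describes as the baseline can be sketched as follows. This is a minimal illustration, not VBT's 3-way variant matching: it assumes autosomal diploid genotypes with no missing calls, encoded as tuples of allele indices, and the function names and example sites are hypothetical.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """True if the child genotype can be formed from one maternal and one
    paternal allele. Genotypes are tuples of allele indices, e.g. (0, 1)."""
    return any(
        sorted((m, f)) == sorted(child) for m, f in product(mother, father)
    )


def count_violations(trio_sites):
    """Count sites whose (child, mother, father) genotypes violate
    Mendelian inheritance, comparing each position independently."""
    return sum(
        not is_mendelian_consistent(child, mother, father)
        for child, mother, father in trio_sites
    )


# Hypothetical genotypes at four positions: (child, mother, father).
sites = [
    ((0, 1), (0, 0), (0, 1)),  # consistent
    ((1, 1), (0, 0), (0, 1)),  # violation: mother cannot transmit allele 1
    ((0, 0), (1, 1), (0, 1)),  # violation: mother must transmit allele 1
    ((0, 1), (1, 1), (0, 0)),  # consistent
]

print(count_violations(sites))
```

Because each position is judged in isolation, this check cannot recognize when differently represented variants in the same region describe the same haplotype, which is the limitation the abstract attributes to the naive comparison.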