1.
bioRxiv ; 2023 Dec 04.
Article in English | MEDLINE | ID: mdl-38106141

ABSTRACT

There is currently no method to distinguish between germline and somatic structural variants (SVs) in tumor samples that lack a matched normal sample. In this study, we analyzed several features of germline and somatic SVs from a cohort of 974 patients from The Cancer Genome Atlas (TCGA). We identified a total of 21 features that differed significantly between germline and somatic SVs. Several of the germline SV features were associated with each other, as were several of the somatic SV features. We also found that these associations differed between the germline and somatic classes; for example, somatic inversions were more likely to be longer events than their germline counterparts. Using these features, we trained a support vector machine (SVM) classifier on 555,849 TCGA SVs to computationally distinguish germline from somatic SVs in the absence of a matched normal. This classifier had an ROC curve AUC of 0.984 when tested on an independent test set of 277,925 TCGA SVs. In this dataset, we achieved a positive predictive value (PPV) of 0.81 for an SV called somatic by the classifier being truly somatic. We further tested the classifier on a separate set of 7,623 SVs from pediatric high-grade gliomas (pHGG). In this non-TCGA cohort, our classifier achieved a PPV of 0.828, showing robust performance across datasets.
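
The positive predictive value quoted in this abstract is the fraction of SVs called somatic that are truly somatic. The following is a minimal sketch of that calculation, not the authors' code: the function name `positive_predictive_value` and the toy labels are assumptions made up for illustration and are not data from the study.

```python
def positive_predictive_value(true_labels, predicted_labels, positive="somatic"):
    """Return TP / (TP + FP) for the class named by `positive`."""
    tp = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == positive:
            if truth == positive:
                tp += 1  # called somatic and truly somatic
            else:
                fp += 1  # called somatic but actually germline
    if tp + fp == 0:
        raise ValueError("no positive predictions; PPV is undefined")
    return tp / (tp + fp)


# Toy, made-up labels for illustration only (not results from the paper).
truth = ["somatic", "germline", "somatic", "germline", "somatic"]
pred = ["somatic", "somatic", "somatic", "germline", "germline"]
ppv = positive_predictive_value(truth, pred)
```

Here three SVs are called somatic and two of them are truly somatic, so the toy PPV is 2/3. Unlike the AUC, PPV depends on the proportion of somatic SVs in the set being classified, which is one reason it is reported separately for each cohort.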

2.
Nucleic Acids Res ; 51(8): e46, 2023 05 08.
Article in English | MEDLINE | ID: mdl-36912074

ABSTRACT

16S rRNA gene sequence clustering is an important tool in characterizing the diversity of microbial communities. As 16S rRNA gene data sets are growing in size, existing sequence clustering algorithms increasingly become an analytical bottleneck. Part of this bottleneck is due to the substantial computational cost expended on small clusters and singleton sequences. We propose an iterative sampling-based 16S rRNA gene sequence clustering approach that targets the largest clusters in the data set, allowing users to stop the clustering process when sufficient clusters are available for the specific analysis being targeted. We describe a probabilistic analysis of the iterative clustering process that supports the intuition that the clustering process identifies the larger clusters in the data set first. Using real data sets of 16S rRNA gene sequences, we show that the iterative algorithm, coupled with an adaptive sampling process and a mode-shifting strategy for identifying cluster representatives, substantially speeds up the clustering process while being effective at capturing the large clusters in the data set. The experiments also show that SCRAPT (Sample, Cluster, Recruit, AdaPt and iTerate) is able to produce operational taxonomic units that are less fragmented than those produced by the popular tools UCLUST, CD-HIT and DNACLUST. The algorithm is implemented in the open-source package SCRAPT. The source code used to generate the results presented in this paper is available at https://github.com/hsmurali/SCRAPT.
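
The sample, cluster, recruit and iterate loop described in this abstract can be illustrated with a toy sketch. This is not the SCRAPT implementation: the distance function (fraction of mismatched positions between equal-length, non-empty strings), the fixed sample size, the function names and every parameter are assumptions chosen for brevity, and the adaptive sampling and mode-shifting steps are omitted.

```python
import random


def mismatch_fraction(a, b):
    """Toy distance: fraction of differing positions (assumes equal-length, non-empty strings)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def iterative_sample_cluster(seqs, threshold=0.1, sample_size=2,
                             min_size=2, max_iters=10, seed=0):
    """Repeatedly sample candidate centers, recruit nearby sequences, and keep large clusters."""
    rng = random.Random(seed)
    remaining = list(seqs)
    clusters = []  # list of (center, members) pairs
    for _ in range(max_iters):
        if not remaining:
            break
        # Sample: draw a few candidate centers from the unclustered pool.
        sample = rng.sample(remaining, min(sample_size, len(remaining)))
        # Cluster the sample: keep only centers that are mutually far apart.
        centers = []
        for s in sample:
            if all(mismatch_fraction(s, c) > threshold for c in centers):
                centers.append(s)
        # Recruit: assign each pooled sequence to the first center within the threshold.
        members = [[] for _ in centers]
        leftover = []
        for s in remaining:
            for i, c in enumerate(centers):
                if mismatch_fraction(s, c) <= threshold:
                    members[i].append(s)
                    break
            else:
                leftover.append(s)
        # Keep clusters that are large enough; return the rest to the pool.
        for center, group in zip(centers, members):
            if len(group) >= min_size:
                clusters.append((center, group))
            else:
                leftover.extend(group)
        remaining = leftover  # Iterate on whatever is still unclustered.
    return clusters, remaining


# Toy sequences for illustration only.
seqs = ["AAAAAAAAAA"] * 4 + ["CCCCCCCCCC"] * 3 + ["GGGGGGGGGG"]
clusters, remaining = iterative_sample_cluster(seqs)
```

Because centers are drawn at random from the pool, sequences in large clusters are the most likely to be sampled, which matches the intuition stated in the abstract that larger clusters tend to be found first. The loop can be stopped after any iteration, leaving small clusters and singletons unprocessed in `remaining`.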


Subject(s)
Algorithms; Software; RNA, Ribosomal, 16S/genetics; Genes, rRNA; Cluster Analysis
3.
PLoS Comput Biol ; 17(9): e1009380, 2021 09.
Article in English | MEDLINE | ID: mdl-34491988

ABSTRACT

The SARS-CoV-2 pandemic highlights the need for a detailed molecular understanding of protective antibody responses. This is underscored by the emergence and spread of SARS-CoV-2 variants, including Alpha (B.1.1.7) and Delta (B.1.617.2), some of which appear to be less effectively targeted by current monoclonal antibodies and vaccines. Here we report a high-resolution, comprehensive map of antibody recognition of the SARS-CoV-2 spike receptor binding domain (RBD), which is the target of most neutralizing antibodies, using computational structural analysis. With a dataset of nonredundant experimentally determined antibody-RBD structures, we classified antibodies by RBD residue binding determinants using unsupervised clustering. We also identified the energetic and conservation features of epitope residues and assessed the capacity of viral variant mutations to disrupt antibody recognition, revealing sets of antibodies predicted to effectively target recently described viral variants. This detailed structure-based reference of antibody RBD recognition signatures can inform therapeutic and vaccine design strategies.


Subject(s)
Antibodies, Viral; COVID-19/virology; SARS-CoV-2/genetics; Spike Glycoprotein, Coronavirus; Antibodies, Viral/chemistry; Antibodies, Viral/metabolism; Binding Sites; Cluster Analysis; Computational Biology; Humans; Models, Molecular; Protein Binding; Spike Glycoprotein, Coronavirus/chemistry; Spike Glycoprotein, Coronavirus/genetics; Spike Glycoprotein, Coronavirus/metabolism