Results 1 - 20 of 5,712
1.
Cell ; 184(13): 3376-3393.e17, 2021 06 24.
Article in English | MEDLINE | ID: mdl-34043940

ABSTRACT

We present a global atlas of 4,728 metagenomic samples from mass-transit systems in 60 cities over 3 years, representing the first systematic, worldwide catalog of the urban microbial ecosystem. This atlas provides an annotated, geospatial profile of microbial strains, functional characteristics, antimicrobial resistance (AMR) markers, and genetic elements, including 10,928 viruses, 1,302 bacteria, 2 archaea, and 838,532 CRISPR arrays not found in reference databases. We identified 4,246 known species of urban microorganisms and a consistent set of 31 species found in 97% of samples that were distinct from human commensal organisms. Profiles of AMR genes varied widely in type and density across cities. Cities showed distinct microbial taxonomic signatures that were driven by climate and geographic differences. These results constitute a high-resolution global metagenomic atlas that enables discovery of organisms and genes, highlights potential public health and forensic applications, and provides a culture-independent view of AMR burden in cities.


Subject(s)
Drug Resistance, Bacterial/genetics , Metagenomics , Microbiota/genetics , Urban Population , Biodiversity , Databases, Genetic , Humans
2.
Proc Natl Acad Sci U S A ; 121(27): e2318198121, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38917007

ABSTRACT

Establishing modular binders as diagnostic detection agents represents a cost- and time-efficient alternative to the commonly used binders that are generated one molecule at a time. In contrast to these conventional approaches, a modular binder can be designed in silico from individual modules to, in principle, recognize any desired linear epitope without going through a selection and hit-validation process, given a set of preexisting, amino acid-specific modules. Designed armadillo repeat proteins (dArmRP) have been developed as modular binder scaffolds, and we report here the generation of highly specific dArmRP modules by yeast surface display selection, performed on a rationally designed dArmRP library. A selection strategy was developed to distinguish the binding difference resulting from a single amino acid mutation in the target peptide. Our reverse-competitor strategy introduced here employs the designated target as a competitor to increase the sensitivity when separating specific from cross-reactive binders that show similar affinities for the target peptide. With this switch in selection focus from affinity to specificity, we found that the enrichment during this specificity sort is indicative of the desired phenotype, regardless of the binder abundance. Hence, deep sequencing of the selection pools allows retrieval of phenotypic hits with only 0.1% abundance in the selectivity sort pool from the next-generation sequencing data alone. In a proof-of-principle study, a binder was created by replacing all corresponding wild-type modules with a newly selected module, yielding a binder with very high affinity for the designated target that has been successfully validated as a detection agent in western blot analysis.


Subject(s)
Armadillo Domain Proteins , Saccharomyces cerevisiae , Armadillo Domain Proteins/genetics , Armadillo Domain Proteins/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , High-Throughput Nucleotide Sequencing/methods , Protein Binding , Peptides/metabolism , Peptides/genetics , Peptides/chemistry , Epitopes/genetics , Peptide Library
3.
Hum Mol Genet ; 33(14): 1207-1214, 2024 Jul 06.
Article in English | MEDLINE | ID: mdl-38643062

ABSTRACT

Genotype imputation is widely used in genome-wide association studies (GWAS). However, both the genotyping chips and imputation reference panels are dependent on next-generation sequencing (NGS). Due to the nature of NGS, some regions of the genome are inaccessible to sequencing. To date, there has been no complete evaluation of these regions and their impact on the identification of associations in GWAS remains unclear. In this study, we systematically assess the extent to which variants in inaccessible regions are underrepresented on genotyping chips and imputation reference panels, in GWAS results and in variant databases. We also determine the proportion of genes located in inaccessible regions and compare the results across variant masks defined by the 1000 Genomes Project and the TOPMed program. Overall, fewer variants were observed in inaccessible regions in all categories analyzed. Depending on the mask used and normalized for region size, only 4%-17% of the genotyped variants are located in inaccessible regions and 52 to 581 genes were almost completely inaccessible. From the Cooperative Health Research in South Tyrol (CHRIS) study, we present a case study of an association located in an inaccessible region that is driven by genotyped variants and cannot be reproduced by imputation in GRCh37. We conclude that genotyping, NGS, genotype imputation and downstream analyses such as GWAS and fine mapping are systematically biased in inaccessible regions, due to missed variants and spurious associations. To help researchers assess gene and variant accessibility, we provide an online application (https://gab.gm.eurac.edu).


Subject(s)
Genome, Human , Genome-Wide Association Study , Genotype , High-Throughput Nucleotide Sequencing , Polymorphism, Single Nucleotide , Humans , Genome-Wide Association Study/methods , High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide/genetics
4.
Trends Genet ; 39(9): 649-671, 2023 09.
Article in English | MEDLINE | ID: mdl-37230864

ABSTRACT

Long-read sequencing (LRS) technologies have provided extremely powerful tools to explore genomes. While in the early years these methods suffered technical limitations, they have recently made significant progress in terms of read length, throughput, and accuracy and bioinformatics tools have strongly improved. Here, we aim to review the current status of LRS technologies, the development of novel methods, and the impact on genomics research. We will explore the most impactful recent findings made possible by these technologies focusing on high-resolution sequencing of genomes and transcriptomes and the direct detection of DNA and RNA modifications. We will also discuss how LRS methods promise a more comprehensive understanding of human genetic variation, transcriptomics, and epigenetics for the coming years.


Subject(s)
Genomics , High-Throughput Nucleotide Sequencing , Humans , High-Throughput Nucleotide Sequencing/methods , Genomics/methods , Sequence Analysis, DNA/methods , Computational Biology , Gene Expression Profiling/methods
5.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38271481

ABSTRACT

Next-generation sequencing (NGS) has revolutionized the field of rare disease diagnostics. Whole exome and whole genome sequencing are now routinely used for diagnostic purposes; however, the overall diagnosis rate remains lower than expected. In this work, we review current approaches used for calling and interpretation of germline genetic variants in the human genome, and discuss the most important challenges that persist in the bioinformatic analysis of NGS data in medical genetics. We describe and attempt to quantitatively assess the remaining problems, such as the quality of the reference genome sequence, reproducible coverage biases, or variant calling accuracy in complex regions of the genome. We also discuss the prospects of switching to the complete human genome assembly or the human pan-genome and important caveats associated with such a switch. We touch on arguably the hardest problem of NGS data analysis for medical genomics, namely, the annotation of genetic variants and their subsequent interpretation. We highlight the most challenging aspects of annotation and prioritization of both coding and non-coding variants. Finally, we demonstrate the persistent prevalence of pathogenic variants in the coding genome, and outline research directions that may enhance the efficiency of NGS-based disease diagnostics.


Subject(s)
Computational Biology , Rare Diseases , Humans , Rare Diseases/diagnosis , Rare Diseases/genetics , Genomics , Genome, Human , Germ Cells , High-Throughput Nucleotide Sequencing
6.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38920083

ABSTRACT

This study proposes a novel approach to studying severe acute respiratory syndrome coronavirus 2 virus mutations through sequencing data comparison. Traditional consensus-based methods, which focus on the most common nucleotide at each position, might overlook or obscure the presence of low-frequency variants. Our method, in contrast, retains all sequenced nucleotides at each position, forming a genomic matrix. Utilizing simulated short reads from genomes with specified mutations, we contrasted our genomic matrix approach with the consensus sequence method. Our matrix methodology, across multiple simulated datasets, accurately reflected the known mutations with an average accuracy improvement of 20% over the consensus method. In real-world tests using data from GISAID and NCBI-SRA, our approach demonstrated an increase in reliability by reducing the error margin by approximately 15%. The genomic matrix approach offers a more accurate representation of the viral genomic diversity, thereby providing superior insights into virus evolution and epidemiology.
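The contrast between a consensus sequence and a per-position matrix can be illustrated with a minimal Python sketch. It is not the authors' implementation: the read representation (gap-free `(start, sequence)` pairs), the function names `build_genomic_matrix`, `consensus` and `minor_variants`, and the `min_fraction` threshold are all assumptions made for illustration.

```python
from collections import Counter

BASES = "ACGT"


def build_genomic_matrix(reads, genome_length):
    """Count every observed base at every reference position.

    Assumption: reads are (start, sequence) pairs, 0-based and gap-free.
    Real alignments also carry indels and clipping, which this sketch ignores.
    """
    matrix = [Counter() for _ in range(genome_length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < genome_length and base in BASES:
                matrix[pos][base] += 1
    return matrix


def consensus(matrix, missing="N"):
    """Collapse the matrix to the most common base at each position."""
    return "".join(
        col.most_common(1)[0][0] if col else missing for col in matrix
    )


def minor_variants(matrix, min_fraction=0.05):
    """Return (position, base, fraction) for non-majority bases above a threshold."""
    found = []
    for pos, col in enumerate(matrix):
        depth = sum(col.values())
        if depth == 0:
            continue
        major = col.most_common(1)[0][0]
        for base, count in sorted(col.items()):
            if base != major and count / depth >= min_fraction:
                found.append((pos, base, count / depth))
    return found


# Toy data: nine reads match the reference, one carries G>T at position 2.
reads = [(0, "ACGT")] * 9 + [(0, "ACTT")]
matrix = build_genomic_matrix(reads, 4)

print(consensus(matrix))       # the low-frequency T is invisible here
print(minor_variants(matrix))  # the matrix still records it
```

In this toy case the consensus is identical to the reference, while the matrix keeps the variant present in one read out of ten, which is the kind of information the abstract says consensus-based methods can obscure.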


Subject(s)
COVID-19 , Genome, Viral , Phylogeny , SARS-CoV-2 , SARS-CoV-2/genetics , Humans , COVID-19/virology , COVID-19/epidemiology , Mutation , Consensus Sequence , Genetic Variation
7.
Proc Natl Acad Sci U S A ; 120(8): e2216479120, 2023 02 21.
Article in English | MEDLINE | ID: mdl-36791109

ABSTRACT

Anaplastic lymphoma kinase (ALK) fusion variants in Non-Small Cell Lung Cancer (NSCLC) consist of numerous dimerizing fusion partners. Retrospective investigations suggest that treatment benefit in response to ALK tyrosine kinase inhibitors (TKIs) differs depending on the fusion variant present in the patient tumor. Therefore, understanding the oncogenic signaling networks driven by different ALK fusion variants is important. To do this, we developed controlled inducible cell models expressing either Echinoderm Microtubule Associated Protein Like 4 (EML4)-ALK-V1, EML4-ALK-V3, Kinesin Family Member 5B (KIF5B)-ALK, or TRK-fused gene (TFG)-ALK and investigated their transcriptomic and proteomic responses to ALK activity modulation together with patient-derived ALK-positive NSCLC cell lines. This allowed identification of both common and isoform-specific responses downstream of these four ALK fusions. An inflammatory signature that included upregulation of the Serpin B4 serine protease inhibitor was observed in both ALK fusion inducible and patient-derived cells. We show that Signal transducer and activator of transcription 3 (STAT3), Nuclear Factor Kappa B (NF-κB) and Activator protein 1 (AP1) are major transcriptional regulators of SERPINB4 downstream of ALK fusions. Upregulation of SERPINB4 promotes survival and inhibits natural killer cell-mediated cytotoxicity, which has potential for therapeutic impact targeting the immune response together with ALK TKIs in NSCLC.


Subject(s)
Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Serpins , Humans , Anaplastic Lymphoma Kinase/genetics , Carcinoma, Non-Small-Cell Lung/drug therapy , Carcinoma, Non-Small-Cell Lung/genetics , Carcinoma, Non-Small-Cell Lung/pathology , Lung Neoplasms/drug therapy , Lung Neoplasms/genetics , Lung Neoplasms/pathology , Oncogene Proteins, Fusion/genetics , Oncogene Proteins, Fusion/metabolism , Oncogenes , Protein Kinase Inhibitors/pharmacology , Protein-Tyrosine Kinases/genetics , Proteomics , Retrospective Studies , Serpins/genetics
8.
J Biol Chem ; 300(3): 105767, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38367672

ABSTRACT

Approximately 5 to 15% of nonmedullary thyroid cancers (NMTC) present in a familial form (familial nonmedullary thyroid cancers [FNMTC]). The genetic basis of FNMTC remains largely unknown, representing a limitation for diagnostic and clinical management. Recently, germline mutations in DNA repair-related genes have been described in cases with thyroid cancer (TC), suggesting a role in FNMTC etiology. Here, two FNMTC families were studied, each with two members affected with TC. Ninety-four hereditary cancer predisposition genes were analyzed through next-generation sequencing, revealing two germline CHEK2 missense variants (c.962A > C, p.E321A and c.470T > C, p.I157T), which segregated with TC in each FNMTC family. p.E321A, located in the CHK2 protein kinase domain, is a rare variant, previously unreported in the literature. Conversely, p.I157T, located in CHK2 forkhead-associated domain, has been extensively described, having conflicting interpretations of pathogenicity. CHK2 proteins (WT and variants) were characterized using biophysical methods, molecular dynamics simulations, and immunohistochemistry. Overall, biophysical characterization of these CHK2 variants showed that they have compromised structural and conformational stability and impaired kinase activity, compared to the WT protein. CHK2 appears to aggregate into amyloid-like fibrils in vitro, which opens future perspectives toward positioning CHK2 in cancer pathophysiology. CHK2 variants exhibited higher propensity for this conformational change, also displaying higher expression in thyroid tumors. The present findings support the utility of complementary biophysical and in silico approaches toward understanding the impact of genetic variants in protein structure and function, improving the current knowledge on CHEK2 variants' role in FNMTC genetic basis, with prospective clinical translation.


Subject(s)
Checkpoint Kinase 2 , Neoplastic Syndromes, Hereditary , Thyroid Cancer, Papillary , Thyroid Neoplasms , Humans , Checkpoint Kinase 2/chemistry , Checkpoint Kinase 2/genetics , Checkpoint Kinase 2/metabolism , Genetic Predisposition to Disease , Germ-Line Mutation , Neoplastic Syndromes, Hereditary/genetics , Prospective Studies , Thyroid Cancer, Papillary/genetics , Thyroid Neoplasms/genetics , Protein Domains , Male , Female , Middle Aged
9.
Plant J ; 118(2): 345-357, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38149801

ABSTRACT

RNA editing is a crucial post-transcriptional modification process in plant organellar RNA metabolism. rRNA removal-based total RNA-seq is one of the most common methods to study this event. However, the lack of commercial kits to remove rRNAs limits the usage of this method, especially for non-model plant species. DSN-seq is a transcriptome sequencing method that uses duplex-specific nuclease (DSN) to degrade highly abundant cDNA species, especially those from rRNAs, while keeping the robustness of transcript levels of the majority of other mRNAs; it has not previously been applied to study RNA editing in plants. In this study, we evaluated the capability of DSN-seq to reduce rRNA content and profile organellar RNA editing events in plants, and used commercial Ribo-off-seq and standard mRNA-seq for comparison. Our results demonstrated that DSN-seq efficiently reduced rRNA content and enriched organellar transcriptomes in rice. With high sensitivity to RNA editing events, DSN-seq and Ribo-off-seq provided a more complete and accurate RNA editing profile of rice, which was further validated by Sanger sequencing. Furthermore, DSN-seq also demonstrated efficient organellar transcriptome enrichment and high sensitivity for profiling RNA editing events in Arabidopsis thaliana. Our study highlights the capability of rRNA removal-based total RNA-seq for profiling RNA editing events in plant organellar transcriptomes and also suggests DSN-seq as a widely accessible RNA editing profiling method for various plant species.


Subject(s)
RNA Editing , Transcriptome , Transcriptome/genetics , RNA Editing/genetics , Organelles/genetics , Organelles/metabolism , RNA, Ribosomal/genetics , RNA, Ribosomal/metabolism , RNA, Plant/genetics , RNA, Plant/metabolism , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA/methods
10.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38048079

ABSTRACT

Identification of viruses and further assembly of viral genomes from next-generation sequencing (NGS) data are essential steps in virome studies. This study presented a one-stop tool named VIGA (available at https://github.com/viralInformatics/VIGA) for eukaryotic virus identification and genome assembly from NGS data. It was composed of four modules, namely, identification, taxonomic annotation, assembly and novel virus discovery, which integrated several third-party tools such as BLAST, Trinity, MetaCompass and RagTag. Evaluation on multiple simulated and real virome datasets showed that VIGA assembled more complete virus genomes than its competitors on both the metatranscriptomic and metagenomic data and performed well in assembling virus genomes at the strain level. Finally, VIGA was used to investigate the virome in metatranscriptomic data from the Human Microbiome Project and revealed differences in virome composition and positive rate in prediabetes, Crohn's disease and ulcerative colitis. Overall, VIGA should considerably aid the identification and characterization of viromes, especially known viruses, in future studies.


Subject(s)
Colitis, Ulcerative , Crohn Disease , Humans , High-Throughput Nucleotide Sequencing , Genome, Viral , Metagenome
11.
Mass Spectrom Rev ; 43(1): 5-38, 2024.
Article in English | MEDLINE | ID: mdl-36052666

ABSTRACT

The discovery of RNA silencing has revealed that non-protein-coding sequences (ncRNAs) can cover essential roles in regulatory networks and their malfunction may result in severe consequences on human health. These findings have prompted a general reassessment of the significance of RNA as a key player in cellular processes. This reassessment, however, will not be complete without a greater understanding of the distribution and function of the over 170 variants of the canonical ribonucleotides, which contribute to the breathtaking structural diversity of natural RNA. This review surveys the analytical approaches employed for the identification, characterization, and detection of RNA posttranscriptional modifications (rPTMs). The merits of analyzing individual units after exhaustive hydrolysis of the initial biopolymer are outlined together with those of identifying their position in the sequence of parent strands. Approaches based on next generation sequencing and mass spectrometry technologies are covered in depth to provide a comprehensive view of their respective merits. Deciphering the epitranscriptomic code will require not only mapping the location of rPTMs in the various classes of RNAs, but also assessing the variations of expression levels under different experimental conditions. The fact that no individual platform is currently capable of meeting all such demands implies that it will be essential to capitalize on complementary approaches to obtain the desired information. For this reason, the review strived to cover the broadest possible range of techniques to provide readers with the fundamental elements necessary to make informed choices and design the most effective possible strategy to accomplish the task at hand.


Subject(s)
RNA Processing, Post-Transcriptional , RNA , Humans , RNA/genetics , Sequence Analysis, RNA/methods
12.
Brain ; 147(1): 281-296, 2024 01 04.
Article in English | MEDLINE | ID: mdl-37721175

ABSTRACT

Congenital myasthenic syndromes (CMS) are a rare group of inherited disorders caused by gene defects associated with the neuromuscular junction and potentially treatable with commonly available medications such as acetylcholinesterase inhibitors and β2 adrenergic receptor agonists. In this study, we identified and genetically characterized the largest cohort of CMS patients from India to date. Genetic testing of clinically suspected patients evaluated in a South Indian hospital during the period 2014-19 was carried out by standard diagnostic gene panel testing or using a two-step method that included hotspot screening followed by whole-exome sequencing. In total, 156 genetically diagnosed patients (141 families) were characterized and the mutational spectrum and genotype-phenotype correlation described. Overall, 87 males and 69 females were evaluated, with the age of onset ranging from congenital to fourth decade (mean 6.6 ± 9.8 years). The mean age at diagnosis was 19 ± 12.8 (1-56 years), with a mean diagnostic delay of 12.5 ± 9.9 (0-49 years). Disease-causing variants in 17 CMS-associated genes were identified in 132 families (93.6%), while in nine families (6.4%), variants in genes not associated with CMS were found. Overall, postsynaptic defects were most common (62.4%), followed by glycosylation defects (21.3%), synaptic basal lamina genes (4.3%) and presynaptic defects (2.8%). Other genes found to cause neuromuscular junction defects (DES, TEFM) in our cohort accounted for 2.8%. Among the individual CMS genes, the most commonly affected gene was CHRNE (39.4%), followed by DOK7 (14.4%), DPAGT1 (9.8%), GFPT1 (7.6%), MUSK (6.1%), GMPPB (5.3%) and COLQ (4.5%). We identified 22 recurrent variants in this study, out of which eight were found to be geographically specific to the Indian subcontinent.
Apart from the known common CHRNE variants p.E443Kfs*64 (11.4%) and DOK7 p.A378Sfs*30 (9.3%), we identified seven novel recurrent variants specific to this cohort, including DPAGT1 p.T380I and DES c.1023+5G>A, for which founder haplotypes are suspected. This study highlights the geographic differences in the frequencies of various causative CMS genes and underlines the increasing significance of glycosylation genes (DPAGT1, GFPT1 and GMPPB) as a cause of neuromuscular junction defects. Myopathy and muscular dystrophy genes such as GMPPB and DES, presenting as gradually progressive limb girdle CMS, expand the phenotypic spectrum. The novel genes MACF1 and TEFM identified in this cohort add to the expanding list of genes with new mechanisms causing neuromuscular junction defects.


Subject(s)
Myasthenic Syndromes, Congenital , Male , Female , Humans , Child , Adolescent , Young Adult , Adult , Myasthenic Syndromes, Congenital/diagnosis , Acetylcholinesterase , Delayed Diagnosis , Neuromuscular Junction/genetics , Genetic Testing , Mutation/genetics
13.
Proc Natl Acad Sci U S A ; 119(22): e2116797119, 2022 05 31.
Article in English | MEDLINE | ID: mdl-35613054

ABSTRACT

Long-term memory formation relies on synaptic plasticity, neuronal activity-dependent gene transcription, and epigenetic modifications. Multiple studies have shown that HDAC inhibitor (HDACi) treatments can enhance individual aspects of these processes and thereby act as putative cognitive enhancers. However, their mode of action is not fully understood. In particular, it is unclear how systemic application of HDACis, which are devoid of substrate specificity, can target pathways that promote memory formation. In this study, we explore the electrophysiological, transcriptional, and epigenetic responses that are induced by CI-994, a class I HDACi, combined with contextual fear conditioning (CFC) in mice. We show that CI-994-mediated improvement of memory formation is accompanied by enhanced long-term potentiation in the hippocampus, a brain region recruited by CFC, but not in the striatum, a brain region not primarily implicated in fear learning. Furthermore, using a combination of bulk and single-cell RNA-sequencing, we find that, when paired with CFC, HDACi treatment engages synaptic plasticity-promoting gene expression more strongly in the hippocampus, specifically in the dentate gyrus (DG). Finally, using chromatin immunoprecipitation-sequencing (ChIP-seq) of DG neurons, we show that the combined action of HDACi application and conditioning is required to elicit enhancer histone acetylation in pathways that underlie improved memory performance. Together, these results indicate that systemic HDACi administration amplifies brain region-specific processes that are naturally induced by learning.


Subject(s)
Benzamides , Dentate Gyrus , Histone Deacetylase Inhibitors , Memory, Long-Term , Phenylenediamines , Animals , Benzamides/pharmacology , Cell Communication/drug effects , Dentate Gyrus/cytology , Dentate Gyrus/drug effects , Dentate Gyrus/physiology , Histone Deacetylase Inhibitors/pharmacology , Memory, Long-Term/drug effects , Mice , Neuronal Plasticity , Neurons/drug effects , Neurons/metabolism , Phenylenediamines/pharmacology , RNA-Seq , Single-Cell Analysis
14.
J Infect Dis ; 229(Supplement_2): S163-S171, 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-37968965

ABSTRACT

BACKGROUND: In response to the Mpox endemic and public health emergency, DCHHS aimed to develop NGS-based techniques to streamline Mpox viral clade and lineage analysis. METHODS: The Mpox sequencing workflow started with DNA extraction and adapted Illumina's COVIDSeq assay using hMpox primer pools from Yale School of Public Health. Sequencing steps included cDNA amplification, tagmentation, PCR indexing, pooling libraries, sequencing on MiSeq, data analysis, and report generation. The bioinformatic analysis comprised read assembly, consensus sequence mapping to reference genomes, and variant identification, and utilized pipelines including Illumina BaseSpace, NextClade, CLC Workbench, and Terra.bio for data quality control (QC) and validation. RESULTS: In total, 171 mpox samples were sequenced using the modified COVIDSeq workflow, and QC metrics were assessed for read quality, depth, and coverage. Multiple analysis pipelines identified the West African clade IIb as the only clade during peak Mpox infection from July through October 2022. Analyses also indicated lineage B.1.2 as the dominant variant comprising the majority of Mpox viral genomes (77.7%), implying its geographical distribution in the United States. Viral sequences were uploaded to GISAID EpiPox. CONCLUSIONS: We developed NGS workflows to precisely detect and analyze mpox viral clade and lineages, aiding in public health genomic surveillance.


Subject(s)
Mpox (monkeypox) , Humans , Genomics/methods , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Data Accuracy
15.
J Infect Dis ; 229(2): 443-447, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-37561039

ABSTRACT

Zika virus has been circulating in Thailand since 2002 through continuous but likely low-level circulation. Here, we describe an infection in a pregnant woman who traveled to Thailand and South America during her pregnancy. By combining phylogenetic analysis with the patient's travel history and her pregnancy timeline, we confirmed that she likely got infected in Thailand at the end of 2021. This imported case of microcephaly highlights that Zika virus circulation in the country still constitutes a health risk, even in a year of lower incidence. MAIN POINTS: Here we trace the origin of travel-acquired microcephaly to Thailand, providing additional evidence that pre-American lineages of Zika virus can harm the fetus and highlighting that Zika virus constitutes a health threat even in a year of lower incidence.


Subject(s)
Microcephaly , Pregnancy Complications, Infectious , Zika Virus Infection , Zika Virus , Humans , Pregnancy , Female , Zika Virus/genetics , Travel , Thailand/epidemiology , Phylogeny
16.
J Infect Dis ; 229(2): 507-516, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-37787611

ABSTRACT

T-cell-based diagnostic tools identify pathogen exposure but lack differentiation between recent and historical exposures in acute infectious diseases. Here, T-cell receptor (TCR) RNA sequencing was performed on HLA-DR+/CD38+CD8+ T-cell subsets of hospitalized coronavirus disease 2019 (COVID-19) patients (n = 30) and healthy controls (n = 30; 10 of whom had previously been exposed to severe acute respiratory syndrome coronavirus 2 [SARS-CoV-2]). CDR3α and CDR3β TCR regions were clustered separately before epitope specificity annotation using a database of SARS-CoV-2-associated CDR3α and CDR3β sequences corresponding to >1000 SARS-CoV-2 epitopes. The depth of the SARS-CoV-2-associated CDR3α/β sequences differentiated COVID-19 patients from the healthy controls with a receiver operating characteristic area under the curve of 0.84 ± 0.10. Hence, annotating TCR sequences of activated CD8+ T cells can be used to diagnose an acute viral infection and discriminate it from historical exposure. In essence, this work presents a new paradigm for applying the T-cell repertoire to accomplish TCR-based diagnostics.


Subject(s)
CD8-Positive T-Lymphocytes , COVID-19 , Humans , Receptors, Antigen, T-Cell/genetics , COVID-19/diagnosis , SARS-CoV-2 , T-Lymphocyte Subsets , Epitopes , Epitopes, T-Lymphocyte , COVID-19 Testing
17.
J Infect Dis ; 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38365889

ABSTRACT

Progressive multifocal leukoencephalopathy (PML) is a rare neurological condition associated with reactivation of dormant JC polyomavirus (JCPyV). In this study, we characterized gene expression and JCPyV rearrangements in PML brain tissue. Infection of white matter astrocytes and oligodendrocytes as well as occasional brain cortex neurons was shown. PML brain harbored exclusively rearranged JCPyV variants. Viral transcripts covered the whole genome on both strands. Strong differential expression of human genes associated with neuroinflammation, blood-brain-barrier permeability and neurodegenerative diseases was shown. Pathway analysis revealed wide immune activation in PML brain. The study provides novel insights into the pathogenesis of PML.

18.
BMC Bioinformatics ; 25(1): 238, 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39003441

ABSTRACT

MOTIVATION: Alignment of reads to a reference genome sequence is one of the key steps in the analysis of human whole-genome sequencing data obtained through next-generation sequencing (NGS) technologies. The quality of the subsequent steps of the analysis, such as the clinical interpretation of genetic variants or the results of a genome-wide association study, depends on correctly identifying the position of each read during alignment. The amount of human NGS whole-genome sequencing data is constantly growing. A number of human genome sequencing projects worldwide have resulted in the creation of large-scale databases of genetic variants of sequenced human genomes. Such information about known genetic variants can be used to improve the quality of alignment at the read alignment stage when analysing sequencing data obtained for a new individual, for example, by creating a genomic graph. While existing methods for aligning reads to a linear reference genome have high alignment speed, methods for aligning reads to a genomic graph have greater accuracy in variable regions of the genome. The development of a read alignment method that takes into account known genetic variants in the linear reference sequence index allows combining the advantages of both sets of methods. RESULTS: In this paper, we present the minimap2_index_modifier tool, which enables the construction of a modified index of a reference genome using known single nucleotide variants and insertions/deletions (indels) specific to a given human population. The use of the modified minimap2 index improves variant calling quality without modifying the bioinformatics pipeline and without significant additional computational overhead. Using the PrecisionFDA Truth Challenge V2 benchmark data (for HG002 short-read data aligned to the GRCh38 linear reference (GCA_000001405.15) with parameters k = 27 and w = 14) it was demonstrated that the number of false negative genetic variants decreased by more than 9500, and the number of false positives decreased by more than 7000 when modifying the index with genetic variants from the Human Pangenome Reference Consortium.
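The parameters k and w quoted above are the k-mer length and window size of a minimizer index. As a rough illustration only, and not the code of minimap2 or minimap2_index_modifier, the following sketch picks one minimizer per window of w consecutive k-mers. The function name `minimizers` is hypothetical, and the sketch assumes plain lexicographic ordering on a single strand, whereas real aligners such as minimap2 order k-mers by a hash and consider both strands.

```python
def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs chosen as window minimizers.

    Simplified illustration: k-mers are compared lexicographically on one
    strand only (an assumption made for readability).
    """
    # All overlapping k-mers of the sequence, indexed by start position.
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    picked = set()
    # Slide a window over w consecutive k-mers and keep the smallest one.
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        # Ties are broken by taking the leftmost smallest k-mer.
        offset = min(range(w), key=lambda j: window[j])
        picked.add((start + offset, window[offset]))
    return sorted(picked)


if __name__ == "__main__":
    # Toy values; the abstract reports k = 27 and w = 14 for real data.
    for pos, kmer in minimizers("ACGTACGT", k=3, w=2):
        print(pos, kmer)
```

Because only the selected k-mers are stored, an index that also contains k-mers spanning known variant alleles can, in principle, seed reads carrying those alleles; how the tool actually adds such entries is not described in the abstract.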


Subject(s)
Genetic Variation , Genome, Human , Whole Genome Sequencing , Humans , Whole Genome Sequencing/methods , Genetic Variation/genetics , High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide/genetics , Sequence Alignment/methods , Software , Algorithms , Genome-Wide Association Study/methods
19.
BMC Bioinformatics ; 25(1): 130, 2024 Mar 26.
Article in English | MEDLINE | ID: mdl-38532317

ABSTRACT

BACKGROUND: Recent improvements in sequencing technologies have enabled detailed profiling of genomic features. These technologies mostly rely on short reads, which are merged and compared to a reference genome for variant identification. These operations must be done computationally because of the size and complexity of the data. The need for analysis software has resulted in many programs for the mapping, variant calling and annotation steps. Currently, most programs are either expensive enterprise software with proprietary code, which makes access and verification very difficult, or open-access programs that are mostly based on command-line operations without user interfaces or extensive documentation. Moreover, multiple studies have observed a high level of disagreement among popular mapping and variant calling algorithms, which makes relying on a single algorithm unreliable. User-friendly open-source software tools that offer comparative analysis are an important need considering the growth of sequencing technologies. RESULTS: Here, we propose Comparative Sequencing Analysis Platform (COSAP), an open-source platform that provides popular sequencing algorithms for SNV, indel, structural variant calling, copy number variation, microsatellite instability and fusion analysis and their annotations. COSAP is packed with a fully functional user-friendly web interface and a backend server which allows full independent deployment for both individual and institutional scales. COSAP is developed as a workflow management system and designed to enhance cooperation among scientists with different backgrounds. It is publicly available at https://cosap.bio and https://github.com/MBaysanLab/cosap/ . The source code of the frontend and backend services can be found at https://github.com/MBaysanLab/cosap-webapi/ and https://github.com/MBaysanLab/cosap_frontend/ respectively. All services are packed as Docker containers as well. Pipelines that combine algorithms can be customized and new algorithms can be added with minimal coding through the modular structure. CONCLUSIONS: COSAP simplifies and speeds up the process of DNA sequencing analyses, providing commonly used algorithms for SNV, indel, structural variant calling, copy number variation, microsatellite instability and fusion analysis as well as their annotations. COSAP is packed with a fully functional user-friendly web interface and a backend server which allows full independent deployment for both individual and institutional scales. Standardized implementations of popular algorithms in a modular platform make it much easier to compare alternative pipelines and assess their impact, which is crucial in establishing reproducibility of sequencing analyses.
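The disagreement among variant callers mentioned above can be quantified in a simple way by comparing the call sets of two pipelines. The sketch below is illustrative only and does not use COSAP's API. The function name `concordance` and the representation of a variant as a `(chrom, pos, ref, alt)` tuple are assumptions made for this example.

```python
def concordance(calls_a, calls_b):
    """Summarize agreement between two collections of variant calls.

    Each call is assumed to be a hashable (chrom, pos, ref, alt) tuple;
    this representation is hypothetical and chosen for illustration.
    """
    a, b = set(calls_a), set(calls_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard index: 1.0 means identical call sets.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    # Toy call sets from two hypothetical pipelines.
    caller_1 = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 40, "G", "GA")]
    caller_2 = [("chr1", 100, "A", "G"), ("chr2", 40, "G", "GA"), ("chr3", 7, "T", "C")]
    print(concordance(caller_1, caller_2))
```

Exact tuple matching is a deliberate simplification: in practice, indels and complex variants usually need normalization before call sets from different algorithms can be compared fairly.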


Subject(s)
DNA Copy Number Variations , Microsatellite Instability , Humans , Reproducibility of Results , High-Throughput Nucleotide Sequencing/methods , Software
20.
J Biol Chem ; 299(6): 104831, 2023 06.
Article in English | MEDLINE | ID: mdl-37201587

ABSTRACT

Viral proteases play key roles in viral replication, and they also facilitate immune escape by proteolyzing diverse target proteins. Deep profiling of viral protease substrates in host cells is beneficial for understanding viral pathogenesis and for antiviral drug discovery. Here, we utilized substrate phage display coupled with protein network analysis to identify human proteome substrates of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) viral proteases, including papain-like protease (PLpro) and 3C-like protease (3CLpro). We first performed peptide substrate selection for PLpro and 3CLpro, and we then used the top 24 preferred substrate sequences to identify a total of 290 putative protein substrates. Protein network analysis revealed that the top clusters of PLpro and 3CLpro substrate proteins contain ubiquitin-related proteins and cadherin-related proteins, respectively. Using in vitro cleavage assays, we verified that cadherin-6 and cadherin-12 are novel substrates of 3CLpro and that CD177 is a novel substrate of PLpro. We thus demonstrated that substrate phage display coupled with protein network analysis is a simple and high-throughput method to identify human proteome substrates of SARS-CoV-2 viral proteases for further understanding of virus-host interactions.


Subject(s)
COVID-19 , SARS-CoV-2 , Viral Proteases , Humans , Peptide Hydrolases/metabolism , Proteome , SARS-CoV-2/enzymology , SARS-CoV-2/metabolism