Results 1 - 20 of 10,257
1.
Nature ; 624(7991): 355-365, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38092919

ABSTRACT

Single-cell analyses parse the brain's billions of neurons into thousands of 'cell-type' clusters residing in different brain structures1. Many cell types mediate their functions through targeted long-distance projections allowing interactions between specific cell types. Here we used epi-retro-seq2 to link single-cell epigenomes and cell types to long-distance projections for 33,034 neurons dissected from 32 different regions projecting to 24 different targets (225 source-to-target combinations) across the whole mouse brain. We highlight uses of these data for interrogating principles relating projection types to transcriptomics and epigenomics, and for addressing hypotheses about cell types and connections related to genetics. We provide an overall synthesis with 926 statistical comparisons of discriminability of neurons projecting to each target for every source. We integrate this dataset into the larger BRAIN Initiative Cell Census Network atlas, composed of millions of neurons, to link projection cell types to consensus clusters. Integration with spatial transcriptomics further assigns projection-enriched clusters to smaller source regions than the original dissections. We exemplify this by presenting in-depth analyses of projection neurons from the hypothalamus, thalamus, hindbrain, amygdala and midbrain to provide insights into properties of those cell types, including differentially expressed genes, their associated cis-regulatory elements and transcription-factor-binding motifs, and neurotransmitter use.


Subject(s)
Brain , Epigenomics , Neural Pathways , Neurons , Animals , Mice , Amygdala , Brain/cytology , Brain/metabolism , Consensus Sequence , Datasets as Topic , Gene Expression Profiling , Hypothalamus/cytology , Mesencephalon/cytology , Neural Pathways/cytology , Neurons/metabolism , Neurotransmitter Agents/metabolism , Regulatory Sequences, Nucleic Acid , Rhombencephalon/cytology , Single-Cell Analysis , Thalamus/cytology , Transcription Factors/metabolism
2.
Nature ; 624(7991): 433-441, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38030726

ABSTRACT

FOXP3 is a transcription factor that is essential for the development of regulatory T cells, a branch of T cells that suppress excessive inflammation and autoimmunity1-5. However, the molecular mechanisms of FOXP3 remain unclear. Here we show that FOXP3 uses the forkhead domain (a DNA-binding domain that is commonly thought to function as a monomer or dimer) to form a higher-order multimer after binding to TnG repeat microsatellites. The cryo-electron microscopy structure of FOXP3 in a complex with T3G repeats reveals a ladder-like architecture, whereby two double-stranded DNA molecules form the two 'side rails' bridged by five pairs of FOXP3 molecules, with each pair forming a 'rung'. Each FOXP3 subunit occupies TGTTTGT within the repeats in a manner that is indistinguishable from that of FOXP3 bound to the forkhead consensus motif (TGTTTAC). Mutations in the intra-rung interface impair TnG repeat recognition, DNA bridging and the cellular functions of FOXP3, all without affecting binding to the forkhead consensus motif. FOXP3 can tolerate variable inter-rung spacings, explaining its broad specificity for TnG-repeat-like sequences in vivo and in vitro. Both FOXP3 orthologues and paralogues show similar TnG repeat recognition and DNA bridging. These findings therefore reveal a mode of DNA recognition that involves transcription factor homomultimerization and DNA bridging, and further implicate microsatellites in transcriptional regulation and diseases.


Subject(s)
DNA , Forkhead Transcription Factors , Microsatellite Repeats , Base Sequence , Consensus Sequence , Cryoelectron Microscopy , DNA/chemistry , DNA/genetics , DNA/metabolism , DNA/ultrastructure , Forkhead Transcription Factors/chemistry , Forkhead Transcription Factors/metabolism , Forkhead Transcription Factors/ultrastructure , Microsatellite Repeats/genetics , Mutation , Nucleotide Motifs , Protein Domains , Protein Multimerization , T-Lymphocytes, Regulatory/metabolism
3.
Nature ; 600(7887): 164-169, 2021 12.
Article in English | MEDLINE | ID: mdl-34789875

ABSTRACT

In the clades of animals that diverged from the bony fish, a group of Mas-related G-protein-coupled receptors (MRGPRs) evolved that have an active role in itch and allergic signals1,2. As an MRGPR, MRGPRX2 is known to sense basic secretagogues (agents that promote secretion) and is involved in itch signals and eliciting pseudoallergic reactions3-6. MRGPRX2 has been targeted by drug development efforts to prevent the side effects induced by certain drugs or to treat allergic diseases. Here we report a set of cryo-electron microscopy structures of the MRGPRX2-Gi1 trimer in complex with polycationic compound 48/80 or with inflammatory peptides. The structures of the MRGPRX2-Gi1 complex exhibited shallow, solvent-exposed ligand-binding pockets. We identified key common structural features of MRGPRX2 and describe a consensus motif for peptidic allergens. Beneath the ligand-binding pocket, the unusual kink formation at transmembrane domain 6 (TM6) and the replacement of the general toggle switch from Trp6.48 to Gly6.48 (superscript annotations as per Ballesteros-Weinstein nomenclature) suggest a distinct activation process. We characterized the interfaces of MRGPRX2 and the Gi trimer, and mapped the residues associated with key single-nucleotide polymorphisms on both the ligand and G-protein interfaces of MRGPRX2. Collectively, our results provide a structural basis for the sensing of cationic allergens by MRGPRX2, potentially facilitating the rational design of therapies to prevent unwanted pseudoallergic reactions.


Subject(s)
Nerve Tissue Proteins/chemistry , Nerve Tissue Proteins/metabolism , Pruritus/metabolism , Receptors, G-Protein-Coupled/chemistry , Receptors, G-Protein-Coupled/metabolism , Receptors, Neuropeptide/chemistry , Receptors, Neuropeptide/metabolism , Allergens/immunology , Amino Acid Motifs , Amino Acid Sequence , Binding Sites , Consensus Sequence , Cryoelectron Microscopy , GTP-Binding Protein alpha Subunits, Gi-Go/metabolism , GTP-Binding Protein alpha Subunits, Gq-G11/metabolism , Humans , Models, Molecular , Nerve Tissue Proteins/immunology , Nerve Tissue Proteins/ultrastructure , Receptors, G-Protein-Coupled/immunology , Receptors, G-Protein-Coupled/ultrastructure , Receptors, Neuropeptide/immunology , Receptors, Neuropeptide/ultrastructure
4.
Mol Cell ; 73(6): 1232-1242.e4, 2019 03 21.
Article in English | MEDLINE | ID: mdl-30765194

ABSTRACT

The C-terminal domain (CTD) of RNA polymerase II (Pol II) is composed of repeats of the consensus YSPTSPS and is an essential binding scaffold for transcription-associated factors. Metazoan CTDs have well-conserved lengths and sequence compositions arising from the evolution of divergent motifs, features thought to be essential for development. On the contrary, we show that a truncated CTD composed solely of YSPTSPS repeats supports Drosophila viability but that a CTD with enough YSPTSPS repeats to match the length of the wild-type Drosophila CTD is defective. Furthermore, a fluorescently tagged CTD lacking the rest of Pol II dynamically enters transcription compartments, indicating that the CTD functions as a signal sequence. However, CTDs with too many YSPTSPS repeats are more prone to localize to static nuclear foci separate from the chromosomes. We propose that the sequence complexity of the CTD offsets aberrant behavior caused by excessive repetitive sequences without compromising its targeting function.


Subject(s)
Amino Acid Motifs , Consensus Sequence , Drosophila Proteins/metabolism , Drosophila melanogaster/enzymology , RNA Polymerase II/metabolism , Repetitive Sequences, Amino Acid , Salivary Glands/enzymology , Animals , Animals, Genetically Modified , Drosophila Proteins/chemistry , Drosophila Proteins/genetics , Drosophila melanogaster/embryology , Drosophila melanogaster/genetics , Gene Expression Regulation, Developmental , Mutation , Protein Domains , RNA Polymerase II/chemistry , RNA Polymerase II/genetics , Salivary Glands/embryology , Transcription, Genetic , Transcriptional Activation
5.
Proc Natl Acad Sci U S A ; 121(3): e2312029121, 2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38194446

ABSTRACT

Understanding natural protein evolution and designing novel proteins are motivating interest in development of high-throughput methods to explore large sequence spaces. In this work, we demonstrate the application of multisite λ dynamics (MSλD), a rigorous free energy simulation method, and chemical denaturation experiments to quantify evolutionary selection pressure from sequence-stability relationships and to address questions of design. This study examines a mesophilic phylogenetic clade of ribonuclease H (RNase H), furthering its extensive characterization in earlier studies, focusing on E. coli RNase H (ecRNH) and a more stable consensus sequence (AncCcons) differing at 15 positions. The stabilities of 32,768 chimeras between these two sequences were computed using the MSλD framework. The most stable and least stable chimeras were predicted and tested along with several other sequences, revealing a designed chimera with approximately the same stability increase as AncCcons, but requiring only half the mutations. Comparing the computed stabilities with experiment for 12 sequences reveals a Pearson correlation of 0.86 and root mean squared error of 1.18 kcal/mol, an unprecedented level of accuracy well beyond less rigorous computational design methods. We then quantified selection pressure using a simple evolutionary model in which sequences are selected according to the Boltzmann factor of their stability. Selection temperatures from 110 to 168 K are estimated in three ways by comparing experimental and computational results to evolutionary models. These estimates indicate selection pressure is high, which has implications for evolutionary dynamics and for the accuracy required for design, and suggests accurate high-throughput computational methods like MSλD may enable more effective protein design.
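
The chimera count and the Boltzmann selection model in this abstract can be illustrated with a minimal sketch. Two sequences differing at 15 positions give 2^15 = 32,768 chimeras, which matches the number reported. The function name `boltzmann_weights`, the stability values, the selection temperature of 150 K, and the sign convention (larger value means more stable) are all assumptions made for illustration and are not taken from the paper.

```python
import math

N_POSITIONS = 15                 # positions at which ecRNH and AncCcons differ (from the abstract)
n_chimeras = 2 ** N_POSITIONS    # each position takes one of two residues

K_B = 0.0019872                  # gas constant in kcal/(mol*K)

def boltzmann_weights(stabilities, t_sel):
    """Normalized weights exp(dG / (K_B * t_sel)).

    Assumed convention: a larger dG means a more stable sequence,
    so more stable sequences receive larger weights.
    """
    raw = [math.exp(dg / (K_B * t_sel)) for dg in stabilities]
    total = sum(raw)
    return [w / total for w in raw]

# Hypothetical stabilities in kcal/mol, not values from the study.
stabilities = [0.0, 1.0, 2.0]
weights = boltzmann_weights(stabilities, t_sel=150.0)

print(n_chimeras)
print(weights)
```

At a low selection temperature the weights concentrate on the most stable sequence, which is one way to read the abstract's statement that the estimated selection pressure is high.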


Subject(s)
Escherichia coli , Ribonuclease H , Escherichia coli/genetics , Phylogeny , Computer Simulation , Consensus Sequence , Ribonuclease H/genetics
6.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38920083

ABSTRACT

This study proposes a novel approach to studying severe acute respiratory syndrome coronavirus 2 virus mutations through sequencing data comparison. Traditional consensus-based methods, which focus on the most common nucleotide at each position, might overlook or obscure the presence of low-frequency variants. Our method, in contrast, retains all sequenced nucleotides at each position, forming a genomic matrix. Utilizing simulated short reads from genomes with specified mutations, we contrasted our genomic matrix approach with the consensus sequence method. Our matrix methodology, across multiple simulated datasets, accurately reflected the known mutations with an average accuracy improvement of 20% over the consensus method. In real-world tests using data from GISAID and NCBI-SRA, our approach demonstrated an increase in reliability by reducing the error margin by approximately 15%. The genomic matrix approach offers a more accurate representation of the viral genomic diversity, thereby providing superior insights into virus evolution and epidemiology.
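
The contrast between a consensus sequence and a per-position matrix can be shown with a small sketch. This is not the authors' implementation: the function names, the toy reads, and the assumption that reads are already aligned and of equal length are all hypothetical.

```python
from collections import Counter

def genomic_matrix(aligned_reads):
    """Count every base observed at each position of equal-length aligned reads."""
    length = len(aligned_reads[0])
    return [Counter(read[i] for read in aligned_reads) for i in range(length)]

def consensus(matrix):
    """Collapse each position to its most common base, discarding minor variants."""
    return "".join(counts.most_common(1)[0][0] for counts in matrix)

# Hypothetical toy reads: one read in five carries a G at position 2.
reads = ["ACTA", "ACTA", "ACTA", "ACGA", "ACTA"]
matrix = genomic_matrix(reads)

print(consensus(matrix))   # the consensus no longer shows the minority G
print(dict(matrix[2]))     # the matrix keeps both T and G counts
```

In this toy case the consensus reports only the majority base at position 2, while the matrix still records the single G, which is the kind of low-frequency variant the abstract says consensus-based methods can obscure.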


Subject(s)
COVID-19 , Genome, Viral , Phylogeny , SARS-CoV-2 , SARS-CoV-2/genetics , Humans , COVID-19/virology , COVID-19/epidemiology , Mutation , Consensus Sequence , Genetic Variation
7.
Nature ; 583(7818): 729-736, 2020 07.
Article in English | MEDLINE | ID: mdl-32728250

ABSTRACT

Combinatorial binding of transcription factors to regulatory DNA underpins gene regulation in all organisms. Genetic variation in regulatory regions has been connected with diseases and diverse phenotypic traits1, but it remains challenging to distinguish variants that affect regulatory function2. Genomic DNase I footprinting enables the quantitative, nucleotide-resolution delineation of sites of transcription factor occupancy within native chromatin3-6. However, only a small fraction of such sites have been precisely resolved on the human genome sequence6. Here, to enable comprehensive mapping of transcription factor footprints, we produced high-density DNase I cleavage maps from 243 human cell and tissue types and states and integrated these data to delineate about 4.5 million compact genomic elements that encode transcription factor occupancy at nucleotide resolution. We map the fine-scale structure within about 1.6 million DNase I-hypersensitive sites and show that the overwhelming majority are populated by well-spaced sites of single transcription factor-DNA interaction. Cell-context-dependent cis-regulation is chiefly executed by wholesale modulation of accessibility at regulatory DNA rather than by differential transcription factor occupancy within accessible elements. We also show that the enrichment of genetic variants associated with diseases or phenotypic traits in regulatory regions1,7 is almost entirely attributable to variants within footprints, and that functional variants that affect transcription factor occupancy are nearly evenly partitioned between loss- and gain-of-function alleles. Unexpectedly, we find increased density of human genetic variation within transcription factor footprints, revealing an unappreciated driver of cis-regulatory evolution. Our results provide a framework for both global and nucleotide-precision analyses of gene regulatory mechanisms and functional genetic variation.


Subject(s)
DNA Footprinting/standards , Genome, Human/genetics , Transcription Factors/metabolism , Consensus Sequence , DNA/genetics , DNA/metabolism , Deoxyribonuclease I/metabolism , Genetics, Population , Genome-Wide Association Study , Humans , Models, Molecular , Polymorphism, Single Nucleotide , Regulatory Sequences, Nucleic Acid/genetics
8.
Nature ; 585(7825): 459-463, 2020 09.
Article in English | MEDLINE | ID: mdl-32908305

ABSTRACT

The RNA polymerase II (Pol II) core promoter is the strategic site of convergence of the signals that lead to the initiation of DNA transcription1-5, but the downstream core promoter in humans has been difficult to understand1-3. Here we analyse the human Pol II core promoter and use machine learning to generate predictive models for the downstream core promoter region (DPR) and the TATA box. We developed a method termed HARPE (high-throughput analysis of randomized promoter elements) to create hundreds of thousands of DPR (or TATA box) variants, each with known transcriptional strength. We then analysed the HARPE data by support vector regression (SVR) to provide comprehensive models for the sequence motifs, and found that the SVR-based approach is more effective than a consensus-based method for predicting transcriptional activity. These results show that the DPR is a functionally important core promoter element that is widely used in human promoters. Notably, there appears to be a duality between the DPR and the TATA box, as many promoters contain one or the other element. More broadly, these findings show that functional DNA motifs can be identified by machine learning analysis of a comprehensive set of sequence variants.


Subject(s)
Consensus Sequence/genetics , Gene Expression Regulation/genetics , Promoter Regions, Genetic/genetics , RNA Polymerase II/metabolism , Support Vector Machine , Transcription, Genetic , Base Sequence , Cells/metabolism , Computer Simulation , Datasets as Topic , HeLa Cells , High-Throughput Nucleotide Sequencing , Humans , Models, Genetic , Mutagenesis , TATA Box/genetics
9.
Nature ; 580(7802): 269-273, 2020 04.
Article in English | MEDLINE | ID: mdl-32106218

ABSTRACT

Various species of the intestinal microbiota have been associated with the development of colorectal cancer1,2, but it has not been demonstrated that bacteria have a direct role in the occurrence of oncogenic mutations. Escherichia coli can carry the pathogenicity island pks, which encodes a set of enzymes that synthesize colibactin3. This compound is believed to alkylate DNA on adenine residues4,5 and induces double-strand breaks in cultured cells3. Here we expose human intestinal organoids to genotoxic pks+ E. coli by repeated luminal injection over five months. Whole-genome sequencing of clonal organoids before and after this exposure revealed a distinct mutational signature that was absent from organoids injected with isogenic pks-mutant bacteria. The same mutational signature was detected in a subset of 5,876 human cancer genomes from two independent cohorts, predominantly in colorectal cancer. Our study describes a distinct mutational signature in colorectal cancer and implies that the underlying mutational process results directly from past exposure to bacteria carrying the colibactin-producing pks pathogenicity island.


Subject(s)
Colorectal Neoplasms/genetics , Colorectal Neoplasms/microbiology , Escherichia coli/genetics , Escherichia coli/pathogenicity , Genomic Islands/genetics , Mutagenesis , Mutation , Coculture Techniques , Cohort Studies , Consensus Sequence , DNA Damage , Gastrointestinal Microbiome , Humans , Organoids/cytology , Organoids/metabolism , Organoids/microbiology , Peptides/genetics , Polyketides
10.
Nature ; 578(7794): 311-316, 2020 02.
Article in English | MEDLINE | ID: mdl-31996847

ABSTRACT

PIWI-interacting RNAs (piRNAs) of between approximately 24 and 31 nucleotides in length guide PIWI proteins to silence transposons in animal gonads, thereby ensuring fertility1. In the biogenesis of piRNAs, PIWI proteins are first loaded with 5'-monophosphorylated RNA fragments called pre-pre-piRNAs, which then undergo endonucleolytic cleavage to produce pre-piRNAs1,2. Subsequently, the 3'-ends of pre-piRNAs are trimmed by the exonuclease Trimmer (PNLDC1 in mouse)3-6 and 2'-O-methylated by the methyltransferase Hen1 (HENMT1 in mouse)7-9, generating mature piRNAs. It is assumed that the endonuclease Zucchini (MitoPLD in mouse) is a major enzyme catalysing the cleavage of pre-pre-piRNAs into pre-piRNAs10-13. However, direct evidence for this model is lacking, and how pre-piRNAs are generated remains unclear. Here, to analyse pre-piRNA production, we established a Trimmer-knockout silkworm cell line and derived a cell-free system that faithfully recapitulates Zucchini-mediated cleavage of PIWI-loaded pre-pre-piRNAs. We found that pre-piRNAs are generated by parallel Zucchini-dependent and -independent mechanisms. Cleavage by Zucchini occurs at previously unrecognized consensus motifs on pre-pre-piRNAs, requires the RNA helicase Armitage, and is accompanied by 2'-O-methylation of pre-piRNAs. By contrast, slicing of pre-pre-piRNAs with weak Zucchini motifs is achieved by downstream complementary piRNAs, producing pre-piRNAs without 2'-O-methylation. Regardless of the endonucleolytic mechanism, pre-piRNAs are matured by Trimmer and Hen1. Our findings highlight multiplexed processing of piRNA precursors that supports robust and flexible piRNA biogenesis.


Subject(s)
Amino Acid Motifs , Consensus Sequence , Insect Proteins/chemistry , Insect Proteins/metabolism , Mitochondrial Proteins/chemistry , Mitochondrial Proteins/metabolism , Phospholipase D/chemistry , Phospholipase D/metabolism , RNA, Small Interfering/biosynthesis , Adenosine Triphosphate/metabolism , Animals , Base Sequence , Bombyx , Cell Line , Cell-Free System , Gene Knockout Techniques , Insect Proteins/genetics , Methylation , Mice , RNA Helicases/metabolism
11.
Mol Cell ; 72(3): 482-495.e7, 2018 11 01.
Article in English | MEDLINE | ID: mdl-30388410

ABSTRACT

Productive splicing of human precursor messenger RNAs (pre-mRNAs) requires the correct selection of authentic splice sites (SS) from the large pool of potential SS. Although SS consensus sequence and splicing regulatory proteins are known to influence SS usage, the mechanisms ensuring the effective suppression of cryptic SS are insufficiently explored. Here, we find that many aberrant exonic SS are efficiently silenced by the exon junction complex (EJC), a multi-protein complex that is deposited on spliced mRNA near the exon-exon junction. Upon depletion of EJC proteins, cryptic SS are de-repressed, leading to the mis-splicing of a broad set of mRNAs. Mechanistically, the EJC-mediated recruitment of the splicing regulator RNPS1 inhibits cryptic 5'SS usage, while the deposition of the EJC core directly masks reconstituted 3'SS, thereby precluding transcript disintegration. Thus, the EJC protects the transcriptome of mammalian cells from inadvertent loss of exonic sequences and safeguards the expression of intact, full-length mRNAs.


Subject(s)
Alternative Splicing/physiology , Exons/physiology , RNA Splice Sites/physiology , Consensus Sequence/genetics , DEAD-box RNA Helicases/metabolism , Eukaryotic Initiation Factor-4A/metabolism , HeLa Cells , Humans , Introns , RNA Precursors/physiology , RNA Splicing/physiology , RNA, Messenger/genetics , RNA-Binding Proteins/metabolism , Ribonucleoproteins/metabolism , Transcriptome/genetics
12.
Proc Natl Acad Sci U S A ; 120(29): e2220762120, 2023 07 18.
Article in English | MEDLINE | ID: mdl-37432995

ABSTRACT

Large datasets contribute new insights to subjects formerly investigated by exemplars. We used coevolution data to create a large, high-quality database of transmembrane β-barrels (TMBB). By applying simple feature detection on generated evolutionary contact maps, our method (IsItABarrel) achieves 95.88% balanced accuracy when discriminating among protein classes. Moreover, comparison with IsItABarrel revealed a high rate of false positives in previous TMBB algorithms. In addition to being more accurate than previous datasets, our database (available online) contains 1,938,936 bacterial TMBB proteins from 38 phyla, respectively, 17 and 2.2 times larger than the previous sets TMBB-DB and OMPdb. We anticipate that due to its quality and size, the database will serve as a useful resource where high-quality TMBB sequence data are required. We found that TMBBs can be divided into 11 types, three of which have not been previously reported. We find tremendous variance in proteome percentage among TMBB-containing organisms with some using 6.79% of their proteome for TMBBs and others using as little as 0.27% of their proteome. The distribution of the lengths of the TMBBs is suggestive of previously hypothesized duplication events. In addition, we find that the C-terminal β-signal varies among different classes of bacteria though its consensus sequence is LGLGYRF. However, this β-signal is only characteristic of prototypical TMBBs. The ten non-prototypical barrel types have other C-terminal motifs, and it remains to be determined if these alternative motifs facilitate TMBB insertion or perform any other signaling function.


Subject(s)
Algorithms , Proteome , Humans , Bacterial Proteins/genetics , Biological Evolution , Consensus Sequence
13.
Genes Dev ; 31(1): 1-2, 2017 01 01.
Article in English | MEDLINE | ID: mdl-28130343

ABSTRACT

Transcription by RNA polymerase II (Pol II) is dictated in part by core promoter elements, which are DNA sequences flanking the transcription start site (TSS) that help direct the proper initiation of transcription. Taking advantage of recent advances in genome-wide sequencing approaches, Vo ngoc and colleagues (pp. 6-11) identified transcripts with focused sites of initiation and found that many were transcribed from promoters containing a new consensus sequence for the human initiator (Inr) core promoter element.


Subject(s)
Promoter Regions, Genetic , Transcription Initiation Site , Base Sequence , Consensus Sequence , Humans , RNA Polymerase II/genetics , TATA Box , Transcription, Genetic
14.
Biochemistry ; 63(3): 348-354, 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38206322

ABSTRACT

Proteins' extraordinary performance in recognition and catalysis has led to their use in a range of applications. However, proteins obtained from natural sources are oftentimes not suitable for direct use in industrial or diagnostic setups. Natural proteins, evolved to optimally perform a task in physiological conditions, usually lack the stability required to be used in harsher conditions. Therefore, the alteration of the stability of proteins is commonly pursued in protein engineering studies. Here, we achieved a substantial thermal stabilization of a bacterial Zn(II)-dependent phospholipase C by consensus sequence design. We retrieved and analyzed sequenced homologues from different sources, selecting a subset of examples for expression and characterization. A non-natural consensus sequence showed the highest stability and activity among those tested. Comparison of the stability parameters of this stabilized mutant and other natural variants bearing similar mutations allows us to pinpoint the sites most likely to be responsible for the enhancement. Point mutations in these sites alter the unfolding process of the consensus sequence. We show that the stabilized version of the protein retains full activity even in harsh oil degumming conditions, making it suitable for industrial applications.


Subject(s)
Proteins , Zinc , Amino Acid Sequence , Proteins/metabolism , Mutation , Consensus Sequence
15.
BMC Genomics ; 25(1): 109, 2024 Jan 24.
Article in English | MEDLINE | ID: mdl-38267856

ABSTRACT

BACKGROUND: Despite the many cheap and fast ways to generate genomic data, accurate genome assembly remains a problem, with repeats in particular being vastly underrepresented and often misassembled. As short reads in low coverage are already sufficient to represent the repeat landscape of any given genome, many read-clustering algorithms have been proposed that provide repeat identification and classification. But how can trustworthy, reliable and representative repeat consensuses be derived from unassembled genomes? RESULTS: Here, we combine methods from repeat identification and genome assembly to derive these robust consensuses. We test several use cases, such as (1) consensus building from clustered short reads of non-model genomes, (2) from genome-wide amplification setups, and (3) specific repeat-centred questions, such as the linked vs. unlinked arrangement of ribosomal genes. In all our use cases, the derived consensuses are robust and representative. To evaluate overall performance, we compare our high-fidelity repeat consensuses to RepeatExplorer2-derived contigs and check whether they represent real transposable elements as found in long reads. Our results demonstrate that it is possible to generate useful, reliable and trustworthy consensuses from short reads by combining read-clustering and genome assembly methods in an automatable way. CONCLUSION: We anticipate that our workflow opens the way towards more efficient and less manual repeat characterization and annotation, benefitting all genome studies, but especially those of non-model organisms.


Subject(s)
Algorithms , DNA Transposable Elements , Consensus Sequence , Cluster Analysis , Genomics
16.
Vet Res ; 55(1): 28, 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38449049

ABSTRACT

The prevalence of porcine reproductive and respiratory syndrome virus 1 (PRRSV1) isolates has continued to increase in Chinese swine herds in recent years. However, no effective control strategy is available for PRRSV1 infection in China. In this study, we generated the first infectious cDNA clone (rHLJB1) of a Chinese PRRSV1 isolate and subsequently used it as a backbone to construct an ORF2-6 chimeric virus (ORF2-6-CON). This virus contained a synthesized consensus sequence of the PRRSV1 ORF2-6 gene encoding all the envelope proteins. The ORF2-6 consensus sequence shared > 90% nucleotide similarity with four representative strains (Amervac, BJEU06-1, HKEU16 and NMEU09-1) of PRRSV1 in China. ORF2-6-CON had replication efficacy similar to that of the backbone rHLJB1 virus in primary alveolar macrophages (PAMs) and exhibited cell tropism in Marc-145 cells. Piglet inoculation and challenge studies indicated that ORF2-6-CON is not pathogenic to piglets and can induce enhanced cross-protection against a heterologous SD1291 isolate. Notably, ORF2-6-CON inoculation induced higher levels of heterologous neutralizing antibodies (nAbs) against SD1291 than rHLJB1 inoculation, which was concurrent with a higher percentage of T follicular helper (Tfh) cells in tracheobronchial lymph nodes (TBLNs), providing the first clue that porcine Tfh cells are correlated with heterologous PRRSV nAb responses. The number of SD1291-strain-specific IFNγ-secreting cells was similar in ORF2-6-CON-inoculated and rHLJB1-inoculated pigs. Overall, our findings support that the Marc-145-adapted ORF2-6-CON can trigger Tfh cell and heterologous nAb responses to confer improved cross-protection and may serve as a candidate strain for the development of a cross-protective PRRSV1 vaccine.


Subject(s)
Porcine respiratory and reproductive syndrome virus , Animals , Swine , Porcine respiratory and reproductive syndrome virus/genetics , T Follicular Helper Cells , Antibodies, Neutralizing , China , Consensus Sequence
17.
Nucleic Acids Res ; 50(D1): D371-D379, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34761274

ABSTRACT

Previous studies on enhancers and their target genes were largely based on bulk samples that represent 'average' regulatory activities from a large population of millions of cells, masking the heterogeneity and important effects from the sub-populations. In recent years, single-cell sequencing technology has enabled the profiling of open chromatin accessibility at the single-cell level (scATAC-seq), which can be used to annotate the enhancers and promoters in specific cell types. A comprehensive resource is highly desirable for exploring how the enhancers regulate the target genes at the single-cell level. Hence, we designed a single-cell database scEnhancer (http://enhanceratlas.net/scenhancer/), covering 14 527 776 enhancers and 63 658 600 enhancer-gene interactions from 1 196 906 single cells across 775 tissue/cell types in three species. An unsupervised learning method was employed to sort and combine tens or hundreds of single cells in each tissue/cell type to obtain the consensus enhancers. In addition, we utilized a cis-regulatory network algorithm to identify the enhancer-gene connections. Finally, we provided a user-friendly platform with seven useful modules to search, visualize, and browse the enhancers/genes. This database will facilitate the research community towards a functional analysis of enhancers at the single-cell level.


Subject(s)
Databases, Genetic , Enhancer Elements, Genetic , Single-Cell Analysis/methods , Software , Unsupervised Machine Learning , Animals , Cell Lineage/genetics , Chromatin/chemistry , Chromatin/metabolism , Consensus Sequence , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Eukaryotic Cells/cytology , Eukaryotic Cells/metabolism , Gene Expression Regulation , Gene Regulatory Networks , Genetic Heterogeneity , Humans , Internet , Mice , Molecular Sequence Annotation , Organ Specificity , Promoter Regions, Genetic
18.
Int J Mol Sci ; 25(3)2024 Jan 30.
Article in English | MEDLINE | ID: mdl-38338947

ABSTRACT

The extended cleavage specificities of two hematopoietic serine proteases originating from the ray-finned fish, the spotted gar (Lepisosteus oculatus), have been characterized using substrate phage display. The preference for particular amino acids at and surrounding the cleavage site was further validated using a panel of recombinant substrates. For one of the enzymes, the gar granzyme G, a strict preference for the aromatic amino acid Tyr was observed at the cleavable P1 position. A set of recombinant substrates showed that the gar granzyme G had a high selectivity for Tyr, a lower activity for cleaving after Phe, and no cleavage after Trp. In contrast, the second enzyme, gar DDN1, showed a high preference for Leu in the P1 position of substrates. This latter enzyme also showed a high preference for Pro in the P2 position and Arg in both P4 and P5 positions. The selectivity for the two Arg residues in positions P4 and P5 suggests a highly specific substrate selectivity of this enzyme. The screening of the gar proteome with the consensus sequences obtained by substrate phage display for these two proteases resulted in a very diverse set of potential targets. Due to this diversity, a clear candidate for a specific immune function of these two enzymes cannot yet be identified. Antisera developed against the recombinant gar enzymes were used to study their tissue distribution. Tissue sections from juvenile fish showed the expression of both proteases in cells in Peyer's patch-like structures in the intestinal region, indicating they may be expressed in T or NK cells. However, due to the lack of antibodies to specific surface markers in the gar, it has not been possible to specify the exact cellular origin. A marked difference in abundance was observed for the two proteases where gar DDN1 was expressed at higher levels than gar granzyme G. However, both appear to be expressed in the same or similar cells, having a lymphocyte-like appearance.
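
Screening a proteome with a consensus cleavage motif can be sketched as a simple pattern scan. The pattern below encodes only the gar DDN1 preferences named in the abstract (Arg at P5 and P4, Pro at P2, Leu at P1) and leaves P3 unconstrained, which is an assumption. The function name, the example protein sequence, and the use of an exact-match regular expression rather than a scored profile are likewise hypothetical and are not the authors' method.

```python
import re

# P5 P4 P3 P2 P1: Arg, Arg, any residue, Pro, Leu (P3 left open as an assumption).
# The lookahead lets overlapping candidate sites be reported as well.
DDN1_MOTIF = re.compile(r"(?=(RR.PL))")

def find_candidate_sites(protein_seq):
    """Return (start_index, matched_residues) for each motif occurrence."""
    return [(m.start(), m.group(1)) for m in DDN1_MOTIF.finditer(protein_seq)]

# Hypothetical protein sequence for illustration only.
sequence = "MKRRAPLSGRRTPLA"
hits = find_candidate_sites(sequence)

for start, residues in hits:
    # Cleavage would occur after the P1 residue, five positions past the start.
    print(start, residues, "predicted cut after index", start + 4)
```

A scan this permissive is expected to return many candidates across a whole proteome, consistent with the abstract's remark that the screen produced a very diverse set of potential targets.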


Subject(s)
Fishes , Serine Proteases , Animals , Serine Proteases/genetics , Granzymes , Endopeptidases , Consensus Sequence , Substrate Specificity
19.
J Biol Chem ; 298(8): 102129, 2022 08.
Article in English | MEDLINE | ID: mdl-35700824

ABSTRACT

Epidermal growth factor-like domains (EGFDs) have important functions in cell-cell signaling. Both secreted and cell surface human EGFDs are subject to extensive modifications, including aspartate and asparagine residue C3-hydroxylations catalyzed by the 2-oxoglutarate oxygenase aspartate/asparagine-β-hydroxylase (AspH). Although genetic studies show AspH is important in human biology, studies on its physiological roles have been limited by incomplete knowledge of its substrates. Here, we redefine the consensus sequence requirements for AspH-catalyzed EGFD hydroxylation based on combined analysis of proteomic mass spectrometric data and mass spectrometry-based assays with isolated AspH and peptide substrates. We provide cellular and biochemical evidence that the preferred site of EGFD hydroxylation is embedded within a disulfide-bridged macrocycle formed of 10 amino acid residues. This definition enabled the identification of previously unassigned hydroxylation sites in three EGFDs of human fibulins as AspH substrates. A non-EGFD containing protein, lymphocyte antigen-6/plasminogen activator urokinase receptor domain containing protein 6B (LYPD6B) was shown to be a substrate for isolated AspH, but we did not observe evidence for LYPD6B hydroxylation in cells. AspH-catalyzed hydroxylation of fibulins is of particular interest given their important roles in extracellular matrix dynamics. In conclusion, these results lead to a revision of the consensus substrate requirements for AspH and expand the range of observed and potential AspH-catalyzed hydroxylation in cells, which will enable future study of the biological roles of AspH.


Subject(s)
Consensus Sequence , Epidermal Growth Factor , Proteomics , Antigens, Ly/metabolism , Asparagine/metabolism , Aspartic Acid/metabolism , Epidermal Growth Factor/metabolism , Humans , Hydroxylation
20.
J Virol ; 96(2): e0144421, 2022 01 26.
Article in English | MEDLINE | ID: mdl-34757836

ABSTRACT

The NIa protease of potyviruses is a chymotrypsin-like cysteine protease related to the picornavirus 3C protease. It is also a multifunctional protein known to play multiple roles during virus infection. Picornavirus 3C proteases cleave hundreds of host proteins to facilitate virus infection. However, whether or not potyvirus NIa proteases cleave plant proteins has so far not been tested. A regular expression search using the cleavage site consensus sequence [EQN]xVxH[QE]/[SGTA] for the plum pox virus (PPV) protease identified 90 to 94 putative cleavage events in the proteomes of Prunus persica (a crop severely affected by PPV), Arabidopsis thaliana, and Nicotiana benthamiana (two experimental hosts). In vitro processing assays confirmed cleavage of six A. thaliana and five P. persica proteins by the PPV protease. These proteins were also cleaved in vitro by the protease of turnip mosaic virus (TuMV), which has a similar specificity. We confirmed in vivo cleavage of a transiently expressed tagged version of AtEML2, an EMSY-like protein belonging to a family of nuclear histone readers known to be involved in pathogen resistance. Cleavage of AtEML2 was efficient and was observed in plants that coexpressed the PPV or TuMV NIa proteases or in plants that were infected with TuMV. We also showed partial in vivo cleavage of AtDUF707, a membrane protein annotated as lysine ketoglutarate reductase trans-splicing protein. Although cleavage of the corresponding endogenous plant proteins remains to be confirmed, the results show that a plant virus protease can cleave host proteins during virus infection and highlight a new layer of plant-virus interactions.
IMPORTANCE Viruses are highly adaptive and use multiple molecular mechanisms to hijack or modify the cellular resources to their advantage. They must also counteract or evade host defense responses. One well-characterized mechanism used by vertebrate viruses is the proteolytic cleavage of host proteins to inhibit the activities of these proteins and/or to produce cleaved protein fragments that are beneficial to the virus infection cycle. Even though almost half of the known plant viruses encode at least one protease, it was not known whether plant viruses employ this strategy. Using an in silico prediction approach and the well-characterized specificity of potyvirus NIa proteases, we were able to identify hundreds of putative cleavage sites in plant proteins, several of which were validated by downstream experiments. It can be anticipated that many other plant virus proteases also cleave host proteins and that the identification of these cleavage events will lead to novel antiviral strategies.
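The consensus-based search described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `find_cleavage_sites`, the toy sequence, and the choice to report overlapping matches are assumptions made for illustration. Only the consensus [EQN]xVxH[QE]/[SGTA] comes from the abstract, where "x" is read as any residue and "/" as the scissile bond.

```python
import re

# Consensus from the abstract: [EQN]xVxH[QE]/[SGTA].
# "x" is any residue (regex "."), "/" marks the cleaved bond.
# The lookahead makes each match zero-width, so overlapping sites are reported.
SITE = re.compile(r"(?=([EQN].V.H[QE])([SGTA]))")

def find_cleavage_sites(sequence):
    """Return (cut_index, p6_to_p1, p1_prime) for each consensus match.

    cut_index is the 0-based index of the P1' residue, so the predicted
    cut lies between sequence[cut_index - 1] and sequence[cut_index].
    (Hypothetical helper, not part of any published tool.)
    """
    seq = sequence.upper()
    return [(m.start() + 6, m.group(1), m.group(2)) for m in SITE.finditer(seq)]

# Toy sequence invented for this example; it is not a real plant protein.
toy = "MKTAEAVAHQSLLNPVYHEGKK"
for cut, upstream, p1_prime in find_cleavage_sites(toy):
    print(f"predicted cut before index {cut}: {upstream}/{p1_prime}")
```

A pattern this short will match by chance in any large proteome, which is consistent with the abstract treating the hits as putative until confirmed by in vitro or in vivo assays.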


Subject(s)
Endopeptidases/metabolism , Plant Proteins/metabolism , Potyvirus/enzymology , Viral Proteins/metabolism , Amino Acid Sequence , Arabidopsis/metabolism , Consensus Sequence , Endopeptidases/genetics , Host-Pathogen Interactions , Plant Diseases/virology , Plant Proteins/chemistry , Potyvirus/classification , Potyvirus/genetics , Proteolysis , Prunus persica/metabolism , Recombinant Proteins/genetics , Recombinant Proteins/metabolism , Substrate Specificity , Viral Proteins/genetics