Results 1 - 20 of 873
1.
PLoS One ; 19(4): e0301069, 2024.
Article in English | MEDLINE | ID: mdl-38669259

ABSTRACT

Nearly 300 million individuals live with chronic hepatitis B virus (HBV) infection (CHB), for which no curative therapy is available. As viral diversity is associated with pathogenesis and immunological control of infection, improved methods to characterize this diversity could aid drug development efforts. Conventionally, viral sequencing data are mapped/aligned to a reference genome, and only the aligned sequences are retained for analysis. Thus, reference selection is critical, yet selecting the most representative reference a priori remains difficult. We investigate an alternative pangenome approach which can combine multiple reference sequences into a graph which can be used during alignment. Using simulated short-read sequencing data generated from publicly available HBV genomes and real sequencing data from an individual living with CHB, we demonstrate alignment to a phylogenetically representative 'genome graph' can improve alignment, avoid issues of reference ambiguity, and facilitate the construction of sample-specific consensus sequences more genetically similar to the individual's infection. Graph-based methods can, therefore, improve efforts to characterize the genetics of viral pathogens, including HBV, and have broader implications in host-pathogen research.


Subject(s)
Consensus Sequence , Viral Genome , Hepatitis B Virus , Hepatitis B Virus/genetics , Humans , Consensus Sequence/genetics , Phylogeny , Sequence Alignment/methods , Genetic Variation , Chronic Hepatitis B/virology , Viral DNA/genetics , DNA Sequence Analysis/methods
2.
Cell Rep ; 38(7): 110364, 2022 02 15.
Article in English | MEDLINE | ID: mdl-35172134

ABSTRACT

Mesendodermal specification is one of the earliest events in embryogenesis, where cells first acquire distinct identities. Cell differentiation is a highly regulated process that involves the function of numerous transcription factors (TFs) and signaling molecules, which can be described with gene regulatory networks (GRNs). Cell differentiation GRNs are difficult to build because existing mechanistic methods are low throughput, and high-throughput methods tend to be non-mechanistic. Additionally, integrating highly dimensional data composed of more than two data types is challenging. Here, we use linked self-organizing maps to combine chromatin immunoprecipitation sequencing (ChIP-seq)/ATAC-seq with temporal, spatial, and perturbation RNA sequencing (RNA-seq) data from Xenopus tropicalis mesendoderm development to build a high-resolution genome scale mechanistic GRN. We recover both known and previously unsuspected TF-DNA/TF-TF interactions validated through reporter assays. Our analysis provides insights into transcriptional regulation of early cell fate decisions and provides a general approach to building GRNs using highly dimensional multi-omic datasets.


Subject(s)
Endoderm/embryology , Gene Regulatory Networks , Genomics , Mesoderm/embryology , Xenopus/embryology , Xenopus/genetics , Animals , Chromatin/metabolism , Consensus Sequence/genetics , DNA/metabolism , Gastrulation/genetics , Developmental Gene Expression Regulation , Protein Binding , RNA/metabolism , Transcription Factors/metabolism , Genetic Transcription
3.
EBioMedicine ; 75: 103750, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34922323

ABSTRACT

BACKGROUND: Long non-coding RNAs (lncRNAs) have recently emerged as essential biomarkers of cancer progression. However, studies are limited regarding lncRNAs correlated with recurrence and fluorouracil-based adjuvant chemotherapy (ACT) in stage II/III colorectal cancer (CRC). METHODS: A total of 1640 stage II/III CRC patients were enrolled from 15 independent datasets and a clinical in-house cohort. Ten prevalent machine learning algorithms were collected and then combined into 76 combinations. In addition, 109 published transcriptome signatures were retrieved. A qRT-PCR assay was performed to verify our model. FINDINGS: We comprehensively identified 27 stably recurrence-related lncRNAs from multi-center cohorts. Based on these lncRNAs, a consensus machine learning-derived lncRNA signature (CMDLncS) that exhibited the best power for predicting recurrence risk was determined from the 76 algorithm combinations. A high CMDLncS indicated unfavorable recurrence and mortality rates. CMDLncS not only worked independently of common clinical traits (e.g., AJCC stage) and molecular features (e.g., microsatellite state, KRAS mutation), but also performed dramatically better than these variables. qRT-PCR results from 173 patients further verified our in silico findings and assessed the feasibility of the signature in different centers. Comparisons of CMDLncS with 109 published transcriptome signatures further demonstrated its predictive superiority. Additionally, patients with high CMDLncS benefited more from fluorouracil-based ACT and were characterized by activation of stromal and epithelial-mesenchymal transition, while patients with low CMDLncS appeared sensitive to bevacizumab and displayed enhanced immune activation. INTERPRETATION: CMDLncS provides an attractive platform for identifying patients at high risk of recurrence and could optimize precision treatment to improve clinical outcomes in stage II/III CRC.
FUNDING: This study was supported by the National Natural Science Foundation of China (81972663); Henan Province Young and Middle-Aged Health Science and Technology Innovation Talent Project (YXKC2020037); and Henan Provincial Health Commission Joint Youth Project (SB201902014).


Subject(s)
Colorectal Neoplasms , Consensus Sequence , Long Noncoding RNA , Adolescent , Tumor Biomarkers/genetics , Colorectal Neoplasms/drug therapy , Colorectal Neoplasms/genetics , Consensus Sequence/genetics , Humans , Machine Learning , Middle Aged , Local Neoplasm Recurrence/genetics , Prognosis , Long Noncoding RNA/genetics
4.
Nat Protoc ; 16(7): 3625-3638, 2021 07.
Article in English | MEDLINE | ID: mdl-34089018

ABSTRACT

The most common nonstandard nucleotides found in genomic DNA are ribonucleotides. Although ribonucleotides are frequently incorporated into DNA by replicative DNA polymerases, very little is known about the distribution and signatures of ribonucleotides incorporated into DNA. Recent advances in high-throughput ribonucleotide sequencing can capture the exact locations of ribonucleotides in genomic DNA. Ribose-Map is a user-friendly, standardized bioinformatics toolkit for the comprehensive analysis of ribonucleotide sequencing experiments. It allows researchers to map the locations of ribonucleotides in DNA to single-nucleotide resolution and identify biological signatures of ribonucleotide incorporation. In addition, it can be applied to data generated using any currently available high-throughput ribonucleotide sequencing technique, thus standardizing the analysis of ribonucleotide sequencing experiments and allowing direct comparisons of results. This protocol describes in detail how to use Ribose-Map to analyze ribonucleotide sequencing data, including preparing the reads for analysis, locating the genomic coordinates of ribonucleotides, exploring the genome-wide distribution of ribonucleotides, determining the nucleotide sequence context of ribonucleotides and identifying hotspots of ribonucleotide incorporation. Ribose-Map does not require background knowledge of ribonucleotide sequencing analysis and assumes only basic command-line skills. The protocol requires less than 3 h of computing time for most datasets and ~30 min of hands-on time. Ribose-Map is available at https://github.com/agombolay/ribose-map .


Subject(s)
Fungal DNA/genetics , Genome , Genomics/methods , Ribonucleotides/genetics , Ribose/metabolism , Saccharomyces cerevisiae/genetics , Base Sequence , Consensus Sequence/genetics , Mitochondrial DNA/genetics
5.
Mol Biol Rep ; 48(3): 2223-2233, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33689093

ABSTRACT

TEOSINTE BRANCHED 1/CYCLOIDEA/PROLIFERATING CELL FACTOR 1 (TCP) transcription factors control multiple aspects of growth and development in various plant species. However, few genes have been reported to be directly targeted and regulated by them through specific binding sites, which makes their functions in plants difficult to uncover. A consensus DNA-binding site motif of TCP2 was identified by random binding site selection (RBSS). DNA recognized by TCP2 contained the motif G(G/T)GGNCC(A/C), which showed high consistency with motifs bound by other TCP domain proteins. Consequently, this motif was regarded as the specific DNA-binding site of TCP2. CIRCADIAN CLOCK ASSOCIATED 1 (CCA1) and EARLY FLOWERING 3 (ELF3) were subsequently considered potential target genes because they contain similar TCP2 binding sites or the core binding site GGNCC, and they were found to be positively regulated by TCP2 via DNA binding. Phenotype analysis showed that mutation and over-expression of TCP2 resulted in variations in leaf morphogenesis, especially in double or triple mutants of TCP2, TCP4 and TCP10. Mutations in TCPs caused late flowering. Finally, TCP2 was shown to influence hypocotyl elongation by mediating the jasmonate signaling pathway. Overall, these results provide a basis for future studies aimed at distinguishing the target genes of TCP2 and elucidating the important roles of TCP2 in plant growth and development.
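The motif G(G/T)GGNCC(A/C) reported above can be read as an 8-base pattern and searched for in a candidate promoter. The following is a minimal sketch, not the authors' method: the function name `find_tcp2_sites` and the promoter fragment are hypothetical, N is assumed to mean any of A, C, G or T, and only the given strand is scanned.

```python
import re

# Expansion of the reported motif G(G/T)GGNCC(A/C):
# position 2 is G or T, position 5 is any base, position 8 is A or C.
# The lookahead lets overlapping hits be reported as well.
TCP2_MOTIF = re.compile(r"(?=(G[GT]GG[ACGT]CC[AC]))")


def find_tcp2_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched 8-mer) for every motif hit on the given strand."""
    seq = sequence.upper()
    return [(m.start(), m.group(1)) for m in TCP2_MOTIF.finditer(seq)]


# Hypothetical promoter fragment, made up for illustration only.
promoter = "ATGTGGACCAATTTGGGGTCCCTA"
hits = find_tcp2_sites(promoter)
print(hits)  # [(2, 'GTGGACCA'), (14, 'GGGGTCCC')]
```

To cover both strands, the same function could be applied to the reverse complement of the sequence.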


Subject(s)
Arabidopsis Proteins/metabolism , Arabidopsis/growth & development , Arabidopsis/genetics , Binding Sites/genetics , Consensus Sequence/genetics , Plant DNA/metabolism , Transcription Factors/metabolism , Arabidopsis Proteins/chemistry , Arabidopsis Proteins/genetics , Base Sequence , Cyclopentanes/metabolism , Flowers/physiology , Plant Gene Expression Regulation , Hypocotyl/growth & development , Morphogenesis/genetics , Mutation/genetics , Oxylipins/metabolism , Plant Leaves/growth & development , Protein Binding , Protein Domains , Signal Transduction , Time Factors , Transcription Factors/chemistry , Transcription Factors/genetics
6.
Arch Virol ; 166(1): 43-64, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33052487

ABSTRACT

Leucine-rich repeats (LRRs) are present in over 563,000 proteins from viruses to eukaryotes. LRRs repeat in tandem and have been classified into fifteen classes in which the repeat unit lengths range from 20 to 29 residues. Most LRR proteins are involved in protein-protein or ligand interactions. The amount of genome sequence data from viruses is increasing rapidly, and although viral LRR proteins have been identified, a comprehensive sequence analysis has not yet been done, and their structures, functions, and evolution are still unknown. In the present study, we characterized viral LRRs by sequence analysis and identified over 600 LRR proteins from 89 virus species. Most of these proteins were from double-stranded DNA (dsDNA) viruses, including nucleocytoplasmic large dsDNA viruses (NCLDVs). We found that the repeating unit lengths of 11 types are one to five residues shorter than those of the seven known corresponding LRR classes. The repeating units of six types are 19 residues long and are thus the shortest among all LRRs. In addition, two of the LRR types are unique and have not been observed in bacteria, archaea or eukaryotes. Conserved strongly hydrophobic residues such as Leu, Val or Ile in the consensus sequences are replaced by Cys with high frequency. Phylogenetic analysis indicated that horizontal gene transfer of some viral LRR genes had occurred between the virus and its host. We suggest that the shortening might contribute to the survival strategy of viruses. The present findings provide a new perspective on the origin and evolution of LRRs.


Subject(s)
DNA/genetics , Leucine/genetics , Amino Acid Repetitive Sequences/genetics , Viruses/genetics , Archaea/virology , Bacteria/virology , Consensus Sequence/genetics , Eukaryota/virology , Phylogeny , Viral Proteins/genetics
7.
Nat Commun ; 11(1): 6023, 2020 11 26.
Article in English | MEDLINE | ID: mdl-33243970

ABSTRACT

The success of protein evolution campaigns is strongly dependent on the sequence context in which mutations are introduced, stemming from pervasive non-additive interactions between a protein's amino acids ('intra-gene epistasis'). Our limited understanding of such epistasis hinders the correct prediction of the functional contributions and adaptive potential of mutations. Here we present a straightforward unique molecular identifier (UMI)-linked consensus sequencing workflow (UMIC-seq) that simplifies mapping of evolutionary trajectories based on full-length sequences. Attaching UMIs to gene variants allows accurate consensus generation for closely related genes with nanopore sequencing. We exemplify the utility of this approach by reconstructing the artificial phylogeny emerging in three rounds of directed evolution of an amine dehydrogenase biocatalyst via ultrahigh throughput droplet screening. Uniquely, we are able to identify lineages and their founding variant, as well as non-additive interactions between mutations within a full gene showing sign epistasis. Access to deep and accurate long reads will facilitate prediction of key beneficial mutations and adaptive potential based on in silico analysis of large sequence datasets.
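The idea of UMI-linked consensus generation can be illustrated with a toy example: reads carrying the same UMI are grouped, and each position is decided by majority vote. This is a simplified sketch and not the UMIC-seq implementation; the function name `umi_consensus`, the `min_reads` threshold, and the reads themselves are assumptions, and the reads in a group are assumed to be pre-aligned and of equal length (real nanopore reads would need clustering and alignment first).

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_reads=3):
    """Group (umi, sequence) pairs by UMI and majority-vote each position.

    Assumes reads sharing a UMI are already aligned to equal length.
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        if len(seqs) < min_reads:
            continue  # too few reads to outvote sequencing errors
        if any(len(s) != len(seqs[0]) for s in seqs):
            continue  # skip groups that violate the equal-length assumption
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


# Hypothetical reads: (UMI, sequence); one read per group carries an error.
reads = [
    ("AAGT", "ACGTAC"), ("AAGT", "ACGTAC"), ("AAGT", "ACCTAC"),
    ("CCTA", "ACGAAC"), ("CCTA", "ACGAAC"), ("CCTA", "TCGAAC"),
    ("GGGG", "ACGTAC"),  # singleton, dropped by min_reads
]
print(umi_consensus(reads))  # {'AAGT': 'ACGTAC', 'CCTA': 'ACGAAC'}
```

Majority voting only corrects errors that are in the minority within a group, which is why a minimum group size is enforced.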


Subject(s)
Directed Molecular Evolution , High-Throughput Nucleotide Sequencing/methods , High-Throughput Screening Assays/methods , Oxidoreductases Acting on CH-NH Group Donors/genetics , Protein Engineering/methods , Biocatalysis , Molecular Cloning , Computational Biology/methods , Consensus Sequence/genetics , Datasets as Topic , Enzyme Assays , Genetic Epistasis , Gene Library , Mutagenesis , Mutation , Oxidoreductases Acting on CH-NH Group Donors/isolation & purification , Oxidoreductases Acting on CH-NH Group Donors/metabolism , Phylogeny , Recombinant Proteins/genetics , Recombinant Proteins/isolation & purification , Recombinant Proteins/metabolism , Software
8.
Front Cell Infect Microbiol ; 10: 575613, 2020.
Article in English | MEDLINE | ID: mdl-33123498

ABSTRACT

Background: The ongoing pandemic of SARS-CoV-2 has already infected more than eight million people worldwide. The majority of COVID-19 patients either are asymptomatic or have mild symptoms. Yet, about 15% of the cases experience severe complications and require intensive care. Factors determining disease severity are not yet fully characterized. Aim: Here, we investigated the within-host virus diversity in COVID-19 patients with different clinical manifestations. Methods: We compared SARS-CoV-2 genetic diversity in 19 mild and 27 severe cases. Viral RNA was extracted from nasopharyngeal samples and sequenced using the Illumina MiSeq platform. This was followed by deep-sequencing analyses of SARS-CoV-2 genomes at both consensus and sub-consensus sequence levels. Results: Consensus sequences of all viruses were very similar, showing more than 99.8% sequence identity regardless of the disease severity. However, the sub-consensus analysis revealed significant differences in within-host diversity between mild and severe cases. Patients with severe symptoms exhibited a significantly (p-value 0.001) higher number of variants in coding and non-coding regions compared to mild cases. Analysis also revealed higher prevalence of some variants among severe cases. Most importantly, severe cases exhibited significantly higher within-host diversity (mean = 13) compared to mild cases (mean = 6). Further, higher within-host diversity was observed in patients above the age of 60 compared to the younger age group. Conclusion: These observations provided evidence that within-host diversity might play a role in the development of severe disease outcomes in COVID-19 patients; however, further investigations are required to elucidate this association.


Subject(s)
Betacoronavirus/classification , Betacoronavirus/genetics , Genetic Variation/genetics , Viral Genome/genetics , Severity of Illness Index , Adult , Aged , COVID-19 , Consensus Sequence/genetics , Coronavirus Infections/pathology , Female , Humans , Male , Middle Aged , Pandemics , Viral Pneumonia/pathology , Viral RNA/genetics , Risk Factors , SARS-CoV-2 , RNA Sequence Analysis , Young Adult
9.
Curr Biol ; 30(22): 4454-4466.e5, 2020 11 16.
Article in English | MEDLINE | ID: mdl-32976810

ABSTRACT

Many protein-modifying enzymes recognize their substrates via docking motifs, but the range of functionally permissible motif sequences is often poorly defined. During eukaryotic cell division, cyclin-specific docking motifs help cyclin-dependent kinases (CDKs) phosphorylate different substrates at different stages, thus enforcing a temporally ordered series of events. In budding yeast, CDK substrates with Leu/Pro-rich (LP) docking motifs are recognized by Cln1/2 cyclins in late G1 phase, yet the key sequence features of these motifs were unknown. Here, we comprehensively analyze LP motif requirements in vivo by combining a competitive growth assay with deep mutational scanning. We quantified the effect of all single-residue replacements in five different LP motifs by using six distinct G1 cyclins from diverse fungi including medical and agricultural pathogens. The results uncover substantial tolerance for deviations from the consensus sequence, plus requirements at some positions that are contingent on the favorability of other motif residues. They also reveal the basis for variations in functional potency among wild-type motifs, and allow derivation of a quantitative matrix that predicts the strength of other candidate motif sequences. Finally, we find that variation in docking motif potency can advance or delay the time at which CDK substrate phosphorylation occurs, and thereby control the temporal ordering of cell cycle regulation. The overall results provide a general method for surveying viable docking motif sequences and quantifying their potency in vivo, and they reveal how variations in docking strength can tune the degree and timing of regulatory modifications.


Subject(s)
Cyclin-Dependent Kinases/metabolism , Cyclins/genetics , G1 Phase , Protein Domains/genetics , Saccharomyces cerevisiae Proteins/genetics , Amino Acid Motifs/genetics , Consensus Sequence/genetics , Cyclins/metabolism , DNA Mutational Analysis , Fungal DNA/genetics , Fungal DNA/isolation & purification , Phosphorylation/genetics , Protein Binding/genetics , Saccharomyces cerevisiae , Saccharomyces cerevisiae Proteins/metabolism
10.
Nature ; 585(7825): 459-463, 2020 09.
Article in English | MEDLINE | ID: mdl-32908305

ABSTRACT

The RNA polymerase II (Pol II) core promoter is the strategic site of convergence of the signals that lead to the initiation of DNA transcription [1-5], but the downstream core promoter in humans has been difficult to understand [1-3]. Here we analyse the human Pol II core promoter and use machine learning to generate predictive models for the downstream core promoter region (DPR) and the TATA box. We developed a method termed HARPE (high-throughput analysis of randomized promoter elements) to create hundreds of thousands of DPR (or TATA box) variants, each with known transcriptional strength. We then analysed the HARPE data by support vector regression (SVR) to provide comprehensive models for the sequence motifs, and found that the SVR-based approach is more effective than a consensus-based method for predicting transcriptional activity. These results show that the DPR is a functionally important core promoter element that is widely used in human promoters. Notably, there appears to be a duality between the DPR and the TATA box, as many promoters contain one or the other element. More broadly, these findings show that functional DNA motifs can be identified by machine learning analysis of a comprehensive set of sequence variants.


Subject(s)
Consensus Sequence/genetics , Gene Expression Regulation/genetics , Genetic Promoter Regions/genetics , RNA Polymerase II/metabolism , Support Vector Machine , Genetic Transcription , Base Sequence , Cells/metabolism , Computer Simulation , Datasets as Topic , HeLa Cells , High-Throughput Nucleotide Sequencing , Humans , Genetic Models , Mutagenesis , TATA Box/genetics
11.
Nat Commun ; 11(1): 3224, 2020 06 26.
Article in English | MEDLINE | ID: mdl-32591528

ABSTRACT

In plants, epigenetic regulation is critical for silencing transposons and maintaining proper gene expression. However, its impact on the genome-wide transcription initiation landscape remains elusive. By conducting a genome-wide analysis of transcription start sites (TSSs) using cap analysis of gene expression (CAGE) sequencing, we show that thousands of TSSs are exclusively activated in various epigenetic mutants of Arabidopsis thaliana; we refer to these as cryptic TSSs. Many have not been identified in previous studies, and up to 65% of these are contributed by transposons. They possess similar genetic features to regular TSSs and their activation is strongly associated with the ectopic recruitment of RNAPII machinery. The activation of cryptic TSSs significantly alters transcription of nearby TSSs, including those of genes important for development and stress responses. Our study, therefore, sheds light on the role of epigenetic regulation in maintaining proper gene functions in plants by suppressing transcription from cryptic TSSs.


Subject(s)
Arabidopsis/genetics , Genetic Epigenesis , Plant Gene Expression Regulation , Genetic Transcription , Base Sequence , Consensus Sequence/genetics , DNA Methylation/genetics , DNA Polymerase beta/metabolism , DNA Transposable Elements/genetics , Plant Genes , Mutation/genetics , RNA Polymerase II/metabolism , Transcription Initiation Site , Transcriptome/genetics
12.
Biotechnol Lett ; 42(8): 1305-1315, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32430802

ABSTRACT

Multiple sequence alignment (MSA) is a fundamental way to gain information that cannot be obtained from the analysis of any individual sequence included in the alignment. It provides ways to investigate the relationship between sequence and function from an evolutionary perspective. Thus, the MSA of proteins can be employed as a reference for protein engineering. In this paper, we review recent advances to highlight how protein engineering has benefited from the MSA of proteins. These methods include (1) engineering the thermostability or solubility of proteins by making them closer to the consensus sequence of the alignment through site mutations; (2) structure-based engineering of proteins with comparative modeling; (3) creating paleoenzymes featuring high thermostability and promiscuity by constructing ancestral sequences derived from multiple sequence alignment; and (4) incorporating site mutations targeting the evolutionarily coupled sites identified from multiple sequence alignment.
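Method (1) above, moving a protein toward the consensus of its alignment, can be sketched in a few lines. This is an illustrative sketch only: the function names and the toy alignment are hypothetical, the consensus is taken as the most frequent non-gap residue per column, and ties are resolved in favor of the residue seen first in the column.

```python
from collections import Counter


def consensus_sequence(alignment, gap="-"):
    """Most frequent non-gap residue per alignment column (gap if a column is all gaps)."""
    consensus = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        consensus.append(Counter(residues).most_common(1)[0][0] if residues else gap)
    return "".join(consensus)


def back_to_consensus_mutations(target, consensus, gap="-"):
    """List substitutions (e.g. 'A3L') that move an aligned target toward the consensus."""
    mutations = []
    position = 0  # 1-based residue index in the ungapped target
    for t, c in zip(target, consensus):
        if t == gap:
            continue
        position += 1
        if c != gap and t != c:
            mutations.append(f"{t}{position}{c}")
    return mutations


# Toy alignment of hypothetical homologs, made up for illustration.
alignment = ["MKLVE", "MKLIE", "MRLVE", "MKAVE"]
consensus = consensus_sequence(alignment)
print(consensus)  # MKLVE
print(back_to_consensus_mutations(alignment[3], consensus))  # ['A3L']
```

In practice the candidate substitutions would still need experimental or structural screening, since not every back-to-consensus mutation is stabilizing.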


Subject(s)
Protein Engineering/methods , Sequence Alignment/methods , Protein Sequence Analysis/methods , Amino Acid Sequence/genetics , Consensus Sequence/genetics , Mutation/genetics , Protein Stability , Proteins/chemistry , Proteins/genetics , Proteins/metabolism
13.
Sci Rep ; 10(1): 6727, 2020 04 21.
Article in English | MEDLINE | ID: mdl-32317695

ABSTRACT

The biology of bacterial cells is, in general, based on information encoded on circular chromosomes. Regulation of chromosome replication is an essential process that mostly takes place at the origin of replication (oriC), a locus unique per chromosome. Identification of high numbers of oriC is a prerequisite for systematic studies that could lead to insights into oriC functioning as well as the identification of novel drug targets for antibiotic development. Current methods for identifying oriC sequences rely on chromosome-wide nucleotide disparities and are therefore limited to fully sequenced genomes, leaving a large number of genomic fragments unstudied. Here, we present gammaBOriS (Gammaproteobacterial oriC Searcher), which identifies oriC sequences on gammaproteobacterial chromosomal fragments. It does so by employing motif-based machine learning methods. Using gammaBOriS, we created BOriS DB, which currently contains 25,827 gammaproteobacterial oriC sequences from 1,217 species, thus making it the largest available database for oriC sequences to date. Furthermore, we present gammaBOriTax, a machine-learning based approach for taxonomic classification of oriC sequences, which was trained on the sequences in BOriS DB. Finally, we extracted the motifs relevant for identification and classification decisions of the models. Our results suggest that machine learning sequence classification approaches can offer great support in functional motif identification.


Subject(s)
Gammaproteobacteria/classification , Gammaproteobacteria/genetics , Machine Learning , Nucleotide Motifs/genetics , Replication Origin/genetics , Software , Base Sequence , Consensus Sequence/genetics , Genetic Models , Phylogeny
14.
PLoS One ; 15(4): e0229315, 2020.
Article in English | MEDLINE | ID: mdl-32320410

ABSTRACT

Mutations in the splicing machinery have been implicated in a number of human diseases. Most notably, the U2 small nuclear ribonucleoprotein (snRNP) component SF3b1 has been found to be frequently mutated in blood cancers such as myelodysplastic syndromes (MDS). SF3b1 is a highly conserved HEAT repeat (HR)-containing protein and most of these blood cancer mutations cluster in a hot spot located in HR4-8. Recently, a second mutational hotspot has been identified in SF3b1 located in HR9-12 and is associated with acute myeloid leukemias, bladder urothelial carcinomas, and uterine corpus endometrial carcinomas. The consequences of these mutations on SF3b1 functions during splicing have not yet been tested. We incorporated the corresponding mutations into the yeast homolog of SF3b1 and tested their impact on splicing. We find that all of these HR9-12 mutations can support splicing in yeast, and this suggests that none of them are loss of function alleles in humans. The Hsh155V502F mutation alters splicing of several pre-mRNA reporters containing weak branch sites as well as a genetic interaction with Prp2 and physical interactions with Prp5 and Prp3. The ability of a single allele of Hsh155 to perturb interactions with multiple factors functioning at different stages of the splicing reaction suggests that some SF3b1-mutant disease phenotypes may have a complex origin on the spliceosome.


Subject(s)
Mutation/genetics , Phosphoproteins/genetics , RNA Precursors/genetics , RNA Splicing Factors/genetics , RNA Splicing/genetics , Amino Acid Repetitive Sequences , U2 Small Nuclear Ribonucleoprotein/genetics , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Amino Acid Sequence , Consensus Sequence/genetics , Genetic Epistasis , Humans , Phosphoproteins/chemistry , Protein Binding , RNA Splicing Factors/chemistry , U2 Small Nuclear Ribonucleoprotein/chemistry , Saccharomyces cerevisiae/growth & development , Saccharomyces cerevisiae Proteins/chemistry
15.
J Vis Exp ; (157)2020 03 11.
Article in English | MEDLINE | ID: mdl-32225162

ABSTRACT

Whole genome sequencing can be used to characterize and to trace viral outbreaks. Nanopore-based whole genome sequencing protocols have been described for several different viruses. These protocols use overlapping amplicons, which can be designed to target a specific virus or a group of genetically related viruses. In addition to confirming the presence of the virus, sequencing can be used for genomic epidemiology studies, to track viruses and unravel origins, reservoirs and modes of transmission. For such applications, it is crucial to understand possible effects of the error rate associated with the platform used. Routine application in clinical and public health settings requires that this is documented with every important change in the protocol. Previously, a protocol for whole genome Usutu virus sequencing on the nanopore sequencing platform was validated (R9.4 flow cell) by direct comparison to Illumina sequencing. Here, we describe the method used to determine the required read coverage, using the comparison between the R10 flow cell and Illumina sequencing as an example.


Subject(s)
Flavivirus/genetics , Viral Genome , Nanopore Sequencing , Whole Genome Sequencing , Consensus Sequence/genetics , DNA Primers/metabolism , Data Analysis , High-Throughput Nucleotide Sequencing/methods , Humans , Polymerase Chain Reaction , Reference Standards
16.
Nat Commun ; 11(1): 1663, 2020 04 03.
Article in English | MEDLINE | ID: mdl-32245964

ABSTRACT

Massively parallel, quantitative measurements of biomolecular activity across sequence space can greatly expand our understanding of RNA sequence-function relationships. We report the development of an RNA-array assay to perform such measurements and its application to a model RNA: the core glmS ribozyme riboswitch, which performs a ligand-dependent self-cleavage reaction. We measure the cleavage rates for all possible single and double mutants of this ribozyme across a series of ligand concentrations, determining kcat and KM values for active variants. These systematic measurements suggest that evolutionary conservation in the consensus sequence is driven by maintenance of the cleavage rate. Analysis of double-mutant rates and associated mutational interactions produces a structural and functional mapping of the ribozyme sequence, revealing the catalytic consequences of specific tertiary interactions, and allowing us to infer structural rearrangements that permit certain sequence variants to maintain activity.


Subject(s)
Bacterial Proteins/genetics , Molecular Evolution , Catalytic RNA/genetics , Riboswitch/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Consensus Sequence/genetics , Crystallography , Enzyme Assays , High-Throughput Nucleotide Sequencing , Ligands , Mutation , Nucleic Acid Conformation , Catalytic RNA/chemistry , Catalytic RNA/metabolism , RNA Sequence Analysis , Structure-Activity Relationship
17.
Plant J ; 103(1): 32-52, 2020 07.
Article in English | MEDLINE | ID: mdl-31981259

ABSTRACT

If two related plant species hybridize, their genomes may be combined and duplicated within a single nucleus, thereby forming an allotetraploid. How the emerging plant balances two co-evolved genomes is still a matter of ongoing research. Here, we focus on satellite DNA (satDNA), the fastest turn-over sequence class in eukaryotes, aiming to trace its emergence, amplification, and loss during plant speciation and allopolyploidization. As a model, we used Chenopodium quinoa Willd. (quinoa), an allopolyploid crop with 2n = 4x = 36 chromosomes. Quinoa originated by hybridization of an unknown female American Chenopodium diploid (AA genome) with an unknown male Old World diploid species (BB genome), dating back 3.3-6.3 million years. Applying short read clustering to quinoa (AABB), C. pallidicaule (AA), and C. suecicum (BB) whole genome shotgun sequences, we classified their repetitive fractions, and identified and characterized seven satDNA families, together with the 5S rDNA model repeat. We show unequal satDNA amplification (two families) and exclusive occurrence (four families) in the AA and BB diploids by read mapping as well as Southern, genomic, and fluorescent in situ hybridization. Whereas the satDNA distributions support C. suecicum as possible parental species, we were able to exclude C. pallidicaule as progenitor due to unique repeat profiles. Using quinoa long reads and scaffolds, we detected only limited evidence of intergenomic homogenization of satDNA after allopolyploidization, but were able to exclude dispersal of 5S rRNA genes between subgenomes. Our results exemplify the complex route of tandem repeat evolution through Chenopodium speciation and allopolyploidization, and may provide sequence targets for the identification of quinoa's progenitors.


Subject(s)
Chenopodium quinoa/genetics , DNA, Satellite/genetics , Genome, Plant/genetics , Tetraploidy , Chromosomes, Plant/genetics , Consensus Sequence/genetics , Hybridization, Genetic/genetics , Retroelements/genetics , Sequence Alignment , Tandem Repeat Sequences/genetics
18.
Article in English | MEDLINE | ID: mdl-30307874

ABSTRACT

De novo genome assembly is a challenging computational problem for which several pipelines have been developed. The advent of long-read sequencing technology has resulted in a new set of algorithmic approaches for the assembly process. In this work, we show that one of these new, fast long-read assembly techniques (using Minimap2 and Miniasm) can be modified for short-read assembly. This possibility motivated us to customize a long-read assembly approach for applications in a short-read assembly scenario. Here, we compare and contrast our proposed de novo assembly pipeline (MiniSR) with three other recently developed programs for the assembly of bacterial and small eukaryotic genomes. We have documented two trade-offs: one between speed and accuracy and the other between contiguity and base-calling errors. Our proposed assembly pipeline shows a good balance in these trade-offs. The resulting pipeline is 6 and 2.2 times faster than the short-read assemblers SPAdes and SGA, respectively. MiniSR generates assemblies with higher N50 and NGA50 than SGA, although its assemblies are less complete and accurate than those from SPAdes. A third tool, SOAPdenovo2, is as fast as our proposed pipeline but produces poorer assembly quality.
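
The contiguity metric named in the abstract, N50, can be illustrated with a minimal sketch. This is not taken from the MiniSR pipeline; the function name `n50` and the sample lengths are hypothetical. It uses the common definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. NGA50 additionally requires alignment to a reference genome and is not computed here.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    # Walk contigs from longest to shortest, accumulating length.
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to at least
        # half of the assembly length defines the N50.
        if running >= half_total:
            return length
    return 0  # only reached for empty input


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

A higher N50 indicates a more contiguous assembly, but it says nothing about correctness, which is why the abstract reports it alongside completeness and accuracy.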


Subject(s)
Consensus Sequence/genetics , Genomics/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Algorithms , Genome, Bacterial/genetics , High-Throughput Nucleotide Sequencing
19.
Vet Microbiol ; 239: 108451, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31767095

ABSTRACT

The substantial genetic diversity exhibited by influenza A viruses of swine (IAV-S) represents the main challenge for the development of a broadly protective vaccine against this important pathogen. The consensus vaccine immunogen has proven an effective vaccinology approach to overcome the extraordinary genetic diversity of RNA viruses. In this project, we sought to determine if a consensus IAV-S hemagglutinin (HA) immunogen would elicit broadly protective immunity in pigs. To address this question, a consensus HA gene (designated H3-CON.1) was generated from a set of 1,112 H3 sequences of IAV-S recorded in GenBank from 2011 to 2015. The consensus HA gene and a HA gene of a naturally occurring H3N2 IAV-S strain (designated H3-TX98) were expressed using the baculovirus expression system and emulsified in an oil-in-water adjuvant to be used for vaccination. Pigs vaccinated with H3-CON.1 immunogen elicited broader levels of cross-reactive neutralizing antibodies and interferon gamma secreting cells than those vaccinated with H3-TX98 immunogen. After challenge infection with a fully infectious H3N2 IAV-S isolate, the H3-CON.1-vaccinated pigs shed significantly lower levels of virus in their nasal secretions than the H3-TX98-vaccinated pigs. Collectively, our data provide a proof-of-evidence that the consensus immunogen approach may be effectively employed to develop a broadly protective vaccine against IAV-S.


Subject(s)
Genes, Viral/immunology , Hemagglutinin Glycoproteins, Influenza Virus/immunology , Influenza Vaccines/immunology , Orthomyxoviridae Infections , Swine Diseases , Vaccination/veterinary , Animals , Antibodies, Viral/blood , Consensus Sequence/genetics , Consensus Sequence/immunology , Hemagglutinin Glycoproteins, Influenza Virus/genetics , Orthomyxoviridae Infections/immunology , Orthomyxoviridae Infections/virology , Swine , Swine Diseases/immunology , Swine Diseases/virology , Virus Shedding/immunology
20.
Mol Cell Proteomics ; 18(12): 2348-2358, 2019 12.
Article in English | MEDLINE | ID: mdl-31604803

ABSTRACT

Low vaccine efficacy against seasonal influenza A virus (IAV) stems from the ability of the virus to evade existing immunity while maintaining fitness. Although most potent neutralizing antibodies bind antigenic sites on the globular head domain of the IAV envelope glycoprotein hemagglutinin (HA), the error-prone IAV polymerase enables rapid evolution of key antigenic sites, resulting in immune escape. Significantly, the appearance of new N-glycosylation consensus sequences (sequons, NXT/NXS, rarely NXC) on the HA globular domain occurs among the more prevalent mutations as an IAV strain undergoes antigenic drift. The appearance of new glycosylation shields underlying amino acid residues from antibody contact, tunes receptor specificity, and balances receptor avidity with virion escape, all of which help maintain viral propagation through seasonal mutations. The World Health Organization selects seasonal vaccine strains based on information from surveillance, laboratory, and clinical observations. Although the genetic sequences are known, mature glycosylated structures of circulating strains are not defined. In this review, we summarize mass spectrometric methods for quantifying site-specific glycosylation in IAV strains and compare the evolution of IAV glycosylation to that of human immunodeficiency virus. We argue that the determination of site-specific glycosylation of IAV glycoproteins would enable development of vaccines that take advantage of glycosylation-dependent mechanisms whereby virus glycoproteins are processed by antigen presenting cells.
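
The sequon pattern mentioned in the abstract (NXT/NXS, rarely NXC) can be located in a protein sequence with a short scan. The sketch below is illustrative only and is not a method from the review. The function name `find_sequons` and the example peptide are hypothetical, and the exclusion of proline at the X position is a commonly stated rule that the abstract itself does not spell out.

```python
import re

# N-X-S/T/C, where X is assumed to be any residue except proline.
# The lookahead lets overlapping candidate motifs all be reported.
SEQUON = re.compile(r"(?=(N[^P][STC]))")


def find_sequons(protein, include_nxc=False):
    """Return (1-based position, motif) pairs for candidate sequons."""
    hits = []
    for match in SEQUON.finditer(protein.upper()):
        motif = match.group(1)
        # NXC sites are rare, so they are skipped unless requested.
        if motif[2] == "C" and not include_nxc:
            continue
        hits.append((match.start() + 1, motif))
    return hits


if __name__ == "__main__":
    # Hypothetical peptide, not a real hemagglutinin fragment.
    print(find_sequons("MNGTANPSNNSC"))
    print(find_sequons("MNGTANPSNNSC", include_nxc=True))
```

A sequence match only marks a potential site. As the abstract notes, whether a site is actually glycosylated, and with which structures, has to be determined experimentally, for example by mass spectrometry.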


Subject(s)
Influenza A virus/immunology , Influenza A virus/metabolism , Influenza Vaccines/immunology , Animals , Consensus Sequence/genetics , Glycosylation , Humans , Influenza A virus/genetics , Mass Spectrometry , Mutation