1.
Nat Biotechnol ; 39(9): 1129-1140, 2021 09.
Article in English | MEDLINE | ID: mdl-34504351

ABSTRACT

Assessing the reproducibility, accuracy and utility of massively parallel DNA sequencing platforms remains an ongoing challenge. Here the Association of Biomolecular Resource Facilities (ABRF) Next-Generation Sequencing Study benchmarks the performance of a set of sequencing instruments (HiSeq/NovaSeq/paired-end 2 × 250-bp chemistry, Ion S5/Proton, PacBio circular consensus sequencing (CCS), Oxford Nanopore Technologies PromethION/MinION, BGISEQ-500/MGISEQ-2000 and GS111) on human and bacterial reference DNA samples. Among short-read instruments, HiSeq 4000 and X10 provided the most consistent, highest genome coverage, while BGI/MGISEQ provided the lowest sequencing error rates. The long-read instrument PacBio CCS had the highest reference-based mapping rate and lowest non-mapping rate. The two long-read platforms PacBio CCS and PromethION/MinION showed the best sequence mapping in repeat-rich areas and across homopolymers. NovaSeq 6000 using 2 × 250-bp read chemistry was the most robust instrument for capturing known insertion/deletion events. This study serves as a benchmark for current genomics technologies, as well as a resource to inform experimental design and next-generation sequencing variant calling.


Subjects
High-Throughput Nucleotide Sequencing/methods; High-Throughput Nucleotide Sequencing/standards; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/standards; Base Pair Mismatch; Benchmarking; DNA/genetics; DNA, Bacterial/genetics; Genome, Bacterial; Genome, Human; Humans
2.
Sci Rep ; 11(1): 15869, 2021 08 05.
Article in English | MEDLINE | ID: mdl-34354202

ABSTRACT

Since December 2019, a novel coronavirus causing severe acute respiratory syndrome (SARS-CoV-2) has been responsible for a major pandemic. The emergence of the highly transmissible B.1.1.7 variant has accelerated worldwide interest in tracking the occurrence of SARS-CoV-2 variants. Other extremely infectious variants have since been described, and more are expected to be discovered given how long the pandemic has lasted. All described SARS-CoV-2 variants carry several mutations within the gene encoding the Spike protein, which is involved in host receptor recognition and entry into the cell. Hence, instead of sequencing the whole viral genome to track variants, we propose focusing on the Spike region to increase the number of candidate samples screened at once, an essential aspect for accelerating diagnostics as well as surveillance of variant emergence and progression. This proof-of-concept study accomplishes both population-scale diagnostics and variant tracking at once. The strategy relies on (1) the portable MinION DNA sequencer; (2) DNA barcoding and Spike gene-centered variant tracking, which increase the number of candidates per assay; and (3) real-time monitoring of diagnostics and variant tracking with our software RETIVAD. This strategy represents an optimal solution for current SARS-CoV-2 surveillance needs, notably because its affordability allows deployment even in remote places around the world.


Subjects
COVID-19/diagnosis; SARS-CoV-2/genetics; Sequence Analysis, DNA/methods; COVID-19/virology; COVID-19 Nucleic Acid Testing/instrumentation; COVID-19 Nucleic Acid Testing/methods; Genome, Viral; Humans; Nanopores; RNA, Viral/genetics; Sequence Analysis, DNA/instrumentation; Spike Glycoprotein, Coronavirus/genetics
3.
Viruses ; 13(7)2021 07 15.
Article in English | MEDLINE | ID: mdl-34372586

ABSTRACT

Hepatitis B (HBV) and delta (HDV) viruses are endemic in the Amazon region, but vaccine coverage against HBV is still limited. People who use illicit drugs (PWUDs) represent a high-risk group due to common risk behavior and socioeconomic factors that facilitate the acquisition and transmission of pathogens. The present study assessed the presence of HBV and HBV-HDV co-infection, identified viral sub-genotypes, and verified the occurrence of mutations in coding regions for HBsAg and part of the polymerase in HBV-infected PWUDs in municipalities of the Brazilian states of Amapá and Pará, in the Amazon region. In total, 1074 PWUDs provided blood samples and personal data in 30 municipalities of the Brazilian Amazon. HBV and HDV were detected by enzyme-linked immunosorbent assay and polymerase chain reaction. Viral genotypes were identified by nucleotide sequencing followed by phylogenetic analysis, whereas viral mutations were analyzed by specialized software. High rates of serological (32.2%) and molecular (7.2%) markers for HBV were detected, including cases of occult HBV infection (2.5%). Sub-genotypes A1, A2, D4, and F2a were most frequently found. Escape mutations due to vaccine and antiviral resistance were identified. Among PWUDs with HBV DNA, serological (19.5%) and molecular (11.7%) HDV markers were detected, such as HDV genotypes 1 and 3. These are worrying findings, presenting clear implications for urgent prevention and treatment needs for the carriers of these viruses.


Subjects
Hepatitis B/genetics; Hepatitis D/genetics; Substance-Related Disorders/virology; Adult; Brazil/epidemiology; Coinfection; Cross-Sectional Studies; DNA, Viral/genetics; Drug Users; Enzyme-Linked Immunosorbent Assay/methods; Female; Genotype; Hepatitis B/diagnosis; Hepatitis B Surface Antigens/analysis; Hepatitis B Surface Antigens/blood; Hepatitis B virus/genetics; Hepatitis B virus/pathogenicity; Hepatitis D/diagnosis; Hepatitis Delta Virus/genetics; Hepatitis Delta Virus/pathogenicity; Humans; Illicit Drugs; Male; Middle Aged; Molecular Epidemiology/methods; Phylogeny; RNA, Viral/genetics; Sequence Analysis, DNA/methods
4.
Gene ; 803: 145892, 2021 Nov 30.
Article in English | MEDLINE | ID: mdl-34375633

ABSTRACT

The p53 tumor suppressor protein maintains genome fidelity and integrity by modulating several cellular activities. It regulates these events by interacting with a heterogeneous set of response elements (REs) of regulatory genes in the context of chromatin. At the p53-RE interface, both base readout and the torsional flexibility of DNA account for high-affinity binding. However, because DNA structure is an entanglement of many physicochemical features, both local and global structure should be considered when dealing with DNA-protein interactions. The goal of the current work is to conceptualize and abstract basic principles of p53-RE binding affinity as a function of structural properties of DNA such as bending, twisting, and stretching flexibility and shape. For this purpose, we exploited high-throughput in vitro relative affinity information for response elements and genome binding events of p53 from HT-SELEX and ChIP-Seq experiments, respectively. Our results confirm the role of torsional flexibility in p53 binding, and further reveal that DNA axial bending, stretching stiffness, propeller twist, and wedge angles are intimately linked to p53 binding affinity when compared to homeodomain, bZIP, and bHLH proteins. In addition, a similar DNA structural environment is observed in the distal sequences encompassing the actual binding sites of p53 cistrome genes. We also found that p53 cistrome target genes have a unique promoter architecture, and that the DNA flexibility of genomic sequences around REs differs markedly between cancer and normal cell types. Altogether, our work highlights the DNA structural features of REs that shape the in vitro and in vivo high-affinity binding of the p53 transcription factor.


Subjects
DNA/metabolism; Sequence Analysis, DNA/methods; Tumor Suppressor Protein p53/metabolism; Binding Sites; Chromosomes, Human/genetics; DNA/chemistry; Gene Expression Regulation; Humans; Promoter Regions, Genetic; Response Elements
5.
Gene ; 803: 145890, 2021 Nov 30.
Article in English | MEDLINE | ID: mdl-34375634

ABSTRACT

Escherichia coli Nissle 1917 (EcN) is an efficient probiotic strain used extensively worldwide because of its several health benefits. Adhesion to intestinal cells is one of the prerequisites for a probiotic strain. To identify the genes essential for the adhesion of EcN to intestinal cells, we utilized a quantitative genetic footprinting approach called transposon insertion sequencing (INSeq). A transposon insertion mutant library of EcN comprising ~17,000 mutants was screened for adherence to the intestinal epithelial cell line Caco-2. The transposon insertion sites were identified in the input and output populations by next-generation sequencing on the Ion Torrent platform. Based on the relative abundance of reads in the input and output pools, we identified 113 candidate genes that are essential for the fitness of EcN during adhesion to and colonization of Caco-2 cells. Functional categorization revealed that these fitness genes are associated with carbohydrate transport and metabolism, cell wall/membrane/envelope biogenesis, post-translational modification, stress response, motility and adhesion, and signal transduction. To further validate the genes identified in our INSeq analysis, we constructed individual knock-out mutants in five genes (cyclic di-GMP phosphodiesterase (gmp), hda, uidC, leuO, and a hypothetical protein-coding gene) and investigated their ability to adhere to Caco-2 cells. These mutants showed reduced adhesion to Caco-2 cells, confirming their role in adhesion. Understanding the functions of these genes may provide novel insights into molecular regulation during colonization of intestinal cells by probiotic bacteria and may be useful for developing designer probiotic strains.


Subjects
Escherichia coli Proteins/genetics; Escherichia coli/physiology; Mutagenesis, Insertional; Sequence Analysis, DNA/methods; Bacterial Adhesion; Caco-2 Cells; DNA Transposable Elements; Escherichia coli/genetics; Genetic Fitness; High-Throughput Nucleotide Sequencing; Humans; Probiotics
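
For entry 5 above, the comparison of relative read abundance between the input and output pools can be sketched as a per-gene log2 fold change. This is only an illustration, not the authors' pipeline: the counts-per-million normalization, the pseudocount, the -1.0 cutoff, and the gene names and counts are all assumptions made for the example.

```python
import math

def log2_fold_changes(input_counts, output_counts, pseudocount=1.0):
    """Per-gene log2(output/input) after counts-per-million normalization.

    The normalization and pseudocount are illustrative assumptions.
    """
    in_total = sum(input_counts.values())
    out_total = sum(output_counts.values())
    result = {}
    for gene, in_reads in input_counts.items():
        out_reads = output_counts.get(gene, 0)
        in_cpm = in_reads / in_total * 1e6
        out_cpm = out_reads / out_total * 1e6
        result[gene] = math.log2((out_cpm + pseudocount) / (in_cpm + pseudocount))
    return result

# Hypothetical insertion read counts per gene (not data from the study).
input_pool = {"geneA": 500, "geneB": 500, "geneC": 1000}
output_pool = {"geneA": 50, "geneB": 950, "geneC": 1000}

lfc = log2_fold_changes(input_pool, output_pool)
# Genes whose mutants are depleted after selection are candidate fitness genes.
depleted = sorted(gene for gene, value in lfc.items() if value < -1.0)
```

A strongly negative value means mutants in that gene were lost during selection on Caco-2 cells, which is the signal the abstract describes for candidate fitness genes.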
6.
PLoS One ; 16(8): e0244468, 2021.
Article in English | MEDLINE | ID: mdl-34432798

ABSTRACT

The newly emerged and rapidly spreading SARS-CoV-2 causes coronavirus disease 2019 (COVID-19). To facilitate a deeper understanding of the viral biology, we developed a capture sequencing methodology to generate SARS-CoV-2 genomic and transcriptome sequences from infected patients. We utilized an oligonucleotide probe set representing the full-length genome to obtain both genomic and transcriptome (subgenomic open reading frames [ORFs]) sequences from 45 SARS-CoV-2 clinical samples with varying viral titers. For samples with higher viral loads (cycle threshold value under 33, based on the CDC qPCR assay), complete genomes were generated. Analysis of junction reads revealed regions of differential transcriptional activity among samples. Mixed allelic frequencies along the 20 kb ORF1ab gene in one sample suggested the presence of a defective viral RNA subpopulation maintained in mixture with functional RNA. The associated workflow is straightforward, and hybridization-based capture offers an effective and scalable approach for sequencing SARS-CoV-2 from patient samples.


Subjects
COVID-19/pathology; SARS-CoV-2/genetics; Sequence Analysis, DNA/methods; COVID-19/virology; DNA, Complementary/chemistry; DNA, Complementary/metabolism; Gene Frequency; Genetic Variation; Genome, Viral; Humans; Open Reading Frames/genetics; RNA, Viral/genetics; RNA, Viral/metabolism; Real-Time Polymerase Chain Reaction; SARS-CoV-2/isolation & purification; Viral Load
7.
Viruses ; 13(7)2021 07 11.
Article in English | MEDLINE | ID: mdl-34372543

ABSTRACT

Human immunodeficiency virus type 2 (HIV-2) accumulates fewer mutations during replication than HIV type 1 (HIV-1). Advanced studies of HIV-2 mutagenesis, however, have historically been confounded by high background error rates in traditional next-generation sequencing techniques. In this study, we describe the adaptation of the previously described maximum-depth sequencing (MDS) technique to studies of both HIV-1 and HIV-2 for the ultra-accurate characterization of viral mutagenesis. We also present a user-friendly Galaxy workflow for the bioinformatic analysis of sequencing data generated using the MDS technique, designed to improve replicability and accessibility for molecular virologists. The adapted MDS technique and analysis pipeline were validated by comparison with previously published analyses of the frequency and spectra of mutations in HIV-1 and HIV-2, and are readily expandable to studies of viral mutation across the genomes of both viruses. Using this novel sequencing pipeline, we observed that the background error rate was reduced 100-fold relative to standard Illumina error rates and 10-fold relative to traditional unique molecular identifier (UMI)-based sequencing. This technical advancement will allow the exploration of novel and previously unrecognized sources of viral mutagenesis in both HIV-1 and HIV-2, which will expand our understanding of retroviral diversity and evolution.


Subjects
HIV-1/genetics; HIV-2/genetics; Sequence Analysis, DNA/methods; Computational Biology/methods; DNA Mutational Analysis/methods; Genome, Viral/genetics; HEK293 Cells; High-Throughput Nucleotide Sequencing/methods; Humans; Mutation/genetics; Workflow
8.
Nat Commun ; 12(1): 4897, 2021 08 12.
Article in English | MEDLINE | ID: mdl-34385432

ABSTRACT

Precise control of mammalian gene expression is facilitated through epigenetic mechanisms and nuclear organization. In particular, insulated chromosome structures are important for regulatory control, but the phenotypic consequences of their boundary disruption on developmental processes are complex and remain insufficiently understood. Here, we generated deeply sequenced Hi-C data for human pluripotent stem cells (hPSCs) that allowed us to identify CTCF loop domains that have highly conserved boundary CTCF sites and show a notable enrichment of individual developmental regulators. Importantly, perturbation of such a boundary in hPSCs interfered with proper differentiation through deregulated distal enhancer-promoter activity. Finally, we found that germline variations affecting such boundaries are subject to purifying selection and are underrepresented in the human population. Taken together, our findings highlight the importance of developmental gene isolation through chromosomal folding structures as a mechanism to ensure their proper expression.


Subjects
Cell Differentiation/genetics; Gene Expression Profiling/methods; Genome, Human/genetics; Human Embryonic Stem Cells/metabolism; Induced Pluripotent Stem Cells/metabolism; Regulatory Elements, Transcriptional/genetics; Binding Sites/genetics; Blotting, Western; CCCTC-Binding Factor/genetics; CCCTC-Binding Factor/metabolism; Cell Line; Enhancer Elements, Genetic/genetics; Human Embryonic Stem Cells/cytology; Humans; Induced Pluripotent Stem Cells/cytology; Promoter Regions, Genetic/genetics; Reverse Transcriptase Polymerase Chain Reaction; Sequence Analysis, DNA/methods
9.
Gene ; 804: 145871, 2021 Dec 15.
Article in English | MEDLINE | ID: mdl-34363887

ABSTRACT

Chrysotila dentata is an ecologically important marine alga that contributes to coccolith formation. In this study, the complete chloroplast (cpDNA) genome of Chrysotila dentata was sequenced using Illumina HiSeq and analyzed with the bioinformatics tool CPGAVAS2. The circular chloroplast genome of Chrysotila dentata has a size of 109,017 bp with two inverted repeat (IR) regions (4513 bp each), a common feature in most land plants and algal species. The Chrysotila dentata cp genome consists of 61 identified protein-coding genes, 30 tRNA genes and 6 rRNAs, with 21 microsatellites. Phylogenetic comparison with other selected algal species revealed a close relationship between Chrysotila dentata and Phaeocystis antarctica. This is the first report of a cp genome analysis for the genus Chrysotila, and the results will be helpful for understanding the genetic structure and function of the chloroplast in other species of Chrysotila.


Subjects
Chloroplasts/genetics; Haptophyta/genetics; Computational Biology/methods; Evolution, Molecular; Genes, Plant; Genome; High-Throughput Nucleotide Sequencing/methods; Inverted Repeat Sequences/genetics; Microsatellite Repeats/genetics; Phylogeny; RNA, Ribosomal/genetics; Sequence Analysis, DNA/methods; Whole Genome Sequencing/methods
10.
Genomics ; 113(5): 3174-3184, 2021 09.
Article in English | MEDLINE | ID: mdl-34293476

ABSTRACT

As mutations in SARS-CoV-2 virus accumulate rapidly, novel primers that amplify this virus sensitively and specifically are in demand. We have developed a webserver named CoVrimer by which users can search for and align existing or newly designed conserved/degenerate primer pair sequences against the viral genome and assess the mutation load of both primers and amplicons. CoVrimer uses mutation data obtained from an online platform established by NGDC-CNCB (12 May 2021) to identify genomic regions, either conserved or with low levels of mutations, from which potential primer pairs are designed and provided to the user for filtering based on generalized and SARS-CoV-2 specific parameters. Alignments of primers and probes can be visualized with respect to the reference genome, indicating variant details and the level of conservation. Consequently, CoVrimer is likely to help researchers with the challenges posed by viral evolution and is freely available at http://konulabapps.bilkent.edu.tr:3838/CoVrimer/.


Subjects
DNA Primers/chemistry; SARS-CoV-2/genetics; Sequence Analysis, DNA/methods; Software; Conserved Sequence; DNA Primers/genetics; Genome, Viral; Mutation
11.
Biosensors (Basel) ; 11(7)2021 Jun 30.
Article in English | MEDLINE | ID: mdl-34208844

ABSTRACT

In recent years, nanopore technology has become increasingly important in the field of life science and biomedical research. By embedding a nano-scale hole in a thin membrane and measuring the electrochemical signal, nanopore technology can be used to investigate the nucleic acids and other biomacromolecules. One of the most successful applications of nanopore technology, the Oxford Nanopore Technology, marks the beginning of the fourth generation of gene sequencing technology. In this review, the operational principle and the technology for signal processing of the nanopore gene sequencing are documented. Moreover, this review focuses on the applications using nanopore gene sequencing technology, including the diagnosis of cancer, detection of viruses and other microbes, and the assembly of genomes. These applications show that nanopore technology is promising in the field of biological and biomedical sensing.


Subjects
Nanopores; Sequence Analysis, DNA/methods; High-Throughput Nucleotide Sequencing; Technology; Viruses
12.
Int J Mol Sci ; 22(14)2021 Jul 17.
Article in English | MEDLINE | ID: mdl-34299288

ABSTRACT

(1) Background: Short-read sequencing allows for the rapid and accurate analysis of the whole bacterial genome but does not usually enable complete genome assembly. Long-read sequencing greatly assists with the resolution of complex bacterial genomes, particularly when combined with short-read Illumina data. However, it is not clear how different assembly strategies affect genomic accuracy, completeness, and protein prediction. (2) Methods: We compare different assembly strategies for Haemophilus parasuis, the cause of Glässer's disease in swine (characterized by fibrinous polyserositis and arthritis), using Illumina sequencing and long reads from the platforms of either Oxford Nanopore Technologies (ONT) or Pacific Biosciences (PacBio) SMRT. (3) Results: Assembly with either PacBio or ONT reads, followed by polishing with Illumina reads, facilitated high-quality genome reconstruction and was superior to the long-read-only assembly and hybrid-assembly strategies when evaluated in terms of accuracy and completeness. An equally excellent method was correction with Homopolish after the ONT-only assembly, which had the advantage of avoiding hybrid sequencing with Illumina. Furthermore, aligning transcripts to the assembled genomes and their predicted CDSs showed that the sequencing errors of the ONT assembly were mainly indels generated when homopolymer regions were sequenced, which critically affected protein prediction. Polishing can fill indels and correct mistakes. (4) Conclusions: The assembly of bacterial genomes can be directly achieved by using long-read sequencing techniques. To maximize assembly accuracy, it is essential to polish the assembly with homologous sequences of related genomes or sequencing data from short-read technology.


Subjects
Haemophilus parasuis/genetics; High-Throughput Nucleotide Sequencing/methods; Nanopore Sequencing/methods; Sequence Analysis, DNA/methods; Animals; Genome, Bacterial; Haemophilus parasuis/isolation & purification; Phylogeny; Sequence Alignment; Swine
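
Entry 12 above attributes most ONT assembly errors to indels in homopolymer regions. As a rough illustration of where such errors would be expected, the sketch below lists homopolymer runs in a sequence. The minimum run length of 4 and the example sequence are assumptions for the example; this is not how Homopolish or any tool named in the abstract works.

```python
from itertools import groupby

def homopolymer_runs(seq, min_len=4):
    """Return (start, base, length) for each run of a single base >= min_len.

    Positions are 0-based; min_len=4 is an arbitrary illustrative cutoff.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = len(list(group))
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs

# Hypothetical sequence containing one T run and one A run.
example = "ACGTTTTTGCAAAAGGC"
runs = homopolymer_runs(example)
```

Positions flagged this way could then be checked preferentially when comparing a long-read-only assembly against a polished one.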
13.
Mol Genet Genomics ; 296(5): 1147-1159, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34251529

ABSTRACT

This study aimed to identify quantitative trait loci (QTLs) for growth-related traits by constructing a genetic linkage map based on single nucleotide polymorphism (SNP) markers in Japanese quail. A QTL mapping population of 277 F2 birds was obtained from an intercross between a male of a large-sized strain and three females of a normal-sized strain. Body weight (BW) was measured weekly from hatching to 16 weeks of age. The Weibull, Logistic, Gompertz, Richards, and Brody non-linear regression growth models were compared, and the Richards model was selected as the best description of the growth curve of the F2 birds. Restriction-site associated DNA sequencing (RAD-seq) yielded 125 SNP markers that were informative between the parental strains. The SNP markers were distributed across 16 linkage groups spanning 795.9 centiMorgans (cM), with an average marker interval of 7.3 cM. QTL analysis of phenotypic traits revealed four main-effect QTLs. The detected QTLs were located on chromosomes 1 and 3 and were associated with BW from 4 to 16 weeks of age and with the asymptotic weight of the Richards model at the genome-wide 1% or 5% significance level. No QTL was detected for BW from 0 to 3 weeks of age. This is the first report to identify QTLs for the asymptotic weight parameter of the Richards model in Japanese quail. These results highlight that combining QTL studies with the RAD-seq method will help future breeding programs identify genes underlying the QTLs and apply marker-assisted selection in the poultry industry, particularly in Japanese quail.


Subjects
Body Weight/genetics; Coturnix/growth & development; Coturnix/genetics; Quantitative Trait Loci; Animals; Chromosome Mapping; Female; Male; Phenotype; Polymorphism, Single Nucleotide; Sequence Analysis, DNA/methods
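
Entry 13 above uses the asymptotic weight of a Richards growth curve as a trait. The sketch below evaluates one common parameterization of that curve (the generalized logistic with a zero lower asymptote) to show what "asymptotic weight" means. The abstract does not state which parameterization the authors fitted, so the functional form, the parameter names, and the numbers are assumptions.

```python
import math

def richards(t, asymptote, rate, q, nu):
    """Richards (generalized logistic) growth curve with a zero lower asymptote.

    W(t) = asymptote / (1 + q * exp(-rate * t)) ** (1 / nu)
    With nu = 1 this reduces to the ordinary logistic curve.
    """
    return asymptote / (1.0 + q * math.exp(-rate * t)) ** (1.0 / nu)

# Hypothetical parameters: asymptotic weight 300 g, weekly time scale.
weeks = range(0, 17)
curve = [richards(week, asymptote=300.0, rate=0.5, q=9.0, nu=2.0) for week in weeks]
```

As `t` grows, the exponential term vanishes and the curve approaches `asymptote`, which is the parameter the QTL analysis in the abstract treats as a phenotype.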
14.
Methods Mol Biol ; 2328: 153-170, 2021.
Article in English | MEDLINE | ID: mdl-34251624

ABSTRACT

Single-cell RNAseq is an emerging technology that allows the quantification of gene expression in individual cells. In plants, single-cell sequencing technology has been applied to generate root cell expression maps under many experimental conditions. DAP-seq and ATAC-seq have also been used to generate genome-scale maps of protein-DNA interactions and open chromatin regions in plants. In this protocol, we describe a multistep computational pipeline for the integration of single-cell RNAseq data with DAP-seq and ATAC-seq data to predict regulatory networks and key regulatory genes. Our approach utilizes machine learning methods including feature selection and stability selection to identify candidate regulatory genes. The network generated by this pipeline can be used to provide a putative annotation of gene regulatory modules and to identify candidate transcription factors that could play a key role in specific cell types.


Subjects
Chromatin Immunoprecipitation Sequencing/methods; Computational Biology/methods; Gene Regulatory Networks/genetics; Machine Learning; RNA-Seq/methods; Sequence Analysis, DNA/methods; Single-Cell Analysis/methods; Chromatin/metabolism; Programming Languages; Software
15.
Commun Biol ; 4(1): 851, 2021 07 08.
Article in English | MEDLINE | ID: mdl-34239036

ABSTRACT

Water scarcity and salinity are major challenges facing agriculture today, which can be addressed by engineering plants to grow in the boundless seawater. Understanding the mangrove plants at the molecular level will be necessary for developing such highly salt-tolerant agricultural crops. With this objective, we sequenced the genome of a salt-secreting and extraordinarily salt-tolerant mangrove species, Avicennia marina, that grows optimally in 75% seawater and tolerates >250% seawater. Our reference-grade ~457 Mb genome contains 31 scaffolds corresponding to its chromosomes. We identified 31,477 protein-coding genes and a salinome consisting of 3246 salinity-responsive genes and homologs of 614 experimentally validated salinity tolerance genes. The salinome provides a strong foundation to understand the molecular mechanisms of salinity tolerance in plants and breeding crops suitable for seawater farming.


Subjects
Avicennia/genetics; Genome, Plant/genetics; Salt Tolerance/genetics; Salts/metabolism; Agriculture/methods; Avicennia/metabolism; DNA, Plant/chemistry; DNA, Plant/genetics; Gene Expression Profiling/methods; Gene Expression Regulation, Plant; Genome Size/genetics; Genomics/methods; RNA-Seq/methods; Salinity; Seawater; Sequence Analysis, DNA/methods
16.
Sci Rep ; 11(1): 14558, 2021 07 15.
Article in English | MEDLINE | ID: mdl-34267263

ABSTRACT

Whereas accelerated attention beclouded the early stages of the coronavirus spread, knowledge of the actual pathogenicity and origin of possible sub-strains remained unclear. By harvesting the Global Initiative on Sharing All Influenza Data (GISAID) database (https://www.gisaid.org/) between December 2019 and January 15, 2021, a total of 8864 complete human SARS-CoV-2 genome sequences, processed by gender and spanning 6 continents (88 countries; Antarctica excluded), were analyzed. We hypothesized that the data speak for themselves and can reveal true and explainable patterns of the disease. Identical genome diversity and pattern correlates analysis, performed using a hybrid of biotechnology and machine learning methods, corroborates the emergence of inter- and intra-SARS-CoV-2 sub-strain transmission and a sustained increase in sub-strains within the various continents, with nucleotide mutations varying dynamically between individuals in close association with the virus as it adapts to its host/environment. Interestingly, some viral sub-strain patterns progressively transformed into new sub-strain clusters, indicating varying amino acid and strong nucleotide associations derived from the same lineage. A novel cognitive approach to knowledge mining helped the discovery of transmission routes and a seamless contact tracing protocol. Our classification results were better than those of state-of-the-art methods, indicating a more robust system for predicting emerging or new viral sub-strain(s). The results therefore offer explanations for the growing concerns about the virus and its next wave(s). A future direction of this work is a defuzzification of confusable pattern clusters for precise intra-country SARS-CoV-2 sub-strain analytics.


Subjects
COVID-19/virology; SARS-CoV-2/genetics; Sequence Analysis, DNA/methods; COVID-19/epidemiology; COVID-19/transmission; Computational Biology/methods; DNA, Viral/genetics; Databases, Genetic; Forecasting/methods; Genome, Viral; Humans; Machine Learning; Mutation; Phylogeny; SARS-CoV-2/classification; SARS-CoV-2/pathogenicity; Whole Genome Sequencing/methods
17.
Nat Commun ; 12(1): 4387, 2021 07 19.
Article in English | MEDLINE | ID: mdl-34282137

ABSTRACT

Targeted high-throughput DNA sequencing is a primary approach for genomics and molecular diagnostics, and more recently as a readout for DNA information storage. Oligonucleotide probes used to enrich gene loci of interest have different hybridization kinetics, resulting in non-uniform coverage that increases sequencing costs and decreases sequencing sensitivities. Here, we present a deep learning model (DLM) for predicting Next-Generation Sequencing (NGS) depth from DNA probe sequences. Our DLM includes a bidirectional recurrent neural network that takes as input both DNA nucleotide identities as well as the calculated probability of the nucleotide being unpaired. We apply our DLM to three different NGS panels: a 39,145-plex panel for human single nucleotide polymorphisms (SNP), a 2000-plex panel for human long non-coding RNA (lncRNA), and a 7373-plex panel targeting non-human sequences for DNA information storage. In cross-validation, our DLM predicts sequencing depth to within a factor of 3 with 93% accuracy for the SNP panel, and 99% accuracy for the non-human panel. In independent testing, the DLM predicts the lncRNA panel with 89% accuracy when trained on the SNP panel. The same model is also effective at predicting the measured single-plex kinetic rate constants of DNA hybridization and strand displacement.


Subjects
Base Sequence; Deep Learning; High-Throughput Nucleotide Sequencing/methods; DNA/genetics; DNA Probes; Genomics; Humans; Polymorphism, Single Nucleotide; Sequence Analysis, DNA/methods
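
Entry 17 above reports accuracy as the share of probes whose predicted sequencing depth falls within a factor of 3 of the measured depth. A minimal sketch of that metric follows. Treating the bounds as inclusive is an assumption, the depth values are invented, and this is not the authors' evaluation code.

```python
def within_factor_accuracy(predicted, observed, factor=3.0):
    """Fraction of predictions within `factor`-fold of the observed value.

    Assumes all depths are positive; bounds are inclusive by assumption.
    """
    if len(predicted) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = 0
    for pred, obs in zip(predicted, observed):
        ratio = pred / obs
        if 1.0 / factor <= ratio <= factor:
            hits += 1
    return hits / len(observed)

# Hypothetical per-probe depths (not data from the study).
predicted_depth = [100.0, 40.0, 900.0, 10.0]
observed_depth = [120.0, 150.0, 300.0, 9.0]
accuracy = within_factor_accuracy(predicted_depth, observed_depth)
```

A fold-based tolerance suits sequencing depth because depths span orders of magnitude across probes, so an absolute error bound would penalize high-depth probes disproportionately.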
18.
Article in English | MEDLINE | ID: mdl-34283014

ABSTRACT

A Gram-stain-positive, yellow-pigmented, non-motile actinobacterial strain, designated as BIT-GX5T, was isolated from a sesame husk compost collected in Beijing, PR China. This bacterium was able to grow in the temperature range from 16 to 50 °C, with an optimal growth temperature of 45 °C. Its taxonomic position was analysed using a polyphasic approach. The 16S rRNA gene sequence (1482 bp) of strain BIT-GX5T was most similar to those of Cellulosimicrobium funkei ATCC BAA-886T (99.45%), Cellulosimicrobium cellulans LMG 16121T (99.17%) and Cellulosimicrobium marinum RS-7-4T (98.75%). The results of phylogenetic analyses, based on the 16S rRNA gene, concatenated sequences of five housekeeping genes (gyrB, rpoB, recA, atpD and trpB) and genome sequences, placed strain BIT-GX5T in a separate lineage within the genus Cellulosimicrobium of the family Promicromonosporaceae. The major polar lipids of strain BIT-GX5T were diphosphatidylglycerol, phosphatidylethanolamine, phosphatidylglycerol, aminophospholipid and aminolipid. The major isoprenoid quinone was MK-9(H4), while the cell-wall sugars were galactose, rhamnose, glucose and mannose. The peptidoglycan type was A4α l-Lys-d-Ser-d-Asp. The major fatty acids were anteiso-C15:0 and iso-C15:0, similar to those of other members of the genus Cellulosimicrobium. Results of in silico DNA-DNA hybridization and average nucleotide identity calculations, together with physiological and biochemical tests, demonstrated the genotypic and phenotypic differentiation of strain BIT-GX5T from the other members of the genus Cellulosimicrobium. Therefore, strain BIT-GX5T is considered to represent a novel species within the genus Cellulosimicrobium, for which the name Cellulosimicrobium composti sp. nov. is proposed. The type strain is BIT-GX5T (= CGMCC 1.17687T = KCTC 49391T).


Subjects
Actinobacteria/isolation & purification , Composting/methods , Actinobacteria/genetics , Bacterial Typing Techniques/methods , DNA, Bacterial/genetics , Phylogeny , RNA, Ribosomal, 16S/genetics , Sequence Analysis, DNA/methods
19.
Article in English | MEDLINE | ID: mdl-34283016

ABSTRACT

Two halophilic archaeal strains, Gai3-2T and NJ-3-1T, were isolated from salt lake and saline soil samples, respectively, collected in PR China. The 16S rRNA gene sequences of the two strains were 97.5% similar to each other. Strains Gai3-2T and NJ-3-1T had the highest sequence similarities to 'Halobonum tyrrellense' G22 (96.7 and 97.8%, respectively), and displayed similarities of 91.5-93.5% and 92.3-94.7%, respectively, to Halobaculum members. Phylogenetic analysis revealed that the two strains formed different branches and clustered tightly with 'H. tyrrellense' G22 and Halobaculum members. The average nucleotide identity (ANI), in silico DNA-DNA hybridization (isDDH) and amino acid identity (AAI) values between the two strains were 83.1, 26.9 and 77.9%, respectively, much lower than the threshold values proposed as a species boundary. The corresponding values between the two strains and 'H. tyrrellense' G22 (ANI 77.9-78.2%, isDDH 22.5-22.6% and AAI 68.8-69.3%) and between the two strains and Halobaculum members (ANI 77.53-77.63%, isDDH 21.8-22.3% and AAI 68.4-69.4%) were almost identical, and much lower than the recommended threshold values for species delimitation. These results suggested that strains Gai3-2T and NJ-3-1T represent two novel species of Halobaculum. Based on phenotypic, chemotaxonomic and phylogenetic properties, strains Gai3-2T (=CGMCC 1.16080T=JCM 33550T) and NJ-3-1T (=CGMCC 1.16040T=JCM 33552T) represent two novel species of the genus Halobaculum, for which the names Halobaculum halophilum sp. nov. and Halobaculum salinum sp. nov. are proposed.


Subjects
DNA, Archaeal/isolation & purification , Halobacteriaceae/isolation & purification , Lakes/analysis , Plant Extracts/isolation & purification , Soil/chemistry , DNA, Archaeal/genetics , Halobacteriaceae/genetics , Phylogeny , Plant Extracts/genetics , Sequence Analysis, DNA/methods
20.
Int J Mol Sci ; 22(13)2021 Jun 29.
Article in English | MEDLINE | ID: mdl-34209912

ABSTRACT

Orchid flower development is governed by a specific regulatory program in which the class B MADS-box AP3/DEF genes play a central role. In particular, the differential expression of four class B AP3/DEF genes is responsible for specification of organ identities in the orchid perianth. Other MADS-box genes (AGL6 and SEP-like) enrich the molecular program underpinning orchid perianth development, expanding the original "orchid code" into an even more complex gene regulatory network. To identify candidates that could interact with the AP3/DEF genes in orchids, we conducted an in silico differential expression analysis in wild-type and peloric Phalaenopsis. The results suggest that a YABBY DL-like gene could be involved in the molecular program leading to the development of the orchid perianth, particularly the labellum. Two YABBY DL/CRC homologs are present in the genome of Phalaenopsis equestris, PeDL1 and PeDL2, and both express two alternative isoforms. Quantitative real-time PCR analyses revealed that both genes are expressed in column and ovary. In addition, PeDL2 is more strongly expressed in the labellum than in the other tepals of wild-type flowers. This pattern is similar to that of the AP3/DEF genes PeMADS3/4 and opposite to that of PeMADS2/5. In peloric mutant Phalaenopsis, where labellum-like structures replace the lateral inner tepals, PeDL2 is expressed at levels similar to those of the PeMADS2-5 genes, suggesting the involvement of PeDL2 in the development of the labellum, together with the PeMADS2-PeMADS5 genes. Although the yeast two-hybrid analysis did not reveal the ability of PeDL2 to bind the PeMADS2-PeMADS5 proteins directly, the existence of regulatory interactions is suggested by the presence of CArG-boxes and other MADS-box transcription factor binding sites within the putative promoter of the orchid DL2 gene.


Subjects
Gene Expression Profiling/methods , MADS Domain Proteins/genetics , Orchidaceae/physiology , Sequence Analysis, DNA/methods , Evolution, Molecular , Flowers/genetics , Flowers/physiology , Gene Expression Regulation, Plant , MADS Domain Proteins/metabolism , Orchidaceae/genetics , Phylogeny , Plant Proteins/genetics , Plant Proteins/metabolism , Promoter Regions, Genetic , Tissue Distribution