Results 1 - 20 of 21,712
2.
Mol Biol Rep ; 51(1): 887, 2024 Aug 06.
Article in English | MEDLINE | ID: mdl-39105821

ABSTRACT

BACKGROUND: The marine environment harbors high biodiversity; however, it is poorly understood. Nucleotide sequence data of all marine organisms should be accumulated before natural and/or anthropogenic environmental changes jeopardize the marine environment. In this study, we report a cost-effective and easy DNA barcoding method. This method can be readily adopted without using library preparation kits. It includes multiplex PCR of short targets, indexing PCR, and outsourcing to a sequencing service using the NovaSeq system. METHODS AND RESULTS: We targeted four mitochondrial genes [cytochrome c oxidase subunit I (COI), COIII, 16S rRNA (16S), and 12S rRNA (12S)] and three nuclear genes [18S rRNA (18S), 28S rRNA (28S), internal transcribed spacer 2 (ITS2)] in 95 marine invertebrate specimens, which were primarily annelids. The primers, including adapters and indices for NovaSeq sequencing, were newly designed. Two PCR runs were conducted. The 1st PCR amplified specific loci with universal primers and the 2nd added sequencing adapters and indices to the 1st PCR products. The gene sequences obtained from the FASTQ files were subjected to BLAST search and phylogenetic analyses. One run using 95 specimens yielded sequences averaging 2816 bp per specimen for a total length of six loci. Nuclear genes were more successfully assembled compared with mitochondrial genes. A weak but significantly negative correlation was observed between the average length of each locus and success rate of the assembly. Some of the sequences were almost identical to the sequences obtained from specimens collected far from Japan, indicating the presence of potentially invasive species identified for the first time. CONCLUSIONS: We obtained gene sequences efficiently using next-generation sequencing rather than Sanger sequencing. 
Although this method requires further optimization to increase the success rate for some loci, it is used as a first step to select specimens for further analyses by determining the specific loci of the targets.


Subjects
Aquatic Organisms; DNA Barcoding, Taxonomic; Invertebrates; Phylogeny; Animals; DNA Barcoding, Taxonomic/methods; Aquatic Organisms/genetics; Invertebrates/genetics; Polymerase Chain Reaction/methods; Sequence Analysis, DNA/methods; RNA, Ribosomal, 16S/genetics; Electron Transport Complex IV/genetics; High-Throughput Nucleotide Sequencing/methods; Biodiversity; Cost-Benefit Analysis
5.
HLA ; 104(2): e15634, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39091246

ABSTRACT

Genomic sequence of HLA-DQB1*03:01:01:60, -DQB1*03:01:01:61, -DQB1*03:01:01:62, -DQB1*03:01:01:63, -DQB1*03:02:01:23, -DQB1*03:02:01:24, -DQB1*03:02:01:25 and -DQB1*03:03:02:14 alleles in Spanish individuals.


Subjects
Alleles; HLA-DQ beta-Chains; High-Throughput Nucleotide Sequencing; Humans; HLA-DQ beta-Chains/genetics; High-Throughput Nucleotide Sequencing/methods; Histocompatibility Testing/methods; Exons; Spain; Sequence Analysis, DNA/methods; Genetic Variation
8.
HLA ; 104(2): e15633, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39091269

ABSTRACT

Two novel HLA-DQB1 alleles, HLA-DQB1*05:01:50 and HLA-DQB1*06:486, characterised in bone marrow volunteers.


Subjects
Alleles; Exons; HLA-DQ beta-Chains; High-Throughput Nucleotide Sequencing; Humans; HLA-DQ beta-Chains/genetics; Histocompatibility Testing/methods; Base Sequence; Sequence Analysis, DNA/methods; Codon; Bone Marrow
10.
Int J Mol Sci ; 25(15), 2024 Jul 26.
Article in English | MEDLINE | ID: mdl-39125741

ABSTRACT

The Penicillium genus exhibits a broad global distribution and holds substantial economic value in sectors including agriculture, industry, and medicine. Particularly in agriculture, Penicillium species significantly impact plants, causing diseases and contamination that adversely affect crop yields and quality. Timely detection of Penicillium species is crucial for controlling disease and preventing mycotoxins from entering the food chain. To tackle this issue, we implement a novel species identification approach called Analysis of whole GEnome (AGE). Here, we initially applied bioinformatics analysis to construct specific target sequence libraries from the whole genomes of seven Penicillium species with significant economic impact: P. canescens, P. citrinum, P. oxalicum, P. polonicum, P. paneum, P. rubens, and P. roqueforti. We successfully identified seven Penicillium species using the target we screened combined with Sanger sequencing and CRISPR-Cas12a technologies. Notably, based on CRISPR-Cas12a technology, AGE can achieve rapid and accurate identification of genomic DNA samples at a concentration as low as 0.01 ng/µL within 30 min. This method features high sensitivity and portability, making it suitable for on-site detection. This robust molecular approach provides precise fungal species identification with broad implications for agricultural control, industrial production, clinical diagnostics, and food safety.


Subjects
Genome, Fungal; Penicillium; Penicillium/genetics; Penicillium/classification; Penicillium/isolation & purification; CRISPR-Cas Systems; Whole Genome Sequencing/methods; Computational Biology/methods; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/economics; Phylogeny
11.
Int J Mol Sci ; 25(15), 2024 Aug 04.
Article in English | MEDLINE | ID: mdl-39126071

ABSTRACT

With the widespread adoption of next-generation sequencing technologies, the speed and convenience of genome sequencing have significantly improved, and many biological genomes have been sequenced. However, during the assembly of small genomes, we still face a series of challenges, including repetitive fragments, inverted repeats, low sequencing coverage, and the limitations of sequencing technologies. These challenges lead to unknown gaps in small genomes, hindering complete genome assembly. Although there are many existing assembly software options, they do not fully utilize the potential of artificial intelligence technologies, resulting in limited improvement in gap filling. Here, we propose a novel method, DLGapCloser, based on deep learning, aimed at assisting traditional tools in further filling gaps in small genomes. Firstly, we created four datasets based on the original genomes of Saccharomyces cerevisiae, Schizosaccharomyces pombe, Neurospora crassa, and Micromonas pusilla. To further extract effective information from the gene sequences, we also added homologous genomes to enrich the datasets. Secondly, we proposed the DGCNet model, which effectively extracts features and learns context from sequences flanking gaps. Addressing issues with early pruning and high memory usage in the Beam Search algorithm, we developed a new prediction algorithm, Wave-Beam Search. This algorithm alternates between expansion and contraction phases, enhancing efficiency and accuracy. Experimental results showed that the Wave-Beam Search algorithm improved the gap-filling performance of assembly tools by 7.35%, 28.57%, 42.85%, and 8.33% on the original results. Finally, we established new gap-filling standards and created and implemented a novel evaluation method. 
Validation on the genomes of Saccharomyces cerevisiae, Schizosaccharomyces pombe, Neurospora crassa, and Micromonas pusilla showed that DLGapCloser increased the number of filled gaps by 8.05%, 15.3%, 1.4%, and 7% compared to traditional assembly tools.
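
The abstract describes Wave-Beam Search only as a beam search that alternates between expansion and contraction phases. The following is a minimal sketch of that general idea, not the published algorithm: the alternation schedule (`wide`, `narrow`, `period`) and the stand-in scoring function `toy_model` are assumptions made purely for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

BASES = "ACGT"


def wave_beam_search(
    next_base_probs: Callable[[str], Dict[str, float]],
    length: int,
    wide: int = 8,
    narrow: int = 2,
    period: int = 3,
) -> str:
    """Beam search whose width alternates between `wide` and `narrow`.

    The schedule is an assumption: most steps keep `wide` hypotheses
    (expansion), and every `period`-th step prunes to `narrow` (contraction).
    """
    beam: List[Tuple[float, str]] = [(0.0, "")]
    for step in range(length):
        candidates: List[Tuple[float, str]] = []
        for score, seq in beam:
            probs = next_base_probs(seq)
            for base in BASES:
                p = probs.get(base, 0.0)
                if p > 0.0:
                    candidates.append((score + math.log(p), seq + base))
        width = narrow if (step + 1) % period == 0 else wide
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:width]
    return beam[0][1]


def toy_model(prefix: str) -> Dict[str, float]:
    """Hypothetical stand-in for a trained model: favours repeating 'ACG'."""
    favoured = "ACG"[len(prefix) % 3]
    return {b: (0.7 if b == favoured else 0.1) for b in BASES}


filled = wave_beam_search(toy_model, length=6)
print(filled)
```

In a real gap-filling setting, `next_base_probs` would be replaced by a model conditioned on the sequences flanking the gap; the periodic contraction is what bounds memory use in this sketch.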


Subjects
Neural Networks, Computer; Algorithms; Deep Learning; Genome, Fungal; Saccharomyces cerevisiae/genetics; Schizosaccharomyces/genetics; High-Throughput Nucleotide Sequencing/methods; Neurospora crassa/genetics; Software; Genomics/methods; Sequence Analysis, DNA/methods
12.
Int J Mol Sci ; 25(15), 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39126101

ABSTRACT

Cystic fibrosis is caused by biallelic pathogenic variants in the CFTR gene, which contains a polymorphic (TG)mTn sequence (the "poly-T/TG tract") in intron 9. While T9 and T7 alleles are benign, T5 alleles with longer TG repeats, e.g., (TG)12T5 and (TG)13T5, are clinically significant. Thus, professional medical societies currently recommend reporting the TG repeat size when T5 is detected. Sanger sequencing is a cost-effective method of genotyping the (TG)mTn tract; however, its polymorphic length substantially complicates data analysis. We developed CFTR-TIPS, a freely available web-based software tool that infers the (TG)mTn genotype from Sanger sequencing data. This tool detects the (TG)mTn tract in the chromatograms, quantifies goodness of fit with expected patterns, and visualizes the results in a graphical user interface. It is broadly compatible with any Sanger chromatogram that contains the (TG)mTn tract ± 15 bp. We evaluated CFTR-TIPS using 835 clinical samples previously analyzed in a CLIA-certified, CAP-accredited laboratory. When operated fully automatically, CFTR-TIPS achieved 99.8% concordance with our clinically validated manual workflow, while generally taking less than 10 s per sample. There were two discordant samples: one due to a co-occurring heterozygous duplication that confounded the tool and the other due to incomplete (TG)mTn tract detection in the reverse chromatogram. No clinically significant misclassifications were observed. CFTR-TIPS is a free, accurate, and rapid tool for CFTR (TG)mTn tract genotyping using cost-effective Sanger sequencing. This tool is suitable both for automated use and as an aid to manual review to enhance accuracy and reduce analysis time.
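
To make the (TG)mTn notation concrete, here is a small sketch that reads m and n from an already base-called, homozygous sequence string. It is not how CFTR-TIPS works (the tool analyzes chromatograms, including heterozygous traces); the function name `call_tg_t`, the minimum repeat thresholds, and the example flanking bases are assumptions for illustration only.

```python
import re
from typing import Optional, Tuple

# Assumed thresholds: at least 5 TG repeats followed by at least 3 Ts,
# chosen only to avoid matching short incidental TG/T runs.
TRACT = re.compile(r"((?:TG){5,})(T{3,})")


def call_tg_t(seq: str) -> Optional[Tuple[int, int]]:
    """Return (m, n) for the first (TG)mTn tract found, or None."""
    match = TRACT.search(seq.upper())
    if match is None:
        return None
    tg_repeats = len(match.group(1)) // 2
    t_run = len(match.group(2))
    return tg_repeats, t_run


# Hypothetical sequence: made-up flanks around a (TG)12T5 tract.
example = "CCGA" + "TG" * 12 + "T" * 5 + "AACAG"
print(call_tg_t(example))
```

A string-level call like this breaks down as soon as two alleles of different length overlap in one trace, which is the complication the abstract attributes to the tract's polymorphic length.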


Subjects
Cystic Fibrosis Transmembrane Conductance Regulator; Cystic Fibrosis; Genotype; Genotyping Techniques; Software; Cystic Fibrosis Transmembrane Conductance Regulator/genetics; Humans; Cystic Fibrosis/genetics; Genotyping Techniques/methods; Alleles; Sequence Analysis, DNA/methods
13.
Bioinformatics ; 40(8), 2024 Aug 02.
Article in English | MEDLINE | ID: mdl-39107889

ABSTRACT

MOTIVATION: Transcription factors are pivotal in the regulation of gene expression, and accurate identification of transcription factor binding sites (TFBSs) at high resolution is crucial for understanding the mechanisms underlying gene regulation. The task of identifying TFBSs from DNA sequences is a significant challenge in the field of computational biology today. To address this challenge, a variety of computational approaches have been developed. However, these methods face limitations in their ability to achieve high-resolution identification and often lack interpretability. RESULTS: We propose BertSNR, an interpretable deep learning framework for identifying TFBSs at single-nucleotide resolution. BertSNR integrates sequence-level and token-level information by multi-task learning based on pre-trained DNA language models. Benchmarking comparisons show that our BertSNR outperforms the existing state-of-the-art methods in TFBS predictions. Importantly, we enhanced the interpretability of the model through attentional weight visualization and motif analysis, and discovered the subtle relationship between attention weight and motif. Moreover, BertSNR effectively identifies TFBSs in promoter regions, facilitating the study of intricate gene regulation. AVAILABILITY AND IMPLEMENTATION: The BertSNR source code can be found at https://github.com/lhy0322/BertSNR.


Subjects
Deep Learning; Transcription Factors; Transcription Factors/metabolism; Binding Sites; Computational Biology/methods; DNA/metabolism; DNA/chemistry; Sequence Analysis, DNA/methods; Software; Algorithms
14.
BMC Bioinformatics ; 25(1): 263, 2024 Aug 08.
Article in English | MEDLINE | ID: mdl-39118013

ABSTRACT

BACKGROUND: Genome assembly, which involves reconstructing a target genome, relies on scaffolding methods to organize and link partially assembled fragments. The rapid evolution of long read sequencing technologies toward more accurate long reads, coupled with the continued use of short read technologies, has created a unique need for hybrid assembly workflows. The construction of accurate genomic scaffolds in hybrid workflows is complicated due to scale, sequencing technology diversity (e.g., short vs. long reads, contigs or partial assemblies), and repetitive regions within a target genome. RESULTS: In this paper, we present a new parallel workflow for hybrid genome scaffolding that would allow combining pre-constructed partial assemblies with newly sequenced long reads toward an improved assembly. More specifically, the workflow, called Maptcha, is aimed at generating long scaffolds of a target genome, from two sets of input sequences: an already constructed partial assembly of contigs and a set of newly sequenced long reads. Our scaffolding approach internally uses an alignment-free mapping step to build a ⟨contig, contig⟩ graph using long reads as linking information. Subsequently, this graph is used to generate scaffolds. We present and evaluate a graph-theoretic "wiring" heuristic to perform this scaffolding step. To enable efficient workload management in a parallel setting, we use a batching technique that partitions the scaffolding tasks so that the more expensive alignment-based assembly step at the end can be efficiently parallelized. This step also allows the use of any standalone assembler for generating the final scaffolds. CONCLUSIONS: Our experiments with Maptcha on a variety of input genomes, and comparison against two state-of-the-art hybrid scaffolders demonstrate that Maptcha is able to generate longer and more accurate scaffolds substantially faster.
In almost all cases, the scaffolds produced by Maptcha are at least an order of magnitude longer (in some cases two orders) than the scaffolds produced by state-of-the-art tools. Maptcha runs significantly faster too, reducing time-to-solution from hours to minutes for most input cases. We also performed a coverage experiment by varying the sequencing coverage depth for long reads, which demonstrated the potential of Maptcha to generate significantly longer scaffolds in low coverage settings (1× to 10×).
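
The contig-to-contig graph mentioned in the abstract can be pictured with a short sketch. It assumes the mapping step has already produced, for each long read, the contigs it maps to; the function name `build_contig_graph`, the input layout, and the read-count edge weight are illustrative assumptions, not Maptcha's actual data structures or its "wiring" heuristic.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable


def build_contig_graph(read_hits: Dict[str, Iterable[str]]) -> Counter:
    """Count, for each contig pair, how many long reads map to both contigs."""
    edges: Counter = Counter()
    for contigs in read_hits.values():
        # Each read links every pair of distinct contigs it touches.
        for a, b in combinations(sorted(set(contigs)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical mapping output: read id -> contigs the read maps to.
hits = {
    "read1": ["c1", "c2"],
    "read2": ["c2", "c1", "c3"],
    "read3": ["c3"],
}
graph = build_contig_graph(hits)
for pair, support in sorted(graph.items()):
    print(pair, support)
```

Edges supported by more reads are stronger evidence that two contigs are adjacent, which is the kind of linking information a scaffolder can then order into paths.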


Subjects
Genomics; Workflow; Genomics/methods; Sequence Analysis, DNA/methods; Software; Genome; High-Throughput Nucleotide Sequencing/methods; Algorithms
15.
Microbiome ; 12(1): 151, 2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39143609

ABSTRACT

BACKGROUND: Metagenomic binning, the clustering of assembled contigs that belong to the same genome, is a crucial step for recovering metagenome-assembled genomes (MAGs). Contigs are linked by exploiting consistent signatures along a genome, such as read coverage patterns. Using coverage from multiple samples leads to higher-quality MAGs; however, standard pipelines require all-to-all read alignments for multiple samples to compute coverage, becoming a key computational bottleneck. RESULTS: We present fairy (https://github.com/bluenote-1577/fairy), an approximate coverage calculation method for metagenomic binning. Fairy is a fast k-mer-based alignment-free method. For multi-sample binning, fairy can be >250× faster than read alignment and accurate enough for binning. Fairy is compatible with several existing binners on host and non-host-associated datasets. Using MetaBAT2, fairy recovers 98.5% of MAGs with >50% completeness and <5% contamination relative to alignment with BWA. Notably, multi-sample binning with fairy is always better than single-sample binning using BWA (>1.5× more >50% complete MAGs on average) while still being faster. For a public sediment metagenome project, we demonstrate that multi-sample binning recovers higher quality Asgard archaea MAGs than single-sample binning and that fairy's results are indistinguishable from read alignment. CONCLUSIONS: Fairy is a new tool for approximately and quickly calculating multi-sample coverage for binning, resolving a computational bottleneck for metagenomics.
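
As a rough illustration of what alignment-free, k-mer-based coverage estimation means, the sketch below averages how often each k-mer of a contig occurs in a read set. This is a deliberately naive stand-in and not fairy's implementation: the function names, the exact-match counting, the small default `k`, and the omission of reverse complements and of any subsampling are all simplifying assumptions.

```python
from collections import Counter
from typing import Iterable, Iterator


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of `seq`."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def approx_coverage(contig: str, reads: Iterable[str], k: int = 5) -> float:
    """Mean number of times the contig's k-mers occur in the reads."""
    read_counts = Counter(km for read in reads for km in kmers(read, k))
    contig_kmers = list(kmers(contig, k))
    if not contig_kmers:
        return 0.0
    return sum(read_counts[km] for km in contig_kmers) / len(contig_kmers)


# Hypothetical toy data: two reads that each span the whole contig.
contig = "ACGTACGGTCA"
reads = [contig, contig]
print(approx_coverage(contig, reads))
```

Because no read is ever aligned, the cost is dominated by k-mer counting, which is the property that lets such methods avoid the all-to-all alignment bottleneck described above.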


Subjects
Metagenome; Metagenomics; Metagenomics/methods; Software; Sequence Analysis, DNA/methods; Computational Biology/methods; Archaea/genetics; Archaea/classification; Algorithms
16.
PLoS Comput Biol ; 20(8): e1011854, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39093856

ABSTRACT

Single-cell ATAC-seq sequencing data (scATAC-seq) has been widely used to investigate chromatin accessibility on the single-cell level. One important application of scATAC-seq data analysis is differential chromatin accessibility (DA) analysis. However, the data characteristics of scATAC-seq such as excessive zeros and large variability of chromatin accessibility across cells impose a unique challenge for DA analysis. Existing statistical methods focus on detecting the mean difference of the chromatin accessible regions while overlooking the distribution difference. Motivated by real data exploration that distribution difference exists among cell types, we introduce a novel composite statistical test named "scaDA", which is based on zero-inflated negative binomial model (ZINB), for performing differential distribution analysis of chromatin accessibility by jointly testing the abundance, prevalence and dispersion simultaneously. Benefiting from both dispersion shrinkage and iterative refinement of mean and prevalence parameter estimates, scaDA demonstrates its superiority to both ZINB-based likelihood ratio tests and published methods by achieving the highest power and best FDR control in a comprehensive simulation study. In addition to demonstrating the highest power in three real sc-multiome data analyses, scaDA successfully identifies differentially accessible regions in microglia from sc-multiome data for an Alzheimer's disease (AD) study that are most enriched in GO terms related to neurogenesis and the clinical phenotype of AD, and AD-associated GWAS SNPs.
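
For readers unfamiliar with the zero-inflated negative binomial (ZINB) model named in the abstract, the sketch below evaluates its probability mass function in one common parameterization: mean `mu`, size (inverse dispersion) `r`, and excess-zero probability `pi0`. The parameter names and this parameterization are assumptions; scaDA may define abundance, prevalence and dispersion differently (for example, prevalence as `1 - pi0`).

```python
import math


def zinb_pmf(y: int, mu: float, r: float, pi0: float) -> float:
    """P(Y = y) under a ZINB with NB mean `mu`, size `r`, zero inflation `pi0`."""
    # Negative binomial part, computed in log space for numerical stability.
    log_nb = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    nb = math.exp(log_nb)
    # Zero inflation adds extra mass at y = 0 only.
    return (pi0 if y == 0 else 0.0) + (1.0 - pi0) * nb


# Hypothetical parameter values chosen only for illustration.
mu, r, pi0 = 2.0, 1.5, 0.3
total = sum(zinb_pmf(y, mu, r, pi0) for y in range(500))
mean = sum(y * zinb_pmf(y, mu, r, pi0) for y in range(500))
print(round(total, 6), round(mean, 6))
```

Under this parameterization the three quantities can differ between cell groups independently of one another, which is why a test restricted to the mean can miss distributional differences.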


Subjects
Chromatin; Single-Cell Analysis; Chromatin/genetics; Chromatin/metabolism; Chromatin/chemistry; Single-Cell Analysis/methods; Single-Cell Analysis/statistics & numerical data; Humans; Computational Biology/methods; Alzheimer Disease/genetics; Models, Statistical; Chromatin Immunoprecipitation Sequencing/methods; Computer Simulation; Animals; Sequence Analysis, DNA/methods; Algorithms
17.
Sci Rep ; 14(1): 18650, 2024 Aug 12.
Article in English | MEDLINE | ID: mdl-39134627

ABSTRACT

Exposure to ionizing radiation can induce genetic aberrations via unrepaired DNA strand breaks. To investigate quantitatively the dose-effect relationship at the molecular level, we irradiated dry pBR322 plasmid DNA with 3 MeV protons and assessed fragmentation yields at different radiation doses using long-read sequencing from Oxford Nanopore Technologies. This technology applied to a reference DNA model revealed dose-dependent fragmentation, as evidenced by read length distributions, showing no discernible radiation sensitivity in specific genetic sequences. In addition, we propose a method for directly measuring the single-strand break (SSB) yield. Furthermore, through a comparative study with a collection of previous works on dry DNA irradiation, we show that the irradiation protocol leads to biases in the definition of ionizing sources. We support this scenario by discussing the size distributions of nanopore sequencing reads in the light of Geant4 and Geant4-DNA simulation toolkit predictions. We show that integrating long-read sequencing technologies with advanced Monte Carlo simulations paves a promising path toward advancing our comprehension and prediction of radiation-induced DNA fragmentation.


Subjects
DNA Fragmentation; Monte Carlo Method; Plasmids; Plasmids/genetics; DNA Fragmentation/radiation effects; Dose-Response Relationship, Radiation; Sequence Analysis, DNA/methods; DNA Breaks, Single-Stranded/radiation effects; DNA/genetics
18.
Nat Commun ; 15(1): 6956, 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39138168

ABSTRACT

Structural variants (SVs) significantly contribute to human genome diversity and play a crucial role in precision medicine. Although advancements in single-molecule long-read sequencing offer a groundbreaking resource for SV detection, identifying SV breakpoints and sequences accurately and robustly remains challenging. We introduce VolcanoSV, an innovative hybrid SV detection pipeline that utilizes both a reference genome and local de novo assembly to generate a phased diploid assembly. VolcanoSV uses phased SNPs and unique k-mer similarity analysis, enabling precise haplotype-resolved SV discovery. VolcanoSV is adept at constructing comprehensive genetic maps encompassing SNPs, small indels, and all types of SVs, making it well-suited for human genomics studies. Our extensive experiments demonstrate that VolcanoSV surpasses state-of-the-art assembly-based tools in the detection of insertion and deletion SVs, exhibiting superior recall, precision, F1 scores, and genotype accuracy across a diverse range of datasets, including low-coverage (10x) datasets. VolcanoSV outperforms assembly-based tools in the identification of complex SVs, including translocations, duplications, and inversions, in both simulated and real cancer data. Moreover, VolcanoSV is robust to various evaluation parameters and accurately identifies breakpoints and SV sequences.


Subjects
Diploidy; Genome, Human; Genomic Structural Variation; Polymorphism, Single Nucleotide; Humans; Genomics/methods; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Software; Haplotypes
19.
HLA ; 104(2): e15654, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39149758

ABSTRACT

Full genomic sequence shows HLA-G*01:19 differs from HLA-G*01:04:01:01 only at position 99 in exon 2.


Subjects
Alleles; Exons; HLA-G Antigens; Humans; HLA-G Antigens/genetics; Histocompatibility Testing; Base Sequence; Sequence Analysis, DNA/methods