3.
BMC Bioinformatics ; 25(1): 180, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38720249

ABSTRACT

BACKGROUND: High-throughput sequencing (HTS) has become the gold-standard approach for variant analysis in cancer research. However, somatic variants may occur at low fractions due to contamination from normal cells or tumor heterogeneity, which poses a significant challenge for standard HTS analysis pipelines. The problem is exacerbated in scenarios with minimal tumor DNA, such as circulating tumor DNA in plasma. Assessing the sensitivity and detection performance of HTS approaches in such cases is paramount, but it is time-consuming and expensive: specialized experimental protocols and a sufficient quantity of samples are required for processing and analysis. To overcome these limitations, we propose a new computational approach designed specifically to generate artificial datasets suitable for this task, simulating ultra-deep targeted sequencing data with low-fraction variants, and we demonstrate their effectiveness in benchmarking low-fraction variant calling. RESULTS: Our approach enables the generation of artificial raw reads that mimic real data without relying on pre-existing data by using NEAT, a fine-grained read simulator that generates artificial datasets using models learned from multiple different datasets. It then incorporates low-fraction variants to simulate somatic mutations in samples with minimal tumor DNA content. To demonstrate that the artificial datasets are suitable for benchmarking low-fraction variant calling, we used them as ground truth to evaluate the performance of widely used variant calling algorithms: they allowed us to define tuned parameter values for major variant callers, considerably improving their detection of very low-fraction variants.
CONCLUSIONS: Our findings highlight both the pivotal role of our approach in creating adequate artificial datasets with low tumor fraction, which facilitates rapid prototyping and benchmarking of algorithms for this type of dataset, and the important need to advance low-fraction variant calling techniques.
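
A simple sampling model shows why low-fraction variants are hard to call. The sketch below is not NEAT's error or coverage model. It assumes only that the number of alternate-allele reads at a site follows a binomial distribution, with the variant allele fraction (VAF) as the success probability, and it ignores sequencing errors and PCR duplicates. The function name and the `min_alt_reads` threshold are hypothetical.

```python
from math import comb


def detection_probability(depth: int, vaf: float, min_alt_reads: int) -> float:
    """Probability of seeing at least `min_alt_reads` alt-supporting reads.

    Toy model (an assumption, not NEAT's): alt reads at a site follow
    Binomial(depth, vaf), with no sequencing errors or PCR duplicates.
    """
    # Probability of observing fewer alt reads than the caller's threshold.
    p_below = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    # Complement: probability that the threshold is reached.
    return 1.0 - p_below


if __name__ == "__main__":
    # Hypothetical ultra-deep target: 1000x coverage, caller needs >= 5 alt reads.
    for vaf in (0.001, 0.005, 0.01, 0.05):
        p = detection_probability(depth=1000, vaf=vaf, min_alt_reads=5)
        print(f"VAF={vaf:.3f}: P(>=5 alt reads) = {p:.3f}")
```

Under this model, a 0.1% VAF variant contributes on average one alt read at 1000x, so a five-read threshold usually misses it. This illustrates why tuning caller parameters matters for very low fractions.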


Subject(s)
Benchmarking , High-Throughput Nucleotide Sequencing , Neoplasms , High-Throughput Nucleotide Sequencing/methods , Humans , Neoplasms/genetics , Mutation , Algorithms , DNA, Neoplasm/genetics , Sequence Analysis, DNA/methods , Computational Biology/methods
6.
J Phys Chem Lett ; 15(19): 5120-5129, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38709198

ABSTRACT

In the past few decades, nanometer-scale pores have been employed as powerful tools for sensing biological molecules. Owing to their unique structure and properties, solid-state nanopores provide interesting opportunities for the development of DNA sequencing technology. Controlling DNA translocation through nanopores is an important means of improving sequencing accuracy. Here we present a proof-of-principle study of accelerating the capture of DNA across targeted graphene nanopores using surface charge density, and we identify the underlying mechanism: a combination of electroosmotic flow induced by the nanopore charges and electrostatic attraction/repulsion between the nanopore and ssDNA. This theoretical study provides a new means of controlling DNA transport dynamics and supports better and cheaper applications of graphene in molecular sequencing.


Subject(s)
DNA , Graphite , Nanopores , Static Electricity , Graphite/chemistry , DNA/chemistry , DNA, Single-Stranded/chemistry , Electroosmosis , Sequence Analysis, DNA/methods
7.
Int J Mol Sci ; 25(9)2024 May 02.
Article in English | MEDLINE | ID: mdl-38732187

ABSTRACT

Dynamic changes in genomic DNA methylation patterns govern epigenetic developmental programs and accompany organismal aging. Epigenetic clock (eAge) algorithms use DNA methylation to estimate age and disease risk factors, as well as to analyze the impact of various interventions. High-throughput bisulfite sequencing methods, such as reduced-representation bisulfite sequencing (RRBS) or whole genome bisulfite sequencing (WGBS), provide an opportunity to identify genomic regions of disordered or heterogeneous DNA methylation, which might be associated with cell-type heterogeneity, DNA methylation erosion, and allele-specific methylation. We systematically evaluated the applicability of five within-sample heterogeneity (WSH) scores, which assess the variability of methylation patterns, for constructing human blood epigenetic clock models from RRBS data. The model based on a metric designed to assess DNA methylation erosion performed best, with a mean absolute error (MAE) of 3.686 years. We also trained a prediction model that uses the average methylation level over genomic regions. Although this region-based model was relatively more efficient than the WSH-based model, the latter required the analysis of just a few short genomic regions and could therefore be a useful tool for designing a reduced epigenetic clock analyzed by targeted next-generation sequencing.
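
The abstract does not define the five WSH scores. As an illustration only, the sketch below computes the proportion of discordant reads (PDR), one commonly cited WSH score. It is not necessarily the erosion metric that performed best in this study. The sketch assumes that each bisulfite read has already been reduced to a string of per-CpG calls ('M' methylated, 'U' unmethylated). The four-CpG minimum is an assumed default, and the function name is hypothetical.

```python
import math
from collections.abc import Sequence


def proportion_discordant_reads(reads: Sequence[str], min_cpgs: int = 4) -> float:
    """Proportion of discordant reads (PDR) for one genomic region.

    Each read is a string of per-CpG calls, e.g. "MMUM" ('M' = methylated,
    'U' = unmethylated). Only reads covering at least `min_cpgs` CpGs count;
    a read is discordant if it mixes 'M' and 'U' calls.
    Returns NaN when no read covers enough CpGs.
    """
    eligible = [r for r in reads if len(r) >= min_cpgs]
    if not eligible:
        return math.nan
    # A read with both call types present is discordant.
    discordant = sum(1 for r in eligible if len(set(r)) > 1)
    return discordant / len(eligible)
```

For the reads "MMMM", "UUUU", "MUMM" and "MMU", the last read covers too few CpGs and is skipped. One of the remaining three is discordant, so the PDR is 1/3.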


Subject(s)
Aging , DNA Methylation , Epigenesis, Genetic , High-Throughput Nucleotide Sequencing , Humans , Aging/genetics , High-Throughput Nucleotide Sequencing/methods , Algorithms , CpG Islands , Female , Male , Epigenomics/methods , Aged , Adult , Middle Aged , Sequence Analysis, DNA/methods
9.
Sci Rep ; 14(1): 10826, 2024 05 11.
Article in English | MEDLINE | ID: mdl-38734799

ABSTRACT

Sequencing DNA nucleobases is essential for the diagnosis and treatment of many diseases related to human genes. In this article, the encapsulation of DNA nucleobases in several important synthesized chiral (7, 6), (8, 6), and (10, 8) carbon nanotubes (CNTs) was investigated. The structures were modeled with the density functional tight-binding (DFTB) method using semi-empirical basis sets. Encapsulating DNA nucleobases inside the CNTs changed the electronic properties of the selected chiral CNTs. The results confirmed that van der Waals (vdW) interactions, π-orbital interactions, non-bonded electron pairs, and the presence of highly electronegative atoms are the key factors behind these changes. The electronic parameters showed that, among the CNTs studied, CNT (8, 6) is a suitable choice for sequencing the guanine (G) and cytosine (C) nucleobases. However, the CNTs are not able to sequence adenine (A) and thymine (T). According to the band gap engineering approach and the absorption energy, the presence of the G and C nucleobases decreased the band gap energy of the CNTs. Hence, the selected CNTs are suggested as biosensor substrates for sequencing the G and C nucleobases.


Subject(s)
DNA , Guanine , Nanotubes, Carbon , Nanotubes, Carbon/chemistry , DNA/chemistry , Guanine/chemistry , Density Functional Theory , Adenine/chemistry , Cytosine/chemistry , Thymine/chemistry , Sequence Analysis, DNA/methods , Electrons , Models, Molecular , Humans
10.
Mol Biol Rep ; 51(1): 639, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38727924

ABSTRACT

BACKGROUND: Peucedani Radix, also known as "Qian-hu", is a traditional Chinese medicine derived from Peucedanum praeruptorum Dunn. It is widely used to treat wind-heat colds and coughs accompanied by excessive phlegm. However, owing to morphological similarities, limited resources, and heightened market demand, numerous substitutes and adulterants of Peucedani Radix have emerged in the herbal medicine market. Moreover, Peucedani Radix is typically dried and sliced for sale, which makes traditional identification methods challenging. MATERIALS AND METHODS: We first examined and compared 104 commercial "Qian-hu" samples from various Chinese medicinal markets and 44 species representing the genuine source, adulterants, or substitutes, using the ITS2 mini-barcode region to elucidate the botanical origins of commercial "Qian-hu". A nucleotide signature specific to Peucedani Radix was then developed by analyzing the polymorphic sites within the aligned ITS2 sequences. RESULTS: The success rates of DNA extraction and PCR amplification were 100% and 93.3%, respectively. Forty-five samples were authentic "Qian-hu", while the remaining samples were all adulterants originating from nine distinct species. Peucedani Radix, its substitutes, and its adulterants were successfully identified using a neighbor-joining tree. The 24-bp nucleotide signature (5'-ATTGTCGTACGAATCCTCGTCGTC-3') revealed distinct differences between Peucedani Radix and its common substitutes and adulterants. The newly designed specific primers (PR-F/PR-R) can amplify the nucleotide signature region from commercial samples and from processed materials with severe DNA degradation. CONCLUSIONS: We advocate the use of ITS2 and nucleotide signatures for the rapid and precise identification of herbal medicines and their adulterants to regulate the Chinese herbal medicine industry.
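
The 24-bp signature reported above can be screened for directly in assembled ITS2 sequences. The sketch below performs only an exact, case-insensitive search on both strands. The study itself relied on alignment, a neighbor-joining tree and the PR-F/PR-R primers, and a practical screen would also need to tolerate sequencing errors. The function names are hypothetical.

```python
# 24-bp Peucedani Radix signature as reported in the abstract (5'->3').
SIGNATURE = "ATTGTCGTACGAATCCTCGTCGTC"

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_signature(seq: str, signature: str = SIGNATURE) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every exact hit on either strand."""
    seq = seq.upper()
    hits = []
    # Search the forward strand, then the reverse complement of the signature.
    for strand, probe in (("+", signature), ("-", reverse_complement(signature))):
        start = seq.find(probe)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(probe, start + 1)
    return hits
```

A hit on the "-" strand means that the query was sequenced in the opposite orientation to the signature. Because the signature is not its own reverse complement, each occurrence is reported once.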


Subject(s)
DNA Barcoding, Taxonomic , DNA, Plant , DNA, Plant/genetics , DNA Barcoding, Taxonomic/methods , Drugs, Chinese Herbal/standards , Apiaceae/genetics , Apiaceae/classification , Medicine, Chinese Traditional/standards , DNA, Ribosomal Spacer/genetics , Drug Contamination , Plants, Medicinal/genetics , Phylogeny , Sequence Analysis, DNA/methods , Polymerase Chain Reaction/methods , Nucleotides/genetics , Nucleotides/analysis
11.
BMC Microbiol ; 24(1): 162, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38730339

ABSTRACT

BACKGROUND: Coastal areas are subject to various anthropogenic and natural influences. In this study, we investigated and compared the characteristics of two coastal regions, Andhra Pradesh (AP) and Goa (GA), focusing on pollution, anthropogenic activities, and recreational impacts. We explored three main factors influencing the differences between these coastlines: the shallower depth and lower salinity of the Bay of Bengal; upwelling phenomena due to the thermocline in the Arabian Sea; and high tides that can cause strong currents that transport pollutants and debris. RESULTS: Microbial diversity in GA was significantly higher than in AP, which might be attributed to differences in temperature, soil type, and vegetation cover. 16S rRNA amplicon sequencing and bioinformatics analysis indicated the presence of diverse microbial phyla, including the candidate phyla radiation (CPR). Statistical analysis, random forest regression, and classification with supervised machine learning models accurately confirmed the diversity of the microbiome. Furthermore, we identified 450 cultures of heterotrophic, biotechnologically important bacteria. Some strains were identified as novel taxa based on 16S rRNA gene sequencing and show promising potential for further study. CONCLUSION: Thus, our study provides valuable insights into the microbial diversity and pollution levels of the coastal areas of AP and GA. These findings contribute to a better understanding of the impact of anthropogenic activities and climate variations on the biology and biodiversity of coastal ecosystems.
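
The abstract does not name the index behind the comparison of diversity in GA and AP. For illustration only, the sketch below computes the Shannon index H' = -Σ p_i ln p_i from per-taxon read counts. This is a common alpha-diversity measure for 16S rRNA amplicon data, but the choice of index here is an assumption, and `shannon_index` is a hypothetical name.

```python
import math
from collections.abc import Iterable


def shannon_index(counts: Iterable[int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    nonzero = [c for c in counts if c > 0]
    total = sum(nonzero)
    if total == 0:
        return 0.0  # an empty sample has no measurable diversity
    # Relative abundance p_i of each observed taxon, natural logarithm.
    return -sum((c / total) * math.log(c / total) for c in nonzero)
```

Four taxa with equal counts give H' = ln 4 ≈ 1.386. A community dominated by a single taxon scores close to 0, so a higher H' reflects both more taxa and a more even distribution of reads among them.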


Subject(s)
Bacteria , Bays , Microbiota , Phylogeny , RNA, Ribosomal, 16S , Seawater , Supervised Machine Learning , RNA, Ribosomal, 16S/genetics , Bacteria/classification , Bacteria/genetics , Bacteria/isolation & purification , Microbiota/genetics , Seawater/microbiology , India , Bays/microbiology , Biodiversity , DNA, Bacterial/genetics , Salinity , Sequence Analysis, DNA/methods
12.
BMC Bioinformatics ; 25(1): 186, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38730374

ABSTRACT

BACKGROUND: Commonly used next-generation sequencing machines typically produce large numbers of short reads a few hundred base pairs in length. However, many downstream applications would generally benefit from longer reads. RESULTS: We present CAREx, an algorithm for generating pseudo-long reads from paired-end short-read Illumina data based on the concept of repeatedly computing multiple sequence alignments to extend a read until its partner is found. Our performance evaluation on both simulated and real data shows that CAREx is able to connect significantly more read pairs (up to 99% for simulated data) and to produce more error-free pseudo-long reads than previous approaches. When used prior to assembly, it can achieve superior de novo assembly results. Furthermore, the GPU-accelerated version of CAREx exhibits the fastest execution times among all tested tools. CONCLUSION: CAREx is a new MSA-based algorithm and software for producing pseudo-long reads from paired-end short-read data. It outperforms other state-of-the-art programs in terms of (i) percentage of connected read pairs, (ii) reduction of error rates of filled gaps, (iii) runtime, and (iv) downstream analysis using de novo assembly. CAREx is open-source software written in C++ (CPU version) and in CUDA/C++ (GPU version). It is licensed under GPLv3 and can be downloaded at (https://github.com/fkallen/CAREx).
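
CAREx extends a read through repeated multiple sequence alignments until it reaches the mate. The sketch below shows a much simpler technique: overlap-based merging, which works only when the two reads of a pair already overlap. It is included to illustrate what a "connected" pseudo-long read is. It is not CAREx's algorithm, and `merge_pair` and its thresholds are hypothetical.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, r2: str, min_overlap: int = 10,
               max_mismatch_rate: float = 0.1) -> Optional[str]:
    """Merge an overlapping paired-end read pair into one longer read.

    r2 is reverse-complemented onto r1's strand. The longest suffix of r1
    that matches a prefix of the reverse-complemented r2 within the mismatch
    budget is taken as the overlap. Returns None if no overlap of at least
    `min_overlap` bases qualifies, e.g. when the insert is longer than both
    reads combined (the harder case that requires gap filling).
    """
    r2_rc = reverse_complement(r2)
    longest = min(len(r1), len(r2_rc))
    # Try the longest candidate overlap first, then shrink it.
    for overlap in range(longest, min_overlap - 1, -1):
        suffix = r1[len(r1) - overlap:]
        prefix = r2_rc[:overlap]
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_mismatch_rate * overlap:
            # Keep r1's bases inside the overlap; no quality-aware consensus.
            return r1 + r2_rc[overlap:]
    return None
```

Preferring the longest acceptable overlap reduces the risk of short spurious matches in repetitive sequence. The sketch does not handle inserts shorter than a read (adapter read-through), which real merging tools must detect.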


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing , Software , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Humans , Sequence Alignment/methods
13.
Sci Rep ; 14(1): 10687, 2024 05 09.
Article in English | MEDLINE | ID: mdl-38724570

ABSTRACT

This paper investigates the complexity of DNA sequences in maize and soybean using multifractal detrended fluctuation analysis (MF-DFA), chaos game representation (CGR), and the complexity-entropy plane approach. The study aims to understand the patterns and structures of these DNA sequences, which could provide insights into their genetic makeup and help improve crop yield and quality. The results show that maize and soybean DNA sequences exhibit fractal properties, indicating a complex and self-organizing structure. We observe persistent behavior along the base-pair sequences, which indicates long-range correlations between base pairs. We also identified the stochastic nature of the DNA sequences of both species.
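
Chaos game representation maps a DNA sequence to a series of points in the unit square. The sketch below uses one common corner convention: A at (0, 0), C at (0, 1), G at (1, 1) and T at (1, 0). Conventions differ between papers, and the abstract does not give this study's CGR settings. Each base moves the current point halfway toward that base's corner, so sequences that share a recent suffix land in the same sub-square.

```python
# Corner assignment is one common convention, assumed here for illustration.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game_points(seq: str) -> list[tuple[float, float]]:
    """Return the CGR point produced after each valid base of `seq`."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # skip N and other ambiguity codes
        # Move halfway from the current point toward the base's corner.
        x, y = (x + corner[0]) / 2, (y + corner[1]) / 2
        points.append((x, y))
    return points
```

Binning the points into a 2^k × 2^k grid yields k-mer frequencies, which is how CGR images are usually rendered.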


Subject(s)
DNA, Plant , Glycine max , Zea mays , Zea mays/genetics , Zea mays/growth & development , Glycine max/genetics , Glycine max/growth & development , DNA, Plant/genetics , Fractals , Sequence Analysis, DNA/methods
14.
Genome Biol ; 25(1): 130, 2024 May 21.
Article in English | MEDLINE | ID: mdl-38773520

ABSTRACT

Bulk DNA sequencing of multiple samples from the same tumor is becoming common, yet most methods to infer copy-number aberrations (CNAs) from this data analyze individual samples independently. We introduce HATCHet2, an algorithm to identify haplotype- and clone-specific CNAs simultaneously from multiple bulk samples. HATCHet2 extends the earlier HATCHet method by improving identification of focal CNAs and introducing a novel statistic, the minor haplotype B-allele frequency (mhBAF), that enables identification of mirrored-subclonal CNAs. We demonstrate HATCHet2's improved accuracy using simulations and a single-cell sequencing dataset. HATCHet2 analysis of 10 prostate cancer patients reveals previously unreported mirrored-subclonal CNAs affecting cancer genes.


Subject(s)
Algorithms , DNA Copy Number Variations , Haplotypes , Prostatic Neoplasms , Humans , Prostatic Neoplasms/genetics , Male , Sequence Analysis, DNA/methods , Neoplasms/genetics , Gene Frequency , Single-Cell Analysis
15.
J Pharm Biomed Anal ; 245: 116180, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-38703748

ABSTRACT

Oligonucleotides have emerged as important therapeutic options for inherited diseases. In recent years, RNA therapeutics, especially mRNA, have been pushed to the market, and analytical methods for these molecules have been published extensively. Notably, mass spectrometry has proven to be a state-of-the-art quality control method. Numerous methods are available for RNA-based therapeutics, whereas DNA therapeutics lack suitable MS-based methods for molecules exceeding approximately 60 nucleotides. We present a method that combines common restriction enzymes and short enzyme-directing oligonucleotides to generate DNA digestion products with the advantages of high-resolution tandem mass spectrometry. The instrumentation includes ion-pair reversed-phase chromatography coupled to a time-of-flight mass spectrometer with collision-induced dissociation (CID) for sequence analysis. Using this approach, we increased the sequence coverage of a 100-nucleotide DNA molecule from 23.3% in a direct CID-MS/MS experiment to 100% with the restriction-enzyme-mediated approach presented in this work. This approach is suitable for research and development as well as for quality control in a regulated environment, which makes it a versatile tool for drug development.
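
The sketch below assumes that sequence coverage is the percentage of nucleotide positions spanned by detected fragments. As a simplified, hypothetical illustration, it cuts one strand in silico at every recognition site and then computes that percentage for a chosen set of fragments. The EcoRI site (G^AATTC, cut one base into the site) is used only as an example. The enzyme-directing oligonucleotides, the complementary strand and MS/MS detection are not modeled.

```python
def digest(seq: str, site: str, cut_offset: int) -> list[tuple[int, int]]:
    """Cut `seq` at every occurrence of `site`, `cut_offset` bases into the site.

    Returns half-open (start, end) fragment intervals on the given strand
    only; sticky ends and the complementary strand are ignored.
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    # Consecutive cut positions delimit the fragments; drop empty ones.
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def sequence_coverage(length: int, fragments: list[tuple[int, int]]) -> float:
    """Percentage of positions 0..length-1 covered by at least one fragment."""
    covered = [False] * length
    for start, end in fragments:
        for i in range(max(start, 0), min(end, length)):
            covered[i] = True
    return 100.0 * sum(covered) / length
```

A 25-nt sequence with two EcoRI sites yields three fragments. If only the first and last fragments (5 nt and 8 nt) were identified, coverage would be 13/25 = 52%.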


Subject(s)
DNA Restriction Enzymes , DNA , Oligonucleotides , Tandem Mass Spectrometry , Tandem Mass Spectrometry/methods , DNA/chemistry , DNA/genetics , DNA Restriction Enzymes/metabolism , Oligonucleotides/chemistry , Nucleotides/analysis , Nucleotides/chemistry , Chromatography, Reverse-Phase/methods , Quality Control , Sequence Analysis, DNA/methods
17.
BMC Bioinformatics ; 25(1): 191, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38750423

ABSTRACT

BACKGROUND: The application of reduced metagenomic sequencing approaches holds promise as a middle ground between targeted amplicon sequencing and whole metagenome sequencing, but it has not been widely adopted as a technique. A major barrier to adoption is the lack of read simulation software built to handle the characteristic features of these novel approaches. Reduced metagenomic sequencing (RMS) produces unique patterns of fragmentation per genome that are sensitive to the choice of restriction enzyme, and the non-uniform size selection of these fragments may introduce novel challenges to taxonomic assignment as well as to relative abundance estimates. RESULTS: Through the development and application of simulation software, readsynth, we compare simulated metagenomic sequencing libraries with existing RMS data to assess the influence of multiple library preparation and sequencing steps on downstream analytical results. Based on read depth per position, readsynth achieved a Pearson's correlation of 0.79 and a Spearman's correlation of 0.94 with these benchmarks. Application of a novel estimation approach, fixed length taxonomic ratios, improved the quantification accuracy of simulated human gut microbial communities compared with estimates based on mean or median coverage. CONCLUSIONS: We investigate the possible strengths and weaknesses of applying the RMS technique to profiling microbial communities via simulations with readsynth. The choice of restriction enzymes and the size selection steps in library preparation are non-trivial decisions that bias downstream profiling and quantification. The simulations investigated in this study illustrate the possible limits of preparing metagenomic libraries with a reduced representation sequencing approach, but they also allow for the development of strategies for generating and handling the sequence data produced by this promising application.
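
The two correlation values reported for readsynth measure different things. Pearson's r compares per-position read depths linearly. Spearman's ρ compares only their ranks, so it can be high even when simulated and observed depths are related nonlinearly. The sketch below implements both from scratch on hypothetical depth vectors; it is not readsynth's evaluation code.

```python
import math
from collections.abc import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For the depth vectors [1, 2, 3, 4, 5] and [1, 4, 9, 16, 100], Spearman's ρ is 1 because the ordering agrees. Pearson's r is only about 0.80, because a single high-depth position dominates the linear fit.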


Subject(s)
Metagenome , Metagenomics , Software , Metagenome/genetics , Metagenomics/methods , Humans , Sequence Analysis, DNA/methods , Gastrointestinal Microbiome/genetics , High-Throughput Nucleotide Sequencing/methods
18.
Bioinformatics ; 40(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38696761

ABSTRACT

SUMMARY: PlasCAT (Plasmid Cloud Assembly Tool) is an easy-to-use, cloud-based bioinformatics tool that enables de novo plasmid sequence assembly from raw sequencing data. Nontechnical users can now assemble sequences from long reads and short reads without writing a single line of code. PlasCAT uses high-performance computing servers to reduce assembly run times and deliver results faster. AVAILABILITY AND IMPLEMENTATION: PlasCAT is freely available on the web at https://sequencing.genofab.com. The assembly pipeline source code and server code are available for download at https://bitbucket.org/genofabinc/workspace/projects/PLASCAT. Click the Cancel button to access the source code without authenticating. The web servers are implemented in React.js and Python, and all major browsers are supported.


Subject(s)
Plasmids , Software , Plasmids/genetics , Cloud Computing , Computational Biology/methods , Sequence Analysis, DNA/methods , Internet
19.
Plant Sci ; 344: 112109, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38704094

ABSTRACT

Advances in next-generation sequencing (NGS) have significantly reduced the cost and improved the efficiency of obtaining single nucleotide polymorphism (SNP) markers, particularly through restriction site-associated DNA sequencing (RAD-seq). Meanwhile, progress in whole genome sequencing has led to an increasing number of reference genomes being used in SNP calling. This study used RAD-seq data from 242 individuals of Engelhardia roxburghiana, a tropical tree of the walnut family (Juglandaceae), with SNP calling conducted using the STACKS pipeline. We compared two reference-based approaches, employing a closely related species as the reference genome versus employing the species itself, to evaluate their respective merits and limitations. Our findings indicate a substantial discrepancy in the number of SNPs obtained: using a closely related species as the reference yielded approximately an order of magnitude fewer SNPs than using the species itself. However, the missing rates of individuals and sites in the final SNP sets did not differ significantly between the two scenarios. The results indicate that the species' own reference genome should be prioritized in RAD-seq studies. If it is unavailable, however, a closely related genome is a feasible alternative owing to its wide applicability and low missing rate. This study enriches the understanding of how the choice of reference genome affects SNP acquisition.


Subject(s)
Genome, Plant , High-Throughput Nucleotide Sequencing , Polymorphism, Single Nucleotide , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
20.
Microb Genom ; 10(5)2024 May.
Article in English | MEDLINE | ID: mdl-38717808

ABSTRACT

Improvements in the accuracy and availability of long-read sequencing mean that complete bacterial genomes are now routinely reconstructed using hybrid (i.e. short- and long-read) assembly approaches. Complete genomes allow a deeper understanding of bacterial evolution and of genomic variation beyond single nucleotide variants. They are also crucial for identifying plasmids, which often carry medically significant antimicrobial resistance genes. However, small plasmids are often missed or misassembled by long-read assembly algorithms. Here, we present Hybracter, which allows fast, automatic, and scalable recovery of near-perfect complete bacterial genomes using a long-read-first assembly approach. Hybracter can be run either as a hybrid assembler or as a long-read-only assembler. We compared Hybracter with existing automated hybrid and long-read-only assembly tools using a diverse panel of samples with varying levels of long-read accuracy and manually curated ground-truth reference genomes. We demonstrate that Hybracter as a hybrid assembler is more accurate and faster than Unicycler, the existing gold-standard automated hybrid assembler. We also show that Hybracter with long reads only is the most accurate long-read-only assembler and is comparable to hybrid methods in accurately recovering small plasmids.


Subject(s)
Algorithms , Genome, Bacterial , Software , Plasmids/genetics , Sequence Analysis, DNA/methods , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Bacteria/genetics , Bacteria/classification