Results 1 - 20 of 555
1.
Viruses ; 14(2), 2022 01 30.
Article in English | MEDLINE | ID: mdl-35215887

ABSTRACT

Omicron, the novel, highly mutated SARS-CoV-2 Variant of Concern (VOC, Pango lineage B.1.1.529), was first collected in early November 2021 in South Africa. By the end of November 2021, it had spread and approached fixation in South Africa, and had been detected on all continents. We analyzed the exponential growth of Omicron over four-week periods in South Africa's two most populous provinces, Gauteng and KwaZulu-Natal, arriving at doubling time estimates of 3.3 days (95% CI: 3.2-3.4 days) and 2.7 days (95% CI: 2.3-3.3 days), respectively. Similar or even shorter doubling times were observed in other locations: Australia (3.0 days), New York State (2.5 days), UK (2.4 days), and Denmark (2.0 days). Log-linear regression suggests that the spread began in Gauteng around 11 October 2021; however, because the initial spread was presumably stochastic, this estimate may be inaccurate. Phylogenetics-based analysis indicates that the Omicron strain started to diverge between 6 October and 29 October 2021. We estimated that the weekly growth of the ratio of Omicron to Delta is in the range of 7.2-10.2, considerably higher than the growth of the ratio of Delta to Alpha (estimated to be in the range of 2.5-4.2), and Alpha to pre-existing strains (estimated to be in the range of 1.8-2.7). High relative growth does not necessarily imply higher Omicron infectivity. A two-strain SEIR model suggests that the growth advantage of Omicron may stem from immune evasion, which permits this VOC to infect both recovered and fully vaccinated individuals. As we demonstrated within the model, immune evasion is more concerning than increased transmissibility, because it can facilitate larger epidemic outbreaks.
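The doubling-time estimates in this abstract come from log-linear regression on exponentially growing counts. The following is a minimal sketch of that general idea, not the authors' pipeline: the function name `doubling_time` and the synthetic counts are assumptions made for illustration, and the confidence intervals reported in the abstract are not reproduced.

```python
import math


def doubling_time(days, counts):
    """Fit ln(count) = a + b * day by ordinary least squares and
    return the doubling time ln(2) / b, in days.

    Hypothetical helper for illustration; assumes all counts are positive
    and that growth is positive (b > 0).
    """
    logs = [math.log(c) for c in counts]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    slope = sxy / sxx  # exponential growth rate per day
    return math.log(2) / slope


# Synthetic weekly counts over four weeks that double every 3.3 days
# (made-up data, chosen only to mirror the Gauteng point estimate).
days = [0, 7, 14, 21]
counts = [10 * 2 ** (d / 3.3) for d in days]
print(f"{doubling_time(days, counts):.2f} days")
```

With noisy real data the fitted slope would carry uncertainty, which is where interval estimates such as the 95% CIs quoted above would come from.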


Subject(s)
COVID-19/transmission , Immune Evasion , SARS-CoV-2/immunology , SARS-CoV-2/physiology , Virus Replication/immunology , Australia/epidemiology , COVID-19/epidemiology , Viral Genome , Humans , New York/epidemiology , Phylogeny , SARS-CoV-2/genetics , DNA Sequence Analysis/statistics & numerical data , South Africa/epidemiology , Time Factors
2.
Clin Epigenetics ; 14(1): 22, 2022 02 11.
Article in English | MEDLINE | ID: mdl-35148810

ABSTRACT

BACKGROUND: Multiple studies have reported the prognostic impact of DNA methylation changes in acute myeloid leukemia (AML). However, these epigenetic markers have not been thoroughly validated and therefore are still not considered in clinical practice. Hence, we aimed to independently verify results of selected studies describing the relationship between DNA methylation of specific genes and their prognostic potential in predicting overall survival (OS) and event-free survival (EFS). RESULTS: Fourteen studies (published 2011-2019) comprising 27 genes were subjected to validation by a custom NGS-based sequencing panel in 178 newly diagnosed non-M3 AML patients treated with the 3 + 7 induction regimen. The results were considered successfully validated if both the log-rank test and multivariate Cox regression analysis had a p-value ≤ 0.05. The predictive role of DNA methylation was confirmed for three studies comprising four genes: CEBPA (OS: p = 0.02; EFS: p = 0.03), PBX3 (EFS: p = 0.01), LZTS2 (OS: p = 0.05; EFS: p = 0.0003), and NR6A1 (OS: p = 0.004; EFS: p = 0.0003). For all of these genes, higher methylation was an indicator of longer survival. Concurrent higher methylation of both LZTS2 and NR6A1 was highly significant for survival in the cytogenetically normal (CN) AML group (OS: p < 0.0001; EFS: p < 0.0001) as well as for the whole AML cohort (OS: p = 0.01; EFS: p < 0.0001). In contrast, for two studies reporting the poor prognostic effect of higher GPX3 and DLX4 methylation, we found the exact opposite, again linking higher GPX3 (OS: p = 0.006; EFS: p < 0.0001) and DLX4 (OS: p = 0.03; EFS: p = 0.03) methylation to a favorable treatment outcome. Individual gene significance levels refer to the outcomes of multivariate Cox regression analysis. CONCLUSIONS: Out of twenty-seven genes subjected to DNA methylation validation, a prognostic role was observed for six genes. Therefore, independent validation studies are necessary to reveal truly prognostic DNA methylation changes and to enable the introduction of these promising epigenetic markers into clinical practice.


Subject(s)
Tumor Biomarkers/analysis , DNA Methylation/genetics , Acute Myeloid Leukemia/diagnosis , Adult , Tumor Biomarkers/genetics , DNA Methylation/physiology , Female , Humans , Immunochemistry/methods , Immunochemistry/statistics & numerical data , Acute Myeloid Leukemia/genetics , Male , Middle Aged , Prognosis , DNA Sequence Analysis/methods , DNA Sequence Analysis/statistics & numerical data , Transcription Factors/genetics , Treatment Outcome , Validation Studies as Topic
3.
J Comput Biol ; 29(2): 155-168, 2022 02.
Article in English | MEDLINE | ID: mdl-35108101

ABSTRACT

k-mer-based methods are widely used in bioinformatics, but there are many gaps in our understanding of their statistical properties. Here, we consider the simple model where a sequence S (e.g., a genome or a read) undergoes a simple mutation process through which each nucleotide is mutated independently with some probability r, under the assumption that there are no spurious k-mer matches. How does this process affect the k-mers of S? We derive the expectation and variance of the number of mutated k-mers and of the number of islands (a maximal interval of mutated k-mers) and oceans (a maximal interval of nonmutated k-mers). We then derive hypothesis tests and confidence intervals (CIs) for r given an observed number of mutated k-mers, or, alternatively, given the Jaccard similarity (with or without MinHash). We demonstrate the usefulness of our results using a few select applications: obtaining a CI to supplement the Mash distance point estimate, filtering out reads during alignment by Minimap2, and rating long-read alignments to a de Bruijn graph by Jabba.
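Under the independent-mutation model described here, a k-mer stays unmutated only if all k of its nucleotides do, which happens with probability (1 - r)^k; by linearity of expectation, a sequence with L k-mers then has L(1 - (1 - r)^k) mutated k-mers on average. The sketch below illustrates that expectation, its inversion into a point estimate of r, and a small simulation. The function names are hypothetical, and the paper's variance formulas, hypothesis tests, and confidence intervals are not reproduced.

```python
import random


def expected_mutated_kmers(num_kmers, k, r):
    """Expected number of mutated k-mers when each nucleotide mutates
    independently with probability r (no spurious k-mer matches)."""
    return num_kmers * (1 - (1 - r) ** k)


def estimate_r(num_mutated, num_kmers, k):
    """Point estimate of r obtained by inverting the expectation above."""
    return 1 - (1 - num_mutated / num_kmers) ** (1 / k)


def simulate_mutated_kmers(num_kmers, k, r, rng):
    """Simulate the mutation process and count k-mers (sliding windows of
    length k) that contain at least one mutated position."""
    length = num_kmers + k - 1
    mutated = [rng.random() < r for _ in range(length)]
    in_window = sum(mutated[:k])  # mutated positions in the first window
    count = 1 if in_window else 0
    for i in range(1, num_kmers):
        # Slide the window right by one position.
        in_window += mutated[i + k - 1] - mutated[i - 1]
        if in_window:
            count += 1
    return count


rng = random.Random(0)
observed = simulate_mutated_kmers(200_000, 21, 0.01, rng)
print(observed, expected_mutated_kmers(200_000, 21, 0.01))
print(estimate_r(observed, 200_000, 21))
```

Because neighbouring k-mers overlap, the counts are strongly correlated, which is why a point estimate alone says little about uncertainty and an interval estimate is useful.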


Subject(s)
Mutation , DNA Sequence Analysis/statistics & numerical data , Algorithms , Base Sequence , Computational Biology , Confidence Intervals , Genomics/statistics & numerical data , Humans , Genetic Models , Sequence Alignment/statistics & numerical data , Software
4.
J Comput Biol ; 29(2): 169-187, 2022 02.
Article in English | MEDLINE | ID: mdl-35041495

ABSTRACT

Recently, Gagie et al. proposed a version of the FM-index, called the r-index, that can store thousands of human genomes on a commodity computer. Then Kuhnle et al. showed how to build the r-index efficiently via a technique called prefix-free parsing (PFP) and demonstrated its effectiveness for exact pattern matching. Exact pattern matching can be leveraged to support approximate pattern matching, but the r-index itself cannot efficiently support popular and important queries such as finding maximal exact matches (MEMs). To address this shortcoming, Bannai et al. introduced the concept of thresholds, and showed that storing them together with the r-index enables efficient MEM finding, but they did not say how to find those thresholds. We present a novel algorithm that applies PFP to build the r-index and find the thresholds simultaneously and in linear time and space with respect to the size of the prefix-free parse. Our implementation, called MONI, can rapidly find MEMs between reads and large collections of highly repetitive sequences. Compared with other read aligners (PuffAligner, Bowtie2, BWA-MEM, and CHIC), MONI used 2-11 times less memory and was 2-32 times faster for index construction. Moreover, MONI was less than one thousandth the size of competing indexes for large collections of human chromosomes. Thus, MONI represents a major advance in our ability to perform MEM finding against very large collections of related references.


Subject(s)
Algorithms , Genomics/statistics & numerical data , Sequence Alignment/statistics & numerical data , Software , Computational Biology , Genetic Databases/statistics & numerical data , Bacterial Genome , Human Genome , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Salmonella/genetics , DNA Sequence Analysis/statistics & numerical data , Wavelet Analysis
5.
J Comput Biol ; 29(2): 188-194, 2022 02.
Article in English | MEDLINE | ID: mdl-35041518

ABSTRACT

Efficiently finding maximal exact matches (MEMs) between a sequence read and a database of genomes is a key first step in read alignment. But until recently, it was unknown how to build a data structure in [Formula: see text] space that supports efficient MEM finding, where r is the number of runs in the Burrows-Wheeler Transform. In 2021, Rossi et al. showed how to build a small auxiliary data structure called thresholds in addition to the r-index in [Formula: see text] space. This addition enables efficient MEM finding using the r-index. In this article, we present the tool that implements this solution, which we call MONI. Namely, we give a high-level view of the main components of the data structure and show how the source code can be downloaded, compiled, and used to find MEMs between a set of sequence reads and a set of genomes.


Subject(s)
Algorithms , Sequence Alignment/statistics & numerical data , Software , Computational Biology , Genetic Databases/statistics & numerical data , Human Genome , Genomics/statistics & numerical data , Humans , DNA Sequence Analysis/statistics & numerical data
6.
J Comput Biol ; 29(2): 195-211, 2022 02.
Article in English | MEDLINE | ID: mdl-35041529

ABSTRACT

Resolving haplotypes in polyploid genomes using phase information from sequencing reads is an important and challenging problem. We introduce two new mathematical formulations of polyploid haplotype phasing: (1) the min-sum max tree partition problem, which is a more flexible graphical metric compared with the standard minimum error correction (MEC) model in the polyploid setting, and (2) the uniform probabilistic error minimization model, which is a probabilistic analogue of the MEC model. We incorporate both formulations into a long-read-based polyploid haplotype phasing method called flopp. We show that flopp compares favorably with state-of-the-art algorithms: it is up to 30 times faster with 2 times fewer switch errors on 6× ploidy simulated data. Further, we show using real nanopore data that flopp can quickly reveal reasonable haplotype structures from the autotetraploid Solanum tuberosum (potato).


Subject(s)
Algorithms , Haplotypes , Polyploidy , Computational Biology , Computer Simulation , Genetic Databases/statistics & numerical data , Plant Genome , Genetic Models , Statistical Models , Multigene Family , Single Nucleotide Polymorphism , DNA Sequence Analysis/statistics & numerical data , Software , Solanum tuberosum/genetics
7.
Nurs Res ; 71(1): 43-53, 2022.
Article in English | MEDLINE | ID: mdl-34985847

ABSTRACT

BACKGROUND: Nurse researchers are well poised to study the connection of the microbiome to health and disease. Evaluating published microbiome results can assist with study design and hypothesis generation. OBJECTIVES: This article aims to present and define important analysis considerations in microbiome study planning and to identify genera shared across studies despite methodological differences. This methods article will highlight a workflow that the nurse scientist can use to combine and evaluate taxonomy tables for microbiome study or research proposal planning. METHODS: We compiled taxonomy tables from 13 published gut microbiome studies that had used Ion Torrent sequencing technology. We searched for studies that had amplified multiple hypervariable (V) regions of the 16S rRNA gene when sequencing the bacteria from healthy gut samples. RESULTS: We obtained 15 taxonomy tables from the 13 studies, comprising samples from four continents and eight V regions. Methodology among studies was highly variable, including differences in V regions amplified, geographic location, and population demographics. Nevertheless, of the 354 total genera identified from the 15 data sets, 25 were shared in all V regions and the four continents. When relative abundance differences across the V regions were compared, Dorea and Roseburia were statistically different. Taxonomy tables from Asian subjects had increased average abundances of Prevotella and lowered abundances of Bacteroides compared with the European, North American, and South American study subjects. DISCUSSION: Evaluating taxonomy tables from previously published literature is essential for study planning. The genera found from different V regions and continents highlight geography and V region as important variables to consider in microbiome study design. The 25 shared genera across the various studies may represent genera commonly found in healthy gut microbiomes. Understanding the factors that may affect the results from a variety of microbiome studies will allow nurse scientists to plan research proposals in an informed manner. This work presents a valuable framework for future cross-study comparisons conducted across the globe.


Subject(s)
Classification/methods , Gastrointestinal Microbiome/physiology , Gastrointestinal Microbiome/immunology , Global Health/statistics & numerical data , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , DNA Sequence Analysis/methods , DNA Sequence Analysis/statistics & numerical data
8.
Clin. biomed. res ; 42(3): 218-225, 2022.
Article in English | LILACS | ID: biblio-1415205

ABSTRACT

Introduction: Dried blood spot (DBS) samples have been used for diagnostic purposes since their introduction in the neonatal screening of phenylketonuria almost 50 years ago. The range of their application has been extended to modern approaches, such as next-generation sequencing (NGS) for molecular genetic testing. This study aimed to evaluate the use of a standardized organic method for DNA extraction from DBS samples in the diagnostic setting. Methods: The clinical applicability of the method was tested using 3 samples collected from a newborn screening project for lysosomal storage diseases, allowing the determination of the genotype of the individuals. DNA was extracted from three 3-mm-diameter DBS punches. Quality, purity, and concentration were determined, and method performance was assessed by standard polymerase chain reaction, restriction length polymorphism, Sanger sequencing, and targeted NGS. Results: Results were compared with those obtained from DNA samples extracted following the internally validated in-house extraction protocol that used six 3-mm punches of DBS and from samples extracted from whole blood. Conclusion: This organic method proved to be effective in obtaining high-quality DNA from DBS, being compatible with several downstream molecular applications, in addition to having a lower cost per sample.


Subject(s)
Humans , Newborn Infant , Polymerase Chain Reaction/statistics & numerical data , Neonatal Screening , DNA Sequence Analysis/statistics & numerical data , DNA/genetics , Dried Blood Spot Testing/statistics & numerical data
9.
Nat Protoc ; 16(12): 5673-5706, 2021 12.
Article in English | MEDLINE | ID: mdl-34773120

ABSTRACT

Precise control of gene expression requires the coordinated action of multiple factors at cis-regulatory elements. We recently developed single-molecule footprinting to simultaneously resolve the occupancy of multiple proteins including transcription factors, RNA polymerase II and nucleosomes on single DNA molecules genome-wide. The technique combines the use of cytosine methyltransferases to footprint the genome with bisulfite sequencing to resolve transcription factor binding patterns at cis-regulatory elements. DNA footprinting is performed by incubating permeabilized nuclei with recombinant methyltransferases. Upon DNA extraction, whole-genome or targeted bisulfite libraries are prepared and loaded on Illumina sequencers. The protocol can be completed in 4-5 d in any laboratory with access to high-throughput sequencing. Analysis can be performed in 2 d using a dedicated R package and requires access to a high-performance computing system. Our method can be used to analyze how transcription factors cooperate and antagonize to regulate transcription.


Subject(s)
DNA Footprinting/methods , DNA Modification Methylases/metabolism , DNA/metabolism , Genome , Single Molecule Imaging/methods , Transcription Factors/metabolism , Animals , Cell Nucleus/metabolism , DNA/genetics , DNA Modification Methylases/genetics , Gene Expression Regulation , Gene Library , High-Throughput Nucleotide Sequencing , Humans , Mice , Mouse Embryonic Stem Cells/cytology , Mouse Embryonic Stem Cells/metabolism , Nucleosomes/chemistry , Nucleosomes/metabolism , RNA Polymerase II/genetics , RNA Polymerase II/metabolism , DNA Sequence Analysis/statistics & numerical data , Software , Transcription Factors/genetics
10.
Clin Epigenetics ; 13(1): 196, 2021 10 20.
Article in English | MEDLINE | ID: mdl-34670587

ABSTRACT

BACKGROUND: DNA methylation detection in liquid biopsies provides a highly promising and much needed means for real-time monitoring of disease load in advanced cancer patient care. Compared to the often-used somatic mutations, tissue- and cancer-type specific epigenetic marks affect a larger part of the cancer genome and generally have a high penetrance throughout the tumour. Here, we describe the successful application of the recently described MeD-seq assay for genome-wide DNA methylation profiling on cell-free DNA (cfDNA). The compatibility of the MeD-seq assay with different types of blood collection tubes, cfDNA input amounts, cfDNA isolation methods, and vacuum concentration of samples was evaluated using plasma from both metastatic cancer patients and healthy blood donors (HBDs). To investigate the potential value of cfDNA methylation profiling for tumour load monitoring, we profiled paired samples from 8 patients with resectable colorectal liver metastases (CRLM) before and after surgery. RESULTS: The MeD-seq assay worked on plasma-derived cfDNA from both EDTA and CellSave blood collection tubes when at least 10 ng of cfDNA was used. From the 3 evaluated cfDNA isolation methods, both the manual QIAamp Circulating Nucleic Acid Kit (Qiagen) and the semi-automated Maxwell® RSC ccfDNA Plasma Kit (Promega) were compatible with MeD-seq analysis, whereas the QiaSymphony DSP Circulating DNA Kit (Qiagen) yielded significantly fewer reads when compared to the QIAamp kit (p < 0.001). Vacuum concentration of samples before MeD-seq analysis was possible with samples in AVE buffer (QIAamp) or water, but yielded inconsistent results for samples in EDTA-containing Maxwell buffer. Principal component analysis showed that pre-surgical samples from CRLM patients were very distinct from HBDs, whereas post-surgical samples were more similar. 
Several described methylation markers for colorectal cancer monitoring in liquid biopsies showed differential methylation between pre-surgical CRLM samples and HBDs in our data, supporting the validity of our approach. Results for MSC, ITGA4, GRIA4, and EYA4 were validated by quantitative methylation specific PCR. CONCLUSIONS: The MeD-seq assay provides a promising new method for cfDNA methylation profiling. Potential future applications of the assay include marker discovery specifically for liquid biopsy analysis as well as direct use as a disease load monitoring tool in advanced cancer patients.


Subject(s)
Cell-Free Nucleic Acids/analysis , DNA Methylation/genetics , Cell-Free Nucleic Acids/genetics , DNA Methylation/physiology , Genome-Wide Association Study/methods , Genome-Wide Association Study/statistics & numerical data , Humans , Liquid Biopsy/methods , Liquid Biopsy/statistics & numerical data , DNA Sequence Analysis/methods , DNA Sequence Analysis/statistics & numerical data
11.
Genes (Basel) ; 12(9), 2021 08 24.
Article in English | MEDLINE | ID: mdl-34573280

ABSTRACT

Inborn errors of immunity (IEI) include a large group of inherited diseases sharing either poor, dysregulated, or absent and/or acquired function in one or more components of the immune system. Next-generation sequencing (NGS) has driven a rapid increase in the recognition of such defects, though the wide heterogeneity of genetically diverse but phenotypically overlapping diseases has often prevented the molecular characterization of the most complex patients. Two hundred and seventy-two patients were submitted to three successive NGS-based gene panels composed of 58, 146, and 312 genes. Along with pathogenic and likely pathogenic causative gene variants, accounting for the corresponding disorders (37/272 patients, 13.6%), we also detected a number of rare (probably) damaging variants in genes unrelated to the patients' phenotype, variants of unknown significance (VUS) in genes consistent with their clinical presentation, and apparently inconsistent benign, likely benign, or VUS variants. Finally, a remarkable number of previously unreported variants of unknown significance were also found, often recurring in our dataset. The NGS approach demonstrated an expected IEI diagnostic rate. However, defining the appropriate list of genes for these panels may not be straightforward, and the application of unbiased approaches should be taken into consideration, especially when patients show atypical clinical pictures.


Subject(s)
Gene Frequency , Immune System Diseases/genetics , Inborn Errors of Metabolism/genetics , Adolescent , Female , Gene-Environment Interaction , Genetic Testing/statistics & numerical data , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Immune System Diseases/diagnosis , Male , Inborn Errors of Metabolism/diagnosis , Mutation , DNA Sequence Analysis/statistics & numerical data
12.
J Trauma Acute Care Surg ; 91(6): 988-994, 2021 12 01.
Article in English | MEDLINE | ID: mdl-34510074

ABSTRACT

BACKGROUND: Timely recognition of sepsis and identification of pathogens can improve outcomes in critical care patients but microbial cultures have low accuracy and long turnaround times. In this proof-of-principle study, we describe metagenomic sequencing and analysis of nonhuman DNA in plasma. We hypothesized that quantitative analysis of bacterial DNA (bDNA) levels in plasma can enable detection and monitoring of pathogens. METHODS: We enrolled 30 patients suspected of sepsis in the surgical trauma intensive care unit and collected plasma samples at the time of diagnostic workup for sepsis (baseline), and 7 days and 14 days later. We performed metagenomic sequencing of plasma DNA and used computational classification of sequencing reads to detect and quantify total and pathogen-specific bDNA fraction. To improve assay sensitivity, we developed an enrichment method for bDNA based on size selection for shorter fragment lengths. Differences in bDNA fractions between samples were evaluated using t test and linear mixed-effects model, following log transformation. RESULTS: We analyzed 72 plasma samples from 30 patients. Twenty-seven samples (37.5%) were collected at the time of infection. Median total bDNA fraction was 1.6 times higher in these samples compared with samples with no infection (0.011% and 0.0068%, respectively, p < 0.001). In 17 patients who had active infection at enrollment and at least one follow-up sample collected, total bDNA fractions were higher at baseline compared with the next sample (p < 0.001). Following enrichment, bDNA fractions increased in paired samples by a mean of 16.9-fold. Of 17 samples collected at the time when bacterial pathogens were identified, we detected pathogen-specific DNA in 13 plasma samples (76.5%). CONCLUSION: Bacterial DNA levels in plasma are elevated in critically ill patients with active infection. Pathogen-specific DNA is detectable in plasma, particularly after enrichment using selection for shorter fragments. 
Serial changes in bDNA levels may be informative of treatment response. LEVEL OF EVIDENCE: Epidemiologic/Prognostic, Level V.


Subject(s)
Bacteria , Bacterial DNA , Metagenomics/methods , Sepsis , DNA Sequence Analysis , Bacteria/classification , Bacteria/genetics , Bacteria/isolation & purification , Critical Care/methods , Critical Care/standards , Critical Illness/therapy , Bacterial DNA/blood , Bacterial DNA/isolation & purification , Humans , Intensive Care Units/statistics & numerical data , Proof of Concept Study , Quality Improvement , Reproducibility of Results , Sepsis/diagnosis , Sepsis/microbiology , Sepsis/therapy , DNA Sequence Analysis/methods , DNA Sequence Analysis/statistics & numerical data
13.
PLoS Comput Biol ; 17(8): e1009254, 2021 08.
Article in English | MEDLINE | ID: mdl-34343164

ABSTRACT

Driven by the necessity to survive environmental pathogens, the human immune system has evolved exceptional diversity and plasticity, to which several factors contribute including inheritable structural polymorphism of the underlying genes. Characterizing this variation is challenging due to the complexity of these loci, which contain extensive regions of paralogy, segmental duplication and high copy-number repeats, but recent progress in long-read sequencing and optical mapping techniques suggests this problem may now be tractable. Here we assess this by using long-read sequencing platforms from PacBio and Oxford Nanopore, supplemented with short-read sequencing and Bionano optical mapping, to sequence DNA extracted from CD14+ monocytes and peripheral blood mononuclear cells from a single European individual identified as HV31. We use this data to build a de novo assembly of eight genomic regions encoding four key components of the immune system, namely the human leukocyte antigen, immunoglobulins, T cell receptors, and killer-cell immunoglobulin-like receptors. Validation of our assembly using k-mer based and alignment approaches suggests that it has high accuracy, with estimated base-level error rates below 1 in 10 kb, although we identify a small number of remaining structural errors. We use the assembly to identify heterozygous and homozygous structural variation in comparison to GRCh38. Despite analyzing only a single individual, we find multiple large structural variants affecting core genes at all three immunoglobulin regions and at two of the three T cell receptor regions. Several of these variants are not accurately callable using current algorithms, implying that further methodological improvements are needed. Our results demonstrate that assessing haplotype variation in these regions is possible given sufficiently accurate long-read and associated data. 
Continued reductions in the cost of these technologies will enable application of these methods to larger samples and provide a broader catalogue of germline structural variation at these loci, an important step toward making these regions accessible to large-scale genetic association studies.


Subject(s)
Genetic Variation , Human Genome/immunology , Immune System , Algorithms , Computational Biology , DNA Copy Number Variations , Genomics/methods , Genomics/statistics & numerical data , HLA Antigens/genetics , Haplotypes , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Immunogenetic Phenomena , Immunoglobulins/genetics , T-Cell Antigen Receptors/genetics , KIR Receptors/genetics , DNA Sequence Analysis/statistics & numerical data
14.
Life Sci Alliance ; 4(11), 2021 11.
Article in English | MEDLINE | ID: mdl-34462322

ABSTRACT

More and more next-generation sequencing (NGS) data are made available every day. However, the quality of this data is not always guaranteed. Available quality control tools require profound knowledge to correctly interpret the multiplicity of quality features. Moreover, it is usually difficult to know if quality features are relevant in all experimental conditions. Therefore, the NGS community would highly benefit from condition-specific data-driven guidelines derived from many publicly available experiments, which reflect routinely generated NGS data. In this work, we have characterized well-known quality guidelines and related features in big datasets and concluded that they are too limited for assessing the quality of a given NGS file accurately. Therefore, we present new data-driven guidelines derived from the statistical analysis of many public datasets using quality features calculated by common bioinformatics tools. Thanks to this approach, we confirm the high relevance of genome mapping statistics to assess the quality of the data, and we demonstrate the limited scope of some quality features that are not relevant in all conditions. Our guidelines are available at https://cbdm.uni-mainz.de/ngs-guidelines.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , DNA Sequence Analysis/methods , Computational Biology/methods , Human Genome , Humans , Quality Control , DNA Sequence Analysis/statistics & numerical data , Software
15.
mBio ; 12(4): e0163821, 2021 08 31.
Article in English | MEDLINE | ID: mdl-34399612

ABSTRACT

RNA viruses cause numerous emerging diseases, mostly due to transmission from mammalian and avian reservoirs. Large-scale surveillance of RNA viral infections in these animals is a fundamental step for controlling viral infectious diseases. Metagenomic analysis is a powerful method for virus identification with low bias and has contributed substantially to the discovery of novel viruses. Deep-sequencing data have been collected from diverse animals and accumulated in public databases, which can be valuable resources for identifying unknown viral sequences. Here, we screened for infections of 33 RNA viral families in publicly available mammalian and avian sequencing data and found approximately 900 hidden viral infections. We also discovered six nearly complete viral genomes in livestock, wild, and experimental animals: hepatovirus in a goat, hepeviruses in blind mole-rats and a galago, astrovirus in macaque monkeys, parechovirus in a cow, and pegivirus in tree shrews. Some of these viruses were phylogenetically close to human-pathogenic viruses, suggesting the potential risk of causing disease in humans upon infection. Furthermore, infections of five novel viruses were identified in several different individuals, indicating that their infections may have already spread in the natural host population. Our findings demonstrate the reusability of public sequencing data for surveying viral infections and identifying novel viral sequences, presenting a warning about a new threat of viral infectious disease to public health. IMPORTANCE Monitoring the spread of viral infections and identifying novel viruses capable of infecting humans through animal reservoirs are necessary to control emerging viral diseases. Massive amounts of sequencing data collected from various animals are publicly available, and these data may contain sequences originating from a wide variety of viruses. Here, we analyzed more than 46,000 public sequencing data sets and identified approximately 900 hidden RNA viral infections in mammalian and avian samples. Some viruses discovered in this study were genetically similar to pathogens that cause hepatitis, diarrhea, or encephalitis in humans, suggesting the presence of new threats to public health. Our study demonstrates the effectiveness of reusing public sequencing data to identify known and unknown viral infections, indicating that continuous monitoring of public sequencing data by metagenomic analyses would help prepare for and mitigate future viral pandemics.


Asunto(s)
Enfermedades Transmisibles Emergentes/virología , Metagenómica , Infecciones por Virus ARN/prevención & control , Virus ARN/genética , Virus ARN/patogenicidad , Análisis de Secuencia de ADN/estadística & datos numéricos , Animales , Aves/virología , Bovinos , Análisis de Datos , Genoma Viral , Secuenciación de Nucleótidos de Alto Rendimiento/estadística & datos numéricos , Humanos , Infecciones por Virus ARN/virología , Virus ARN/clasificación , Análisis de Secuencia de ADN/métodos
16.
Artículo en Inglés | MEDLINE | ID: mdl-34344265

RESUMEN

In this article, we study the statistical characteristics and examine the performance of the original representation and of mathematical models of deoxyribonucleic acid (DNA) sequences. The proposed mathematical modelling approach creates closed-form formulas for the original DNA data sequences using different methods. Accuracy of representation is studied based on evaluation metric values. The Root Mean Squared Error (RMSE) and correlation coefficient (R) are used to examine the accuracy of all mathematical models and to select the optimum one for DNA representation. In addition, statistical parameters such as energy, entropy, standard deviation, variance, mean, range, Mean Absolute Deviation (MAD), skewness and kurtosis are also used for the selection of the optimum model for DNA representation. Finally, spectral estimation methods are used for exon prediction, that is, determination of the coding region (exon), for the actual sequences and for the selected mathematical models: Sum of Sinusoids (SoS) with 8 terms and Gaussian with 8 terms. The exon prediction results from the original DNA sequences and from the mathematically modelled DNA sequences coincide, confirming the success of the proposed sum-of-sinusoids model for DNA sequences, while the Gaussian model is not appropriate for this task.
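
The two accuracy metrics named in this abstract, RMSE and the correlation coefficient R, can be sketched in a few lines. This is a minimal illustration only: the function names `rmse` and `pearson_r` are hypothetical, R is assumed here to be the Pearson correlation coefficient, and the abstract does not state how the DNA sequence is mapped to numbers, so the inputs are simply two equal-length numeric series (original signal and model output).

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length numeric series."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    squared_errors = ((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sum(squared_errors) / len(actual))


def pearson_r(x, y):
    """Pearson correlation coefficient (assumed meaning of R here)."""
    if len(x) != len(y) or not x:
        raise ValueError("series must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / (spread_x * spread_y)
```

Under these assumptions, a model that tracks the original signal closely gives an RMSE near 0 and an R near 1, which is the kind of criterion the abstract describes for choosing between candidate models.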


Asunto(s)
ADN/química , Análisis de Secuencia de ADN/estadística & datos numéricos , Secuencia de Bases , Bases de Datos de Ácidos Nucleicos , Exones/genética , Modelos Estadísticos
17.
Nucleic Acids Res ; 49(19): e109, 2021 11 08.
Artículo en Inglés | MEDLINE | ID: mdl-34320181

RESUMEN

Whole genome bisulphite sequencing (WGBS) permits the genome-wide study of single molecule methylation patterns. One of the key goals of mammalian cell-type identity studies, in both normal differentiation and disease, is to locate differential methylation patterns across the genome. We discuss the most desirable characteristics for DML (differentially methylated locus) and DMR (differentially methylated region) detection tools in a genome-wide context and choose a set of statistical methods that fully or partially satisfy these considerations to compare for benchmarking. Our data simulation strategy is both biologically informed (employing distribution parameters derived from large-scale consortium datasets) and thorough. We report DML detection ability with respect to coverage, group methylation difference, sample size, variability and covariate size, both marginally and jointly, and exhaustively with respect to parameter combination. We also benchmark these methods on FDR control and computational time. We use this result to backend and introduce an expanded version of DMRcate: an existing DMR detection tool for microarray data that we have extended to now call DMRs from WGBS data. We compare DMRcate to a set of alternative DMR callers using a similarly realistic simulation strategy. We find DMRcate and RADmeth are the best predictors of DMRs, and conclusively find DMRcate the fastest.


Asunto(s)
Metilación de ADN , ADN/metabolismo , Epigénesis Genética , Genoma Humano , Análisis de Secuencia de ADN/estadística & datos numéricos , Benchmarking , Simulación por Computador , Islas de CpG , ADN/genética , Genómica/métodos , Humanos , Tamaño de la Muestra , Sulfitos/química , Secuenciación Completa del Genoma
18.
BMC Pregnancy Childbirth ; 21(1): 496, 2021 Jul 08.
Artículo en Inglés | MEDLINE | ID: mdl-34238233

RESUMEN

BACKGROUND: We aimed to evaluate the clinical value of copy number variation-sequencing (CNV-Seq) in combination with cytogenetic karyotyping in prenatal diagnosis. METHODS: CNV-Seq and cytogenetic karyotyping were performed in parallel on 9452 prenatal samples to compare the diagnostic performance of the two methods, and to evaluate the screening performance of maternal age, maternal serum screening, fetal ultrasound scanning and noninvasive prenatal testing (NIPT) for fetal pathogenic copy number variation (CNV). RESULTS: Among the 9452 prenatal samples, traditional karyotyping detected 704 cases (7.5%) of abnormal cytogenetic karyotypes, 171 (1.8%) of chromosome polymorphism, 20 (0.2%) of subtle structural variations, 74 (0.7%) of mutual translocation (possibly balanced), and 8431 (89.2%) of normal cytogenetic karyotypes; 52 samples (0.6%) had no karyotyping results. Among the 8705 cases with normal karyotype, polymorphism, mutual translocation, or marker chromosome, CNV-Seq detected 63 cases (0.7%) of pathogenic chromosome microdeletion/duplication. Retrospectively, NIPT had high sensitivity and specificity for the screening of fetal pathogenic CNV, and combining NIPT with maternal age, maternal serum screening or fetal ultrasound scanning improved the screening performance. CONCLUSION: The combined application of cytogenetic karyotyping and CNV-Seq significantly improved the detection rate of fetal pathogenic chromosome microdeletion/duplication. NIPT was recommended for the screening of pathogenic chromosome microdeletion/duplication, and combining NIPT with other screening methods further improved the screening performance for pathogenic fetal CNV.
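
Sensitivity and specificity, the two screening measures mentioned in this abstract, follow directly from a confusion matrix. The sketch below uses hypothetical function names, and the counts in the usage note are made-up illustration values, not figures from the study, which does not report them in the abstract.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of truly affected cases that the screen flags."""
    total_affected = true_positives + false_negatives
    if total_affected == 0:
        raise ValueError("no affected cases: sensitivity is undefined")
    return true_positives / total_affected


def specificity(true_negatives, false_positives):
    """Fraction of truly unaffected cases that the screen clears."""
    total_unaffected = true_negatives + false_positives
    if total_unaffected == 0:
        raise ValueError("no unaffected cases: specificity is undefined")
    return true_negatives / total_unaffected
```

For example, with purely illustrative counts of 9 true positives and 1 false negative, `sensitivity(9, 1)` gives 0.9; with 90 true negatives and 10 false positives, `specificity(90, 10)` gives 0.9.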


Asunto(s)
Trastornos de los Cromosomas/diagnóstico , Variaciones en el Número de Copia de ADN , Cariotipificación/estadística & datos numéricos , Diagnóstico Prenatal/estadística & datos numéricos , Análisis de Secuencia de ADN/estadística & datos numéricos , Adulto , Trastornos de los Cromosomas/embriología , Análisis Citogenético , Femenino , Humanos , Edad Materna , Pruebas de Detección del Suero Materno/estadística & datos numéricos , Pruebas Prenatales no Invasivas/métodos , Pruebas Prenatales no Invasivas/estadística & datos numéricos , Embarazo , Diagnóstico Prenatal/métodos , Reproducibilidad de los Resultados , Estudios Retrospectivos , Sensibilidad y Especificidad , Ultrasonografía Prenatal/estadística & datos numéricos
19.
Comput Math Methods Med ; 2021: 1835056, 2021.
Artículo en Inglés | MEDLINE | ID: mdl-34306171

RESUMEN

In the general computational context of biomedical data analysis, DNA sequence classification is a crucial challenge. Several machine learning techniques have been used successfully for this task in recent years. Identification and classification of viruses are essential to avoid an outbreak like COVID-19, and also help in detecting the effects of viruses and in drug design. Nevertheless, feature selection remains the most challenging aspect of the problem: sequences lack explicit features, and the most commonly used representations worsen the problem of high dimensionality. Recently, deep learning (DL) models have been able to extract features from the input automatically. In this work, we employed CNN, CNN-LSTM, and CNN-Bidirectional LSTM architectures using Label and K-mer encoding for DNA sequence classification. The models are evaluated on different classification metrics. In the experimental results, the CNN and CNN-Bidirectional LSTM with K-mer encoding offer high accuracy on testing data, with 93.16% and 93.13%, respectively.
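
The two input encodings named in this abstract can be illustrated with a short sketch. The abstract gives no implementation details, so everything here is an assumption: the function names are hypothetical, label encoding is taken to mean one integer per base, k-mers are taken to be overlapping windows with stride 1, and k-mer ids start at 1 so that 0 could be reserved for padding.

```python
def label_encode(sequence, alphabet="ACGT"):
    """Map each base to an integer index (assumed form of label encoding).

    Raises KeyError for symbols outside the alphabet, such as N.
    """
    index = {base: i for i, base in enumerate(alphabet)}
    return [index[base] for base in sequence.upper()]


def kmers(sequence, k=3):
    """Split a sequence into overlapping k-mers (stride 1, an assumption)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_encode(sequence, k=3, vocabulary=None):
    """Turn a sequence into integer ids, one per k-mer.

    New k-mers receive the next free id, starting at 1; the vocabulary
    dict is updated in place and returned alongside the ids.
    """
    if vocabulary is None:
        vocabulary = {}
    ids = []
    for kmer in kmers(sequence, k):
        if kmer not in vocabulary:
            vocabulary[kmer] = len(vocabulary) + 1
        ids.append(vocabulary[kmer])
    return ids, vocabulary
```

Passing the same `vocabulary` dict across calls keeps ids consistent between sequences, which is what a downstream embedding layer would typically need.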


Asunto(s)
COVID-19/virología , Secuenciación de Nucleótidos de Alto Rendimiento/estadística & datos numéricos , Redes Neurales de la Computación , SARS-CoV-2/genética , Análisis de Secuencia de ADN/estadística & datos numéricos , Secuencia de Bases , Biología Computacional , ADN Viral/clasificación , ADN Viral/genética , Bases de Datos de Ácidos Nucleicos/estadística & datos numéricos , Aprendizaje Profundo , Humanos , Pandemias , SARS-CoV-2/clasificación
20.
Microbiol Res ; 250: 126794, 2021 Sep.
Artículo en Inglés | MEDLINE | ID: mdl-34062342

RESUMEN

The study of endophytic bacteria in saline-alkali tolerant rice seeds is of great help to follow-up studies of the saline-alkali tolerance mechanism and to the exploitation of saline-alkali tolerant microbial resources. In this study, high-throughput sequencing technology based on the Illumina MiSeq platform was used to reveal the "core microbiota" by examining the diversity and community structures of seed endophytic bacteria in saline-alkali tolerant rice grown under different salt concentrations, and to explore the effect of salt concentration on these endophytic bacteria. Here, 49 endophytic OTUs were found to coexist in all samples. At the phylum level, the dominant phylum was Proteobacteria (83.90%-99.87%). At the genus level, Pantoea (44.65%-94.76%), which represents the core microbiota in saline-alkali tolerant rice seeds, was the dominant genus and was present in all samples tested. Through further analysis, we found that the abundance of Pantoea in saline-alkali tolerant rice seeds was positively correlated with salt concentration. Overall, this study showed that the core microbiota of saline-alkali tolerant rice seeds is Pantoea, and that the change in salt concentration is a key factor in the formation of the endophytic bacterial community in saline-alkali tolerant rice seeds.


Asunto(s)
Álcalis/metabolismo , Bacterias/genética , Endófitos/genética , Secuenciación de Nucleótidos de Alto Rendimiento , Microbiota/genética , Oryza/microbiología , Semillas/microbiología , Bacterias/clasificación , Variación Genética , Oryza/metabolismo , Filogenia , Análisis de Secuencia de ADN/estadística & datos numéricos , Suelo/química , Microbiología del Suelo