Results 1 - 20 of 32,002
1.
Nat Commun ; 15(1): 6710, 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39112481

ABSTRACT

The demographic history of France remains largely understudied despite its central role in understanding modern population structure across Western Europe. Here, by exploring publicly available Europe-wide genotype datasets together with the genomes of 3234 present-day and six newly sequenced medieval individuals from Northern France, we found extensive fine-scale population structure across Brittany and the downstream Loire basin and increased population differentiation between the northern and southern sides of the river Loire, associated with higher proportions of steppe vs. Neolithic-related ancestry. We also found increased allele sharing between individuals from Western Brittany and those associated with the Bell Beaker complex. Our results emphasise the need for investigating local populations to better understand the distribution of rare (putatively deleterious) variants across space and the importance of common genetic legacy in understanding the sharing of disease-related alleles between Brittany and people from western Britain and Ireland.


Subject(s)
Genetics, Population , Humans , France , Genome, Human/genetics , Demography , Genetic Variation , Alleles , Genotype , History, Medieval , Europe
2.
Article in English | MEDLINE | ID: mdl-39142816

ABSTRACT

Precisely defining and mapping all cytosine (C) positions and their clusters, known as CpG islands (CGIs), as well as their methylation status, are pivotal for genome-wide epigenetic studies, especially when population-centric reference genomes are ready for timely application. Here, we first align the two high-quality reference genomes, T2T-YAO and T2T-CHM13, from different ethnic backgrounds in a base-by-base fashion and compute their genome-wide density-defined and position-defined CGIs. Second, by mapping some representative genome-wide methylation data from selected organs onto the two genomes, we find that there is about 4.7%-5.8% sequence divergence of variable categories depending on quality cutoffs. Genes among the divergent sequences are mostly associated with neurological functions. Moreover, CGIs associated with the divergent sequences are significantly different with respect to CpG density and observed CpG/expected CpG (O/E) ratio between the two genomes. Finally, we find that the T2T-YAO genome not only has a greater CpG coverage than that of the T2T-CHM13 genome when whole-genome bisulfite sequencing (WGBS) data from the European and American populations are mapped to each reference, but also shows more hyper-methylated CpG sites as compared to the T2T-CHM13 genome. Our study suggests that future genome-wide epigenetic studies of the Chinese populations rely on both acquisition of high-quality methylation data and subsequent precision CGI mapping based on the Chinese T2T reference.


Subject(s)
CpG Islands , DNA Methylation , Genome, Human , CpG Islands/genetics , DNA Methylation/genetics , Humans , Genome, Human/genetics , Chromosome Mapping/methods
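
The CpG density and observed/expected (O/E) CpG ratio mentioned in the abstract above can be illustrated with a short sketch. It assumes the widely used Gardiner-Garden and Frommer definition, O/E = (CpG count × N) / (C count × G count); the abstract does not state which definition or thresholds the authors applied, and the function name `cpg_stats` is hypothetical.

```python
def cpg_stats(seq: str) -> dict:
    """Return basic CpG statistics for a DNA sequence (illustrative only)."""
    s = seq.upper()
    n = len(s)
    c_count = s.count("C")
    g_count = s.count("G")
    # "CG" cannot overlap with itself, so str.count gives the true dinucleotide count.
    cpg_count = s.count("CG")
    gc_content = (c_count + g_count) / n if n else 0.0
    # Assumed formula: O/E = (CpG count * N) / (C count * G count).
    if c_count and g_count:
        oe_ratio = (cpg_count * n) / (c_count * g_count)
    else:
        oe_ratio = 0.0
    return {
        "length": n,
        "cpg_count": cpg_count,
        "gc_content": gc_content,
        "oe_ratio": oe_ratio,
    }


# Toy sequence, not taken from either reference genome.
example = cpg_stats("CGCGCGAT")
```

Comparing the same window in two assemblies would then amount to calling `cpg_stats` on each aligned sequence and contrasting the returned values.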
3.
Nat Commun ; 15(1): 6956, 2024 Aug 13.
Article in English | MEDLINE | ID: mdl-39138168

ABSTRACT

Structural variants (SVs) significantly contribute to human genome diversity and play a crucial role in precision medicine. Although advancements in single-molecule long-read sequencing offer a groundbreaking resource for SV detection, identifying SV breakpoints and sequences accurately and robustly remains challenging. We introduce VolcanoSV, an innovative hybrid SV detection pipeline that utilizes both a reference genome and local de novo assembly to generate a phased diploid assembly. VolcanoSV uses phased SNPs and unique k-mer similarity analysis, enabling precise haplotype-resolved SV discovery. VolcanoSV is adept at constructing comprehensive genetic maps encompassing SNPs, small indels, and all types of SVs, making it well-suited for human genomics studies. Our extensive experiments demonstrate that VolcanoSV surpasses state-of-the-art assembly-based tools in the detection of insertion and deletion SVs, exhibiting superior recall, precision, F1 scores, and genotype accuracy across a diverse range of datasets, including low-coverage (10x) datasets. VolcanoSV outperforms assembly-based tools in the identification of complex SVs, including translocations, duplications, and inversions, in both simulated and real cancer data. Moreover, VolcanoSV is robust to various evaluation parameters and accurately identifies breakpoints and SV sequences.


Subject(s)
Diploidy , Genome, Human , Genomic Structural Variation , Polymorphism, Single Nucleotide , Humans , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Haplotypes
4.
J Transl Med ; 22(1): 756, 2024 Aug 12.
Article in English | MEDLINE | ID: mdl-39135093

ABSTRACT

BACKGROUND: Decoding human genomic sequences requires comprehensive analysis of DNA sequence functionality. Through computational and experimental approaches, researchers have studied the genotype-phenotype relationship and generated important datasets that help unravel complicated genetic blueprints. Thus, the recently developed artificial intelligence methods can be used to interpret the functions of those DNA sequences. METHODS: This study explores the use of deep learning, particularly pre-trained genomic models like DNA_bert_6 and human_gpt2-v1, in interpreting and representing human genome sequences. Initially, we meticulously constructed multiple datasets linking genotypes and phenotypes to fine-tune those models for precise DNA sequence classification. Additionally, we evaluate the influence of sequence length on classification results and analyze the impact of feature extraction in the hidden layers of our model using the HERV dataset. To enhance our understanding of phenotype-specific patterns recognized by the model, we perform enrichment, pathogenicity and conservation analyses of specific motifs in the human endogenous retrovirus (HERV) sequence with high average local representation weight (ALRW) scores. RESULTS: We have constructed multiple genotype-phenotype datasets displaying commendable classification performance in comparison with random genomic sequences, particularly in the HERV dataset, which achieved binary and multi-classification accuracies and F1 values exceeding 0.935 and 0.888, respectively. Notably, the fine-tuning of the HERV dataset not only improved our ability to identify and distinguish diverse information types within DNA sequences but also successfully identified specific motifs associated with neurological disorders and cancers in regions with high ALRW scores. Subsequent analysis of these motifs shed light on the adaptive responses of species to environmental pressures and their co-evolution with pathogens. CONCLUSIONS: These findings highlight the potential of pre-trained genomic models in learning DNA sequence representations, particularly when utilizing the HERV dataset, and provide valuable insights for future research endeavors. This study represents an innovative strategy that combines pre-trained genomic model representations with classical methods for analyzing the functionality of genome sequences, thereby promoting cross-fertilization between genomics and artificial intelligence.


Subject(s)
Genome, Human , Genomics , Phenotype , Humans , Genomics/methods , Models, Genetic , Endogenous Retroviruses/genetics , Deep Learning , Genotype
5.
Gigascience ; 13, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-39101783

ABSTRACT

BACKGROUND: Visualization is an indispensable facet of genomic data analysis. Despite the abundance of specialized visualization tools, there remains a distinct need for tailored solutions. However, their implementation typically requires extensive programming expertise from bioinformaticians and software developers, especially when building interactive applications. Toolkits based on visualization grammars offer a more accessible, declarative way to author new visualizations. Yet, current grammar-based solutions fall short in adequately supporting the interactive analysis of large datasets with extensive sample collections, a pivotal task often encountered in cancer research. FINDINGS: We present GenomeSpy, a grammar-based toolkit for authoring tailored, interactive visualizations for genomic data analysis. By using combinatorial building blocks and a declarative language, users can implement new visualization designs easily and embed them in web pages or end-user-oriented applications. A distinctive element of GenomeSpy's architecture is its effective use of the graphics processing unit in all rendering, enabling a high frame rate and smoothly animated interactions, such as navigation within a genome. We demonstrate the utility of GenomeSpy by characterizing the genomic landscape of 753 ovarian cancer samples from patients in the DECIDER clinical trial. Our results expand the understanding of the genomic architecture in ovarian cancer, particularly the diversity of chromosomal instability. CONCLUSIONS: GenomeSpy is a visualization toolkit applicable to a wide range of tasks pertinent to genome analysis. It offers high flexibility and exceptional performance in interactive analysis. The toolkit is open source with an MIT license, implemented in JavaScript, and available at https://genomespy.app/.


Subject(s)
Genomics , Software , Humans , Genomics/methods , Computer Graphics , Neoplasms/genetics , Ovarian Neoplasms/genetics , Genome, Human , User-Computer Interface , Female , Computational Biology/methods
6.
Hum Genomics ; 18(1): 86, 2024 Aug 07.
Article in English | MEDLINE | ID: mdl-39113147

ABSTRACT

BACKGROUND: The international disclosure of Chinese human genetic data continues to be a contentious issue in China, generating public debates in both traditional and social media channels. Concerns have intensified after Chinese scientists' research on pangenome data was published in the prestigious journal Nature. METHODS: This study scrutinized microblogs posted on Weibo, a popular Chinese social media site, in the two months immediately following the publication (June 14, 2023-August 21, 2023). Content analysis was conducted to assess the nature of public responses, justifications for positive or negative attitudes, and the users' overall knowledge of how Chinese human genetic information is regulated and managed in China. RESULTS: Weibo users displayed contrasting attitudes towards the article's public disclosure of pangenome research data, with 18% positive, 64% negative, and 18% neutral. Positive attitudes came primarily from verified government and media accounts, which praised the publication. In contrast, negative attitudes originated from individual users who were concerned about national security and health risks and often believed that the researchers had betrayed China. The benefits of data sharing highlighted in the commentaries included advancements in disease research and scientific progress. Approximately 16% of the microblogs indicated that Weibo users had misunderstood existing regulations and laws governing data sharing and stewardship. CONCLUSIONS: Based on the predominantly negative public attitudes toward scientific data sharing established by our study, we recommend enhanced outreach by scientists and scientific institutions to increase the public understanding of developments in genetic research, international data sharing, and associated regulations. Additionally, governmental agencies can alleviate public fears and concerns by being more transparent about their security reviews of international collaborative research involving Chinese human genetic data and its cross-border transfer.


Subject(s)
Biomedical Research , Information Dissemination , Public Opinion , Social Media , Humans , China , Genome, Human/genetics , Asian People/genetics
7.
Nat Commun ; 15(1): 5907, 2024 Jul 13.
Article in English | MEDLINE | ID: mdl-39003259

ABSTRACT

Long-read sequencing technology has enabled variant detection in difficult-to-map regions of the genome and enabled rapid genetic diagnosis in clinical settings. Rapidly evolving third-generation sequencing platforms like Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) are introducing newer platforms and data types. It has been demonstrated that variant calling methods based on deep neural networks can use local haplotyping information with long-reads to improve the genotyping accuracy. However, using local haplotype information creates an overhead as variant calling needs to be performed multiple times which ultimately makes it difficult to extend to new data types and platforms as they get introduced. In this work, we have developed a local haplotype approximate method that enables state-of-the-art variant calling performance with multiple sequencing platforms including PacBio Revio system, ONT R10.4 simplex and duplex data. This addition of local haplotype approximation simplifies long-read variant calling with DeepVariant.


Subject(s)
Haplotypes , High-Throughput Nucleotide Sequencing , Haplotypes/genetics , Humans , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Polymorphism, Single Nucleotide , Genome, Human , Algorithms , Genetic Variation , Neural Networks, Computer
8.
Nat Commun ; 15(1): 6158, 2024 Jul 22.
Article in English | MEDLINE | ID: mdl-39039045

ABSTRACT

Common and rare alleles are now being annotated across millions of human genomes, and omics technologies are increasingly being used to develop health and treatment recommendations. However, these alleles have not yet been systematically characterized relative to aerospace medicine. Here, we review published alleles naturally found in human cohorts that have a likely protective effect, which is linked to decreased cancer risk and improved bone, muscular, and cardiovascular health. Although some technical and ethical challenges remain, research into these protective mechanisms could translate into improved nutrition, exercise, and health recommendations for crew members during deep space missions.


Subject(s)
Alleles , Precision Medicine , Space Flight , Humans , Precision Medicine/methods , Aerospace Medicine , Genome, Human , Neoplasms/genetics , Neoplasms/therapy
9.
Nat Med ; 30(7): 1905-1912, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38956197

ABSTRACT

Clinical whole-genome sequencing (WGS) has been shown to deliver potential benefits to children with cancer and to alter treatment in high-risk patient groups. It remains unknown whether offering WGS to every child with suspected cancer can change patient management. We collected WGS variant calls and clinical and diagnostic information from 281 children (282 tumors) across two English units (n = 152 from a hematology center, n = 130 from a solid tumor center) where WGS had become a routine test. Our key finding was that variants uniquely attributable to WGS changed the management in ~7% (20 out of 282) of cases while providing additional disease-relevant findings, beyond standard-of-care molecular tests, in 108 instances for 83 (29%) cases. Furthermore, WGS faithfully reproduced every standard-of-care molecular test (n = 738) and revealed several previously unknown genomic features of childhood tumors. We show that WGS can be delivered as part of routine clinical care to children with suspected cancer and can change clinical management by delivering unexpected genomic insights. Our experience portrays WGS as a clinically impactful assay for routine practice, providing opportunities for assay consolidation and for delivery of molecularly informed patient care.


Subject(s)
Neoplasms , Whole Genome Sequencing , Humans , Neoplasms/genetics , Neoplasms/therapy , Neoplasms/diagnosis , Child , Male , Child, Preschool , Female , Adolescent , Infant , Genetic Testing/methods , Genome, Human/genetics , Genomics/methods , Infant, Newborn
10.
Mol Biol Evol ; 41(7), 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38995236

ABSTRACT

Kazakh people, like many other populations that settled in Central Asia, demonstrate an array of mixed anthropological features of East Eurasian (EEA) and West Eurasian (WEA) populations, indicating a possible scenario of biological admixture between already differentiated EEA and WEA populations. However, their complex biological origin, genomic makeup, and genetic interaction with surrounding populations are not well understood. To decipher their genetic structure and population history, we conducted, to our knowledge, the first whole-genome sequencing study of Kazakhs residing in Xinjiang (KZK). We demonstrated that KZK derived their ancestries from 4 ancestral source populations: East Asian (∼39.7%), West Asian (∼28.6%), Siberian (∼23.6%), and South Asian (∼8.1%). The recognizable interactions of EEA and WEA ancestries in Kazakhs were dated back to the 15th century BCE. Kazakhs were genetically distinctive from the Uyghurs in terms of their overall genomic makeup, although the 2 populations were closely related in genetics, and both showed a substantial admixture of western and eastern peoples. Notably, we identified a considerable sex-biased admixture, with an excess of western males and eastern females contributing to the KZK gene pool. We further identified a set of genes that showed remarkable differentiation in KZK from the surrounding populations, including those associated with skin color (SLC24A5, OCA2), essential hypertension (HLA-DQB1), hypertension (MTHFR, SLC35F3), and neuron development (CNTNAP2). These results advance our understanding of the complex history of contacts between Western and Eastern Eurasians, especially those living along the old Silk Road.


Subject(s)
Asian People , Humans , Male , Female , Asian People/genetics , China , Genome, Human , Whole Genome Sequencing , Central Asian People
11.
Nat Commun ; 15(1): 6139, 2024 Jul 20.
Article in English | MEDLINE | ID: mdl-39033140

ABSTRACT

Cancer driver genes can undergo positive selection for various types of genetic alterations, including gain-of-function or loss-of-function mutations and copy number alterations (CNA). We investigated the landscape of different types of alterations affecting driver genes in 17,644 cancer exomes and genomes. We find that oncogenes may simultaneously exhibit signatures of positive selection and also negative selection in different gene segments, suggesting a method to identify additional tumor types where an oncogene is a driver or a vulnerability. Next, we characterize the landscape of CNA-dependent selection effects, revealing a general trend of increased positive selection on oncogene mutations not only upon CNA gains but also upon CNA deletions. Similarly, we observe a positive interaction between mutations and CNA gains in tumor suppressor genes. Thus, two-hit events involving point mutations and CNA are universally observed regardless of the type of CNA and may signal new therapeutic opportunities. An analysis with focus on the somatic CNA two-hit events can help identify additional driver genes relevant to a tumor type. By a global inference of point mutation and CNA selection signatures and interactions thereof across genes and tissues, we identify 9 evolutionary archetypes of driver genes, representing different mechanisms of (in)activation by genetic alterations.


Subject(s)
DNA Copy Number Variations , Genes, Tumor Suppressor , Neoplasms , Oncogenes , Humans , Oncogenes/genetics , DNA Copy Number Variations/genetics , Neoplasms/genetics , Mutation , Point Mutation , Exome/genetics , Genome, Human
12.
J Transl Med ; 22(1): 618, 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38961476

ABSTRACT

BACKGROUND: Cell-free DNA (cfDNA)-based assays hold great potential in detecting early cancer signals, yet determining the tissue-of-origin (TOO) for cancer signals remains a challenging task. Here, we investigated the contribution of a methylation atlas to TOO detection in low-depth cfDNA samples. METHODS: We constructed a tumor-specific methylation atlas (TSMA) using whole-genome bisulfite sequencing (WGBS) data from five types of tumor tissues (breast, colorectal, gastric, liver and lung cancer) and paired white blood cells (WBC). TSMA was used with a non-negative least squares matrix factorization (NNLS) deconvolution algorithm to identify the abundance of tumor tissue types in a WGBS sample. We showed that TSMA worked well with tumor tissue but struggled with cfDNA samples due to the overwhelming amount of WBC-derived DNA. To construct a model for TOO, we adopted the multi-modal strategy and used as inputs the combination of deconvolution scores from TSMA with other features of cfDNA. RESULTS: Our final model comprised a graph convolutional neural network using deconvolution scores and genome-wide methylation density features, which achieved an accuracy of 69% in a held-out validation dataset of 239 low-depth cfDNA samples. CONCLUSIONS: We have demonstrated that our TSMA in combination with other cfDNA features can improve TOO detection in low-depth cfDNA samples.


Subject(s)
DNA Methylation , Genome, Human , Neoplasms , Neural Networks, Computer , Humans , DNA Methylation/genetics , Neoplasms/genetics , Neoplasms/blood , Neoplasms/diagnosis , Cell-Free Nucleic Acids/blood , Cell-Free Nucleic Acids/genetics , Organ Specificity/genetics , Algorithms
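
The deconvolution step described in the abstract above, estimating non-negative tissue abundances from a methylation atlas, can be sketched as a non-negative least squares problem. This is not the authors' pipeline: the atlas values, the sample vector, and the function name `nnls_projected_gradient` are hypothetical, and a plain projected-gradient loop stands in for whatever NNLS solver the study used.

```python
def nnls_projected_gradient(A, b, iters=500):
    """Approximately minimise ||A x - b||^2 subject to x >= 0.

    A is a list of rows (regions x tissue types); b is the observed sample.
    """
    n_rows, n_cols = len(A), len(A[0])
    # The sum of squared entries is the trace of A^T A, which bounds its
    # largest eigenvalue, so 1 / trace is a safe step size.
    step = 1.0 / sum(v * v for row in A for v in row)
    x = [0.0] * n_cols
    for _ in range(iters):
        residual = [
            sum(A[i][j] * x[j] for j in range(n_cols)) - b[i]
            for i in range(n_rows)
        ]
        # Gradient of 0.5 * ||A x - b||^2 is A^T (A x - b).
        grad = [
            sum(A[i][j] * residual[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[j] - step * grad[j]) for j in range(n_cols)]
    return x


# Hypothetical atlas: 3 regions (rows) x 2 tissue types (columns).
atlas = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
# Synthetic sample built as 30% of tissue 0 plus 70% of tissue 1.
sample = [0.34, 0.62, 0.50]
abundances = nnls_projected_gradient(atlas, sample)
```

On this noise-free toy input the estimate converges to the mixing proportions used to build `sample`; real cfDNA data would be noisy and dominated by the WBC component, as the abstract notes.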
13.
Mol Genet Genomics ; 299(1): 65, 2024 Jul 07.
Article in English | MEDLINE | ID: mdl-38972030

ABSTRACT

BACKGROUND: A large number of challenging medically relevant genes (CMRGs) are situated in complex or highly repetitive regions of the human genome, hindering comprehensive characterization of genetic variants using next-generation sequencing technologies. In this study, we employed long-read sequencing technology, extensively utilized in studying complex genomic regions, to characterize genetic alterations, including short variants (single nucleotide variants and short insertions and deletions) and copy number variations, in 370 CMRGs across 41 individuals from 19 global populations. RESULTS: Our analysis revealed high levels of genetic variants in CMRGs, with 68.73% exhibiting copy number variations and 65.20% containing short variants that may disrupt protein function across individuals. Such variants can influence pharmacogenomics, genetic disease susceptibility, and other clinical outcomes. We observed significant differences in CMRG variation across populations, with individuals of African ancestry harboring the highest number of copy number variants and short variants compared to samples from other continents. Notably, 15.79% to 33.96% of short variants were exclusively detectable through long-read sequencing. While the T2T-CHM13 reference genome significantly improved the assembly of CMRG regions, thereby facilitating variant detection in these regions, some regions still lacked resolution. CONCLUSION: Our results provide an important reference for future clinical and pharmacogenetic studies, highlighting the need for a comprehensive representation of global genetic diversity in the reference genome and improved variant calling techniques to fully resolve medically relevant genes.


Subject(s)
DNA Copy Number Variations , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , DNA Copy Number Variations/genetics , High-Throughput Nucleotide Sequencing/methods , Genome, Human/genetics , Polymorphism, Single Nucleotide/genetics , Genetic Variation/genetics , Genetic Predisposition to Disease , Genetics, Population/methods , INDEL Mutation
14.
Science ; 385(6705): eadi1768, 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-38991054

ABSTRACT

Although it is well known that the ancestors of modern humans and Neanderthals admixed, the effects of gene flow on the Neanderthal genome are not well understood. We develop methods to estimate the amount of human-introgressed sequences in Neanderthals and apply them to whole-genome sequence data from 2000 modern humans and three Neanderthals. We estimate that Neanderthals have 2.5 to 3.7% human ancestry, and we leverage human-introgressed sequences in Neanderthals to revise estimates of Neanderthal ancestry in modern humans, show that Neanderthal population sizes were significantly smaller than previously estimated, and identify two distinct waves of modern human gene flow into Neanderthals. Our data provide insights into the genetic legacy of recurrent gene flow between modern humans and Neanderthals.


Subject(s)
Gene Flow , Genome, Human , Neanderthals , Animals , Humans , Genetic Introgression , Neanderthals/genetics , Population Density , Whole Genome Sequencing , Extinction, Biological
15.
Genome Biol ; 25(1): 176, 2024 Jul 04.
Article in English | MEDLINE | ID: mdl-38965568

ABSTRACT

Tandem repeats are frequent across the human genome, and variation in repeat length has been linked to a variety of traits. Recent improvements in long read sequencing technologies have the potential to greatly improve tandem repeat analysis, especially for long or complex repeats. Here, we introduce LongTR, which accurately genotypes tandem repeats from high-fidelity long reads available from both PacBio and Oxford Nanopore Technologies. LongTR is freely available at https://github.com/gymrek-lab/longtr and https://zenodo.org/doi/10.5281/zenodo.11403979 .


Subject(s)
Genetic Variation , Genome, Human , Tandem Repeat Sequences , Humans , Software , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Nanopore Sequencing/methods
16.
Brief Bioinform ; 25(4), 2024 May 23.
Article in English | MEDLINE | ID: mdl-38966948

ABSTRACT

Variants in cis-regulatory elements link the noncoding genome to human pathology; however, detailed analytic tools for understanding the association between cell-level brain pathology and noncoding variants are lacking. CWAS-Plus, adapted from a Python package for category-wide association testing (CWAS), enhances noncoding variant analysis by integrating both whole-genome sequencing (WGS) and user-provided functional data. With simplified parameter settings and an efficient multiple testing correction method, CWAS-Plus conducts the CWAS workflow 50 times faster than CWAS, making it more accessible and user-friendly for researchers. Here, we used a single-nuclei assay for transposase-accessible chromatin with sequencing to facilitate CWAS-guided noncoding variant analysis at cell-type-specific enhancers and promoters. Examining autism spectrum disorder WGS data (n = 7280), CWAS-Plus identified noncoding de novo variant associations in transcription factor binding sites within conserved loci. Independently, in Alzheimer's disease WGS data (n = 1087), CWAS-Plus detected rare noncoding variant associations in microglia-specific regulatory elements. These findings highlight CWAS-Plus's utility in genomic disorders and scalability for processing large-scale WGS data and in multiple-testing corrections. CWAS-Plus and its user manual are available at https://github.com/joonan-lab/cwas/ and https://cwas-plus.readthedocs.io/en/latest/, respectively.


Subject(s)
Whole Genome Sequencing , Humans , Whole Genome Sequencing/methods , Alzheimer Disease/genetics , Genome-Wide Association Study/methods , Autism Spectrum Disorder/genetics , Genetic Variation , Software , Chromatin/genetics , Chromatin/metabolism , Genome, Human
17.
J Comput Biol ; 31(7): 616-637, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38990757

ABSTRACT

Modern genomic datasets, like those generated under the 1000 Genomes Project, contain millions of variants belonging to known haplotypes. Although these datasets are more representative than a single reference sequence and can alleviate issues like reference bias, they are significantly more computationally burdensome to work with, often involving large indexed genome graph data structures for tasks such as read mapping. The construction, preprocessing, and mapping algorithms can require substantial computational resources depending on the size of these variant sets. Moreover, the accuracy of mapping algorithms has been shown to decrease when working with complete variant sets. Therefore, a drastically reduced set of variants that preserves important properties of the original set is desirable. This work provides a technique for finding a minimal subset of variants S such that for given parameters α and δ, all substrings up to length α in the haplotypes are guaranteed to be still alignable to the appropriate locations with either Hamming or edit distance at most δ, using only S. Our contributions include showing the NP-hardness and inapproximability of these optimization problems and providing Integer Linear Programming (ILP) formulations. Our edit distance ILP formulation carefully decomposes the problem according to variant locations, which allows it to scale to support all of chromosome 22's variants from the 1000 Genomes Project. Our experiments also demonstrate a significant reduction in the number of variants. For example, for moderately long reads, e.g., α = 1000, over 75% of the variants can be removed while preserving read mappability with edit distance at most one.


Subject(s)
Algorithms , Haplotypes , Humans , Computational Biology/methods , Genomics/methods , Genome, Human , Software , Genetic Variation , Sequence Analysis, DNA/methods
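
The alignability condition in the abstract above (a substring still maps to its location with Hamming distance at most δ) can be made concrete with a small sketch. It only checks the condition for one substring at one position; it does not implement the paper's ILP formulation or variant selection, and the function names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def alignable_at(substring: str, reference: str, pos: int, delta: int) -> bool:
    """True if `substring` matches `reference` at `pos` with at most `delta` mismatches."""
    window = reference[pos:pos + len(substring)]
    # A window that runs off the end of the reference cannot host the substring.
    if len(window) != len(substring):
        return False
    return hamming(substring, window) <= delta


# Toy data: a haplotype substring carrying one SNP relative to the reference.
reference = "TTACGTTT"
haplotype_substring = "ACCT"
still_mappable = alignable_at(haplotype_substring, reference, pos=2, delta=1)
```

In this toy case the single mismatch is tolerated at δ = 1, so the variant would not need to be kept for this substring; at δ = 0 it would.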
18.
Hum Genomics ; 18(1): 79, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39010135

ABSTRACT

The analysis of genomic variations in offspring after implantation has been infrequently studied. In this study, we aim to investigate the extent of de novo mutations in humans from developing fetus to birth. Using high-depth whole-genome sequencing, 443 parent-offspring trios were studied to compare the results of de novo mutations (DNMs) between different groups. The focus was on fetuses and newborns, with DNA samples obtained from the families' blood and the aspirated embryonic tissues subjected to deep sequencing. It was observed that the average number of total DNMs in the newborn group was 56.26 (54.17-58.35), which appeared to be lower than that of the multifetal reduction group, which was 76.05 (69.70-82.40) (F = 2.42, P = 0.12). However, after adjusting for parental age and maternal pre-pregnancy body mass index (BMI), significant differences were found between the two groups. The analysis was further divided into single nucleotide variants (SNVs) and insertion/deletion of a small number of bases (indels), and it was discovered that the average number of de novo SNVs associated with the multifetal reduction group and the newborn group was 49.89 (45.59-54.20) and 51.09 (49.22-52.96), respectively. No significant differences were noted between the groups (F = 1.01, P = 0.32). However, a significant difference was observed for de novo indels, with a higher average number found in the multifetal reduction group compared to the newborn group (F = 194.17, P < 0.001). The average number of de novo indels among the multifetal reduction group and the newborn group was 26.26 (23.27-29.05) and 5.17 (4.82-5.52), respectively. To conclude, it has been observed that the quantity of de novo indels in the newborns experiences a significant decrease when compared to that in the aspirated embryonic tissues (7-9 weeks). This phenomenon is evident across all genomic regions, highlighting the adverse effects of de novo indels on the fetus and emphasizing the significance of embryonic implantation and intrauterine growth in human genetic selection mechanisms.


Subject(s)
Fetus , Humans , Female , Pregnancy , Infant, Newborn , Male , Adult , Polymorphism, Single Nucleotide/genetics , Embryo Implantation/genetics , Genome, Human/genetics , INDEL Mutation/genetics , Genomics , Whole Genome Sequencing , High-Throughput Nucleotide Sequencing , Mutation/genetics , Fetal Development/genetics
19.
Genes (Basel) ; 15(7), 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-39062704

ABSTRACT

The identification of structural variants (SVs) in genomic data represents an ongoing challenge because of difficulties in reliable SV calling leading to reduced sensitivity and specificity. We prepared high-quality DNA from 9 parent-child trios, who had previously undergone short-read whole-genome sequencing (Illumina platform) as part of the Genomics England 100,000 Genomes Project. We reanalysed the genomes using both Bionano optical genome mapping (OGM; 8 probands and one trio) and Nanopore long-read sequencing (Oxford Nanopore Technologies [ONT] platform; all samples). To establish a "truth" dataset, we asked whether rare proband SV calls (n = 234) made by the Bionano Access (version 1.6.1)/Solve software (version 3.6.1_11162020) could be verified by individual visualisation using the Integrative Genomics Viewer with either or both of the Illumina and ONT raw sequence. Of these, 222 calls were verified, indicating that Bionano OGM calls have high precision (positive predictive value 95%). We then asked what proportion of the 222 true Bionano SVs had been identified by SV callers in the other two datasets. In the Illumina dataset, sensitivity varied according to variant type, being high for deletions (115/134; 86%) but poor for insertions (13/58; 22%). In the ONT dataset, sensitivity was generally poor using the original Sniffles variant caller (48% overall) but improved substantially with use of Sniffles2 (36/40; 90% and 17/23; 74% for deletions and insertions, respectively). In summary, we show that the precision of OGM is very high. In addition, when applying the Sniffles2 caller, the sensitivity of SV calling using ONT long-read sequence data outperforms Illumina sequencing for most SV types.


Subject(s)
Benchmarking , Nanopore Sequencing , Whole Genome Sequencing , Humans , Whole Genome Sequencing/methods , Whole Genome Sequencing/standards , Nanopore Sequencing/methods , Benchmarking/methods , Genomic Structural Variation/genetics , Chromosome Mapping/methods , Genome, Human/genetics , Genomics/methods , Software , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Female , Nanopores , Male , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards
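
The precision and sensitivity figures quoted in the abstract above follow directly from the reported counts, and the arithmetic can be rechecked with a few lines. The helper name `percent` and the variable names are hypothetical; the counts are the ones stated in the abstract.

```python
def percent(numerator: int, denominator: int) -> int:
    """Return numerator/denominator as a percentage rounded to the nearest integer."""
    return round(100 * numerator / denominator)


# Positive predictive value: verified Bionano calls out of all rare proband calls.
ogm_precision = percent(222, 234)

# Sensitivity: true Bionano SVs also reported by each caller, by variant type.
illumina_deletions = percent(115, 134)
illumina_insertions = percent(13, 58)
sniffles2_deletions = percent(36, 40)
sniffles2_insertions = percent(17, 23)
```

The rounded values come out as 95, 86, 22, 90 and 74, matching the percentages given in the abstract.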
20.
Med Sci (Paris) ; 40(6-7): 560-561, 2024.
Article in French | MEDLINE | ID: mdl-38986103