Results 1 - 20 of 386
1.
Cell ; 183(1): 197-210.e32, 2020 10 01.
Article in English | MEDLINE | ID: mdl-33007263

ABSTRACT

Cancer genomes often harbor hundreds of somatic DNA rearrangement junctions, many of which cannot be easily classified into simple (e.g., deletion) or complex (e.g., chromothripsis) structural variant classes. Applying a novel genome graph computational paradigm to analyze the topology of junction copy number (JCN) across 2,778 tumor whole-genome sequences, we uncovered three novel complex rearrangement phenomena: pyrgo, rigma, and tyfonas. Pyrgo are "towers" of low-JCN duplications associated with early-replicating regions, superenhancers, and breast or ovarian cancers. Rigma comprise "chasms" of low-JCN deletions enriched in late-replicating fragile sites and gastrointestinal carcinomas. Tyfonas are "typhoons" of high-JCN junctions and fold-back inversions associated with expressed protein-coding fusions, breakend hypermutation, and acral, but not cutaneous, melanomas. Clustering of tumors according to genome graph-derived features identified subgroups associated with DNA repair defects and poor prognosis.


Subject(s)
Genomic Structural Variation/genetics; Genomics/methods; Neoplasms/genetics; Chromosome Inversion/genetics; Chromothripsis; DNA Copy Number Variations/genetics; Gene Rearrangement/genetics; Genome, Human/genetics; Humans; Mutation/genetics; Whole Genome Sequencing/methods
2.
Cell ; 176(6): 1310-1324.e10, 2019 03 07.
Article in English | MEDLINE | ID: mdl-30827684

ABSTRACT

DNA rearrangements resulting in human genome structural variants (SVs) are caused by diverse mutational mechanisms. We used long- and short-read sequencing technologies to investigate end products of de novo chromosome 17p11.2 rearrangements and query the molecular mechanisms underlying both recurrent and non-recurrent events. Evidence for an increased rate of clustered single-nucleotide variant (SNV) mutation in cis with non-recurrent rearrangements was found. Indel and SNV formation are associated with both copy-number gains and losses of 17p11.2, occur up to ∼1 Mb away from the breakpoint junctions, and favor C > G transversion substitutions; results suggest that single-stranded DNA is formed during the genesis of the SV and provide compelling support for a microhomology-mediated break-induced replication (MMBIR) mechanism for SV formation. Our data show an additional mutational burden of MMBIR consisting of hypermutation confined to the locus and manifesting as SNVs and indels predominantly within genes.


Subject(s)
Chromosomes, Human, Pair 17; Mutation; Abnormalities, Multiple/genetics; Chromosome Breakpoints; Chromosome Disorders/genetics; Chromosome Duplication/genetics; DNA Copy Number Variations; DNA Repair/genetics; DNA Replication; Gene Rearrangement; Genome, Human; Genomic Structural Variation; Humans; INDEL Mutation; Models, Genetic; Polymorphism, Single Nucleotide; Recombination, Genetic; Sequence Analysis, DNA/methods; Smith-Magenis Syndrome/genetics
3.
Trends Genet ; 40(7): 601-612, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38777691

ABSTRACT

With broad genetic diversity and as a source of key agronomic traits, wild grape species (Vitis spp.) are crucial to enhance viticulture's climatic resilience and sustainability. This review discusses how recent breakthroughs in the genome assembly and analysis of wild grape species have led to discoveries on grape evolution, from wild species' adaptation to environmental stress to grape domestication. We detail how diploid chromosome-scale genomes from wild Vitis spp. have enabled the identification of candidate disease-resistance and flower sex determination genes and the creation of the first Vitis graph-based pangenome. Finally, we explore how wild grape genomics can impact grape research and viticulture, including aspects such as data sharing, the development of functional genomics tools, and the acceleration of genetic improvement.


Subject(s)
Genome, Plant; Genomics; Vitis; Vitis/genetics; Genomics/methods; Genome, Plant/genetics; Genetic Variation; Disease Resistance/genetics; Domestication; Evolution, Molecular
4.
Am J Hum Genet ; 110(1): 161-165, 2023 01 05.
Article in English | MEDLINE | ID: mdl-36450278

ABSTRACT

The first release of UK Biobank whole-genome sequence data contains 150,119 genomes. We present an open-source pipeline for filtering, phasing, and indexing these genomes on the cloud-based UK Biobank Research Analysis Platform. This pipeline makes it possible to apply haplotype-based methods to UK Biobank whole-genome sequence data. The pipeline uses BCFtools for marker filtering, Beagle for genotype phasing, and Tabix for VCF indexing. We used the pipeline to phase 406 million single-nucleotide variants on chromosomes 1-22 and X at a cost of £2,309. The maximum time required to process a chromosome was 2.6 days. In order to assess phase accuracy, we modified the pipeline to exclude trio parents. We observed a switch error rate of 0.0016 on chromosome 20 in the White British trio offspring. If we exclude markers with nonmajor allele frequency < 0.1% after phasing, this switch error rate decreases by 80% to 0.00032.


Subject(s)
Biological Specimen Banks; Genome; Humans; Dogs; Animals; Genotype; Haplotypes/genetics; Polymorphism, Single Nucleotide/genetics; United Kingdom; Algorithms; Sequence Analysis, DNA/methods
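Illustrative note on record 4: the abstract reports phase accuracy as a switch error rate. The sketch below shows one common way such a rate can be computed. It is not taken from the published pipeline. The function name `switch_error_rate` and the 0/1 encoding (which allele sits on the first haplotype at each heterozygous site) are assumptions made for illustration.

```python
from typing import Sequence


def switch_error_rate(truth: Sequence[int], inferred: Sequence[int]) -> float:
    """Return the fraction of adjacent heterozygous-site pairs phased inconsistently.

    Each sequence holds one 0/1 value per heterozygous site (hypothetical
    encoding: which allele lies on haplotype 1). Only the relative phase of
    neighbouring sites matters, so swapping the two haplotypes globally
    does not count as an error.
    """
    if len(truth) != len(inferred):
        raise ValueError("truth and inferred must cover the same sites")
    if len(truth) < 2:
        return 0.0  # no adjacent pairs, so no switch is possible

    switches = 0
    for i in range(1, len(truth)):
        true_flip = truth[i] != truth[i - 1]
        inferred_flip = inferred[i] != inferred[i - 1]
        if true_flip != inferred_flip:
            switches += 1
    return switches / (len(truth) - 1)
```

For example, `switch_error_rate([0, 0, 0, 0, 0], [0, 0, 1, 1, 1])` counts one switch among four adjacent pairs. The reduction quoted in the abstract is consistent with simple arithmetic: 0.0016 × (1 − 0.80) = 0.00032.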
5.
Brief Bioinform ; 25(1)2023 11 22.
Article in English | MEDLINE | ID: mdl-38189536

ABSTRACT

Accurate subgenome phasing is crucial for understanding the origin, evolution and adaptive potential of polyploid genomes. SubPhaser and WGDI are two common software tools for subgenome phasing in allopolyploids, particularly in scenarios lacking known diploid progenitors. Triggered by a recent debate over the subgenomic origins of the cultivated octoploid strawberry, we examined four well-documented complex allopolyploidy cases as benchmarks to evaluate and compare the accuracy of the two tools. Our analysis demonstrates that the subgenomic structure phased by both tools is in line with prior research, effectively tracing complex allopolyploid evolutionary trajectories despite the limitations of each tool. Furthermore, using these validated methodologies, we revisited the controversial issue regarding the progenitors of the octoploid strawberry. The results of both methodologies reaffirm Fragaria vesca and Fragaria iinumae as progenitors of the octoploid strawberry. Finally, we propose recommendations for enhancing the accuracy of subgenome phasing in future studies, recognizing the potential of integrated tools for advanced complex allopolyploidy research and offering a new roadmap for robust subgenome-based phylogenetic analysis.


Subject(s)
Benchmarking; Fragaria; Phylogeny; Fragaria/genetics; Polyploidy; Software
6.
Mol Cell ; 67(4): 659-672.e12, 2017 Aug 17.
Article in English | MEDLINE | ID: mdl-28803778

ABSTRACT

The endogenous circadian clock synchronizes with environmental time by appropriately resetting its phase in response to external cues. Of note, some resetting stimuli induce attenuated oscillations of clock output, which has been observed at the population-level in several organisms and in studies of individual humans. To investigate what is happening in individual cellular clocks, we studied the unicellular cyanobacterium S. elongatus. By measuring its phase-resetting responses to temperature changes, we found that population-level arrhythmicity occurs when certain perturbations cause stochastic phases of oscillations in individual cells. Combining modeling with experiments, we related stochastic phasing to the dynamical structure of the cyanobacterial clock as an oscillator and explored the physiological relevance of the oscillator structure for accurately timed rhythmicity in changing environmental conditions. Our findings and approach can be applied to other biological oscillators.


Subject(s)
Bacterial Proteins/metabolism; Circadian Clocks; Circadian Rhythm Signaling Peptides and Proteins/metabolism; Circadian Rhythm; Models, Biological; Synechococcus/metabolism; Temperature; Adaptation, Physiological; Bacterial Proteins/genetics; Circadian Rhythm Signaling Peptides and Proteins/genetics; Computer Simulation; Microscopy, Fluorescence; Signal Transduction; Single-Cell Analysis; Stochastic Processes; Synechococcus/genetics; Time Factors; Time-Lapse Imaging
7.
Ann Hum Genet ; 2024 May 01.
Article in English | MEDLINE | ID: mdl-38690755

ABSTRACT

INTRODUCTION: Long-read whole-genome sequencing, such as Oxford Nanopore Technology, is increasingly being introduced in clinical settings. With its ability to simultaneously call sequence variation and DNA modifications including 5-methylcytosine, nanopore sequencing is a promising technology to improve diagnostics of imprinting disorders. METHODS: Currently, no tools are available to analyze DNA methylation patterns at known clinically relevant imprinted regions. Here we present NanoImprint, which generates an easily interpretable report, based on long-read nanopore sequencing, for identifying clinically relevant abnormalities in methylation levels at 14 imprinted regions and diagnosing common imprinting disorders. RESULTS AND CONCLUSION: NanoImprint outputs a summary table and visualization plots displaying methylation frequency (%) and chromosomal positions for all regions, with phased data color-coded for the two alleles. We demonstrate the utility of NanoImprint using three imprinting disorder samples from patients with Beckwith-Wiedemann syndrome (BWS), Angelman syndrome (AS) and Prader-Willi syndrome (PWS). The NanoImprint script is available from https://github.com/carolinehey/NanoImprint.

8.
Am J Hum Genet ; 108(10): 1880-1890, 2021 10 07.
Article in English | MEDLINE | ID: mdl-34478634

ABSTRACT

Haplotype phasing is the estimation of haplotypes from genotype data. We present a fast, accurate, and memory-efficient haplotype phasing method that scales to large-scale SNP array and sequence data. The method uses marker windowing and composite reference haplotypes to reduce memory usage and computation time. It incorporates a progressive phasing algorithm that identifies confidently phased heterozygotes in each iteration and fixes the phase of these heterozygotes in subsequent iterations. For data with many low-frequency variants, such as whole-genome sequence data, the method employs a two-stage phasing algorithm that phases high-frequency markers via progressive phasing in the first stage and phases low-frequency markers via genotype imputation in the second stage. This haplotype phasing method is implemented in the open-source Beagle 5.2 software package. We compare Beagle 5.2 and SHAPEIT 4.2.1 by using expanding subsets of 485,301 UK Biobank samples and 38,387 TOPMed samples. Both methods have very similar accuracy and computation time for UK Biobank SNP array data. However, for TOPMed sequence data, Beagle is more than 20 times faster than SHAPEIT, achieves similar accuracy, and scales to larger sample sizes.


Subject(s)
Asthma/genetics; Atrial Fibrillation/genetics; Data Interpretation, Statistical; Genome, Human; Haplotypes; Polymorphism, Single Nucleotide; Software; Algorithms; Female; Genome-Wide Association Study; Genotype; Humans; Male
9.
Clin Genet ; 2024 May 08.
Article in English | MEDLINE | ID: mdl-38719617

ABSTRACT

Genetic maps are fundamental resources for linkage and association studies. A fine-scale genetic map can be constructed by inferring historical recombination events from the genome-wide structure of linkage disequilibrium-a non-random association of alleles among loci-by using population-scale sequencing data. We constructed a fine-scale genetic map and identified recombination hotspots from 10 092 551 bi-allelic high-quality autosomal markers segregating among 150 unrelated Japanese individuals whose genotypes were determined by high-coverage (30×) whole-genome sequencing, and the genotype quality was carefully controlled by using their parents' and offspring's genotypes. The pedigree information was also utilized for haplotype phasing. The resulting genome-wide recombination rate profiles were concordant with those of the worldwide population on a broad scale, and the resolution was much improved. We identified 9487 recombination hotspots and confirmed the enrichment of previously known motifs in the hotspots. Moreover, we demonstrated that the Japanese genetic map improved the haplotype phasing and genotype imputation accuracy for the Japanese population. The construction of a population-specific genetic map will help make genetics research more accurate.

10.
Lab Invest ; 103(8): 100160, 2023 08.
Article in English | MEDLINE | ID: mdl-37088464

ABSTRACT

Short-read next-generation sequencing has revolutionized our ability to identify variants underlying inherited diseases; however, it does not allow the phasing of variants to clarify their diagnostic interpretation. The advent of widespread, increasingly accurate long-read sequencing has opened up new applications not currently available through short-read next-generation sequencing. One such use is the ability to phase variants to clarify their diagnostic interpretation and to investigate the increasingly prevalent role of cis-acting variants in the pathogenesis of the inherited disease, so-called complex alleles. Complex alleles are becoming an increasingly prevalent part of the study of genes associated with inherited diseases, for example, in ABCA4-related diseases. We sought to establish a cost-effective method to phase contiguous segments of the 130-kb ABCA4 locus by long-read sequencing of overlapping amplification products. Using the comprehensively characterized CEPH sample, NA12878, we verified the accuracy and robustness of our assay. However, in-field assessment of its utility using clinical test cases was hampered by the paucity and distribution of identified variants and by PCR chimerism, particularly where the number of PCR cycles was high. Despite this, we were able to construct robust phase blocks of up to 94.9 kb, representing 73% of the ABCA4 locus. We conclude that, although haplotype analysis of variants located within discrete amplification products was robust and informative, the stitching together of larger phase blocks using overlapping single-molecule reads remained practically challenging.


Subject(s)
Nanopore Sequencing; Haplotypes/genetics; Alleles; Polymerase Chain Reaction; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods
11.
J Synchrotron Radiat ; 30(Pt 5): 885-894, 2023 Sep 01.
Article in English | MEDLINE | ID: mdl-37526994

ABSTRACT

In X-ray macromolecular crystallography (MX), single-wavelength anomalous dispersion (SAD) and multi-wavelength anomalous dispersion (MAD) techniques are commonly used for obtaining experimental phases. For an MX synchrotron beamline to support SAD and MAD techniques it is a prerequisite to have a reliable, fast and well automated energy scan routine. This work reports on a continuous energy scan procedure newly implemented at the BioMAX MX beamline at MAX IV Laboratory. The continuous energy scan is fully automated, capable of measuring accurate fluorescence counts over the absorption edge of interest while minimizing the sample exposure to X-rays, and is about a factor of five faster compared with a conventional step scan previously operational at BioMAX. The implementation of the continuous energy scan facilitates the prompt access to the anomalous scattering data, required for the SAD and MAD experiments.

12.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33236761

ABSTRACT

Haplotype phasing is a critical step for many genetic applications but incorrect estimates of phase can negatively impact downstream analyses. One proposed strategy to improve phasing accuracy is to combine multiple independent phasing estimates to overcome the limitations of any individual estimate. However, such a strategy is yet to be thoroughly explored. This study provides a comprehensive evaluation of consensus strategies for haplotype phasing. We explore the performance of different consensus paradigms, and the effect of specific constituent tools, across several datasets with different characteristics and their impact on the downstream task of genotype imputation. Based on the outputs of existing phasing tools, we explore two different strategies to construct haplotype consensus estimators: voting across outputs from multiple phasing tools and multiple outputs of a single non-deterministic tool. We find that the consensus approach from multiple tools reduces SE by an average of 10% compared to any constituent tool when applied to European populations and has the highest accuracy regardless of population ethnicity, sample size, variant density or variant frequency. Furthermore, the consensus estimator improves the accuracy of the downstream task of genotype imputation carried out by the widely used Minimac3, pbwt and BEAGLE5 tools. Our results provide guidance on how to produce the most accurate phasing estimates and the trade-offs that a consensus approach may have. Our implementation of consensus haplotype phasing, consHap, is available freely at https://github.com/ziadbkh/consHap. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.


Subject(s)
Algorithms; Databases, Nucleic Acid; Polymorphism, Single Nucleotide; Sequence Analysis, DNA; Haplotypes; Humans
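Illustrative note on record 12: the abstract describes building a consensus haplotype estimate by voting across the outputs of several phasing tools. The sketch below shows one simple way such a vote could work, by majority vote on the relative phase of adjacent heterozygous sites. It is an assumption-laden illustration and not the consHap implementation. The function name `consensus_phase` and the 0/1 encoding are hypothetical.

```python
from typing import List, Sequence


def consensus_phase(estimates: Sequence[Sequence[int]]) -> List[int]:
    """Combine several phasing estimates of one sample by majority vote.

    Each estimate holds one 0/1 value per heterozygous site (hypothetical
    encoding: which allele lies on haplotype 1). Tools may label the two
    haplotypes in opposite order, so the vote is taken on whether the phase
    flips between neighbouring sites rather than on the raw labels.
    Ties count as "no flip"; an odd number of estimates avoids ties.
    """
    if not estimates:
        raise ValueError("need at least one phasing estimate")
    n_sites = len(estimates[0])
    if any(len(e) != n_sites for e in estimates):
        raise ValueError("all estimates must cover the same sites")
    if n_sites == 0:
        return []

    consensus = [0]  # the orientation of the first site is arbitrary
    for i in range(1, n_sites):
        flip_votes = sum(1 for e in estimates if e[i] != e[i - 1])
        flip = flip_votes * 2 > len(estimates)
        consensus.append(consensus[-1] ^ 1 if flip else consensus[-1])
    return consensus
```

With three estimates `[0, 0, 1, 1]`, `[1, 1, 0, 0]` and `[0, 1, 1, 1]`, the first two agree on every adjacent pair despite opposite labelling, so the vote returns `[0, 0, 1, 1]`.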
13.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33285565

ABSTRACT

The identification of rare haplotypes may greatly expand our knowledge of the genetic architecture of both complex and monogenic traits. To this end, we developed PERHAPS (Paired-End short Reads-based HAPlotyping from next-generation Sequencing data), a new and simple approach to directly call haplotypes from short-read, paired-end Next Generation Sequencing (NGS) data. To benchmark this method, we considered the APOE classic polymorphism (*1/*2/*3/*4), since it represents one of the best examples of functional polymorphism arising from the haplotype combination of two Single Nucleotide Polymorphisms (SNPs). We leveraged the large Whole Exome Sequencing (WES) and SNP-array data obtained from the multi-ethnic UK BioBank (UKBB, N=48,855). By applying PERHAPS, based on piecing together the paired-end reads according to their FASTQ-labels, we extracted the haplotype data, along with their frequencies and the individual diplotype. Concordance rates between diplotypes called directly from WES and those generated through statistical pre-phasing and imputation of SNP-array data are extremely high (>99%), whether the sample is stratified by SNP-array genotyping batch or by self-reported ethnic group. Hardy-Weinberg Equilibrium tests and the comparison of obtained haplotype frequencies with those available from the 1000 Genomes Project further supported the reliability of PERHAPS. Notably, we were able to determine the existence of the rare APOE*1 haplotype in two unrelated African subjects from UKBB, supporting its presence at appreciable frequency (approximately 0.5%) in the African Yoruba population. Despite some acknowledged technical shortcomings, PERHAPS represents a novel and simple approach that will partly overcome the limitations in direct haplotype calling from short read-based sequencing.


Subject(s)
Algorithms; Genome, Human; Haplotypes; High-Throughput Nucleotide Sequencing; Polymorphism, Single Nucleotide; Apolipoproteins E/genetics; Human Genome Project; Humans
14.
Mol Phylogenet Evol ; 184: 107777, 2023 07.
Article in English | MEDLINE | ID: mdl-36990304

ABSTRACT

Plant-feeding beetle species are diverse and often individually highly variable. Accurate classifications can be difficult to establish yet are essential for study of evolutionary patterns and processes. Molecular data are key to further characterizing morphologically difficult groups and defining genus and species boundaries. Monochamus Dejean species are ecologically and economically significant, and in coniferous forests they vector the nematode that causes Pine Wilt Disease. This study uses nuclear and mitochondrial genes to test the monophyly and relationships of Monochamus and applies coalescent methods to further delimit the conifer-feeding species. Monochamus has also included approximately 120 Old World species associated with diverse angiosperm tree species. We sample from these additional morphologically diverse species to determine their placement in the Lamiini. Through supermatrix and coalescent methods, the higher-level relationships of Monochamus show that conifer-feeders are a monophyletic group that includes the type species and has split into Nearctic and Palearctic clades. Molecular dating indicates a single dispersal of conifer-feeders to North America over the second Bering Land Bridge circa 5.3 Ma. All other Monochamus sampled fall in different parts of the Lamiini tree. Small-bodied angiosperm-feeding Monochamus group with the monotypic genus Microgoes Casey. The African Monochamus subgenera sampled are distantly related to the conifer-feeding clade. The multispecies coalescent delimitation methods BPP and STACEY delimit 17 conifer-feeding Monochamus species for a total of 18 species, and support the retention of all current species. An interrogation with nuclear gene allele phasing reveals that unphased data can be unreliable for accurate delimitations and divergence times. The delimited species are discussed with integrative evidence, highlighting real-world challenges in recognizing the completion of speciation.


Subject(s)
Coleoptera; Nematoda; Pinus; Animals; Phylogeny; North America; Trees
15.
J Hered ; 114(5): 513-520, 2023 08 23.
Article in English | MEDLINE | ID: mdl-36869788

ABSTRACT

Genomic resources across squamate reptiles (lizards and snakes) have lagged behind other vertebrate systems and high-quality reference genomes remain scarce. Of the 23 chromosome-scale reference genomes across the order, only 12 of the ~60 squamate families are represented. Within geckos (infraorder Gekkota), a species-rich clade of lizards, chromosome-level genomes are exceptionally sparse, representing only two of the seven extant families. Using the latest advances in genome sequencing and assembly methods, we generated one of the highest-quality squamate genomes to date for the leopard gecko, Eublepharis macularius (Eublepharidae). We compared this assembly to the previous, short-read-only E. macularius reference genome published in 2016 and examined potential factors within the assembly influencing contiguity of genome assemblies using PacBio HiFi data. Briefly, the read N50 of the PacBio HiFi reads generated for this study was equal to the contig N50 of the previous E. macularius reference genome at 20.4 kilobases. The HiFi reads were assembled into a total of 132 contigs, which were further scaffolded using HiC data into 75 total sequences representing all 19 chromosomes. We found that 9 of the 19 chromosomal scaffolds were assembled as a near-single contig, whereas the other 10 chromosomes were each scaffolded together from multiple contigs. We qualitatively identified that the percent repeat content within a chromosome broadly affects its assembly contiguity prior to scaffolding. This genome assembly signifies a new age for squamate genomics where high-quality reference genomes rivaling some of the best vertebrate genome assemblies can be generated for a fraction of previous cost estimates. This new E. macularius reference assembly is available on NCBI at JAOPLA010000000.


Subject(s)
Genome; Lizards; Humans; Animals; Genomics/methods; Chromosome Mapping/methods; Chromosomes; Lizards/genetics
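Illustrative note on record 15: the abstract compares a read N50 with a contig N50. N50 is conventionally the length L such that sequences of length L or longer together cover at least half of the total bases. The sketch below computes it for any list of lengths. The function name `n50` is chosen for illustration, and the example lengths are invented, not data from the study.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of read or contig lengths.

    Sort lengths from longest to shortest and accumulate them; the N50 is
    the length at which the running total first reaches half of the sum.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one length")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 2 >= total:  # integer comparison avoids float rounding
            return length
    return ordered[-1]  # not reached for positive lengths; kept for safety
```

For the made-up lengths 2 through 10 (total 54), the three longest sequences (10, 9, 8) already sum to 27, so `n50` returns 8.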
16.
Proc Natl Acad Sci U S A ; 117(38): 23408-23417, 2020 09 22.
Article in English | MEDLINE | ID: mdl-32900942

ABSTRACT

The Younger Dryas (YD), arguably the most widely studied millennial-scale extreme climate event, was characterized by diverse hydroclimate shifts globally and severe cooling at high northern latitudes that abruptly punctuated the warming trend from the last glacial to the present interglacial. To date, a precise understanding of its trigger, propagation, and termination remains elusive. Here, we present speleothem oxygen-isotope data that, in concert with other proxy records, allow us to quantify the timing of the YD onset and termination at an unprecedented subcentennial temporal precision across the North Atlantic, Asian Monsoon-Westerlies, and South American Monsoon regions. Our analysis suggests that the onsets of YD in the North Atlantic (12,870 ± 30 B.P.) and the Asian Monsoon-Westerlies region are essentially synchronous within a few decades and lead the onset in Antarctica, implying a north-to-south climate signal propagation via both atmospheric (decadal-time scale) and oceanic (centennial-time scale) processes, similar to the Dansgaard-Oeschger events during the last glacial period. In contrast, the YD termination may have started first in Antarctica at ∼11,900 B.P., or perhaps even earlier in the western tropical Pacific, followed by the North Atlantic between ∼11,700 ± 40 and 11,610 ± 40 B.P. These observations suggest that the initial YD termination might have originated in the Southern Hemisphere and/or the tropical Pacific, indicating a Southern Hemisphere/tropics to North Atlantic-Asian Monsoon-Westerlies directionality of climatic recovery.

17.
Genomics ; 114(2): 110277, 2022 03.
Article in English | MEDLINE | ID: mdl-35104609

ABSTRACT

Sexual reproduction is a diverse and widespread process. In gonochoristic species, the differentiation of sexes occurs through diverse mechanisms, influenced by environmental and genetic factors. In most vertebrates, a master-switch gene is responsible for triggering a sex determination network. However, only a few genes have acquired master-switch functions, and this process is associated with the evolution of sex-chromosomes, which have a significant influence in evolution. Additionally, their highly repetitive regions impose challenges for high-quality sequencing, even using high-throughput, state-of-the-art techniques. Here, we review the mechanisms involved in sex determination and their role in the evolution of species, particularly vertebrates, focusing on sex chromosomes and the challenges involved in sequencing these genomic elements. We also address the improvements provided by the growth of sequencing projects, by generating a massive number of near-gapless, telomere-to-telomere, chromosome-level, phased assemblies, increasing the number and quality of sex-chromosome sequences available for further studies.


Subject(s)
Sex Chromosomes; Telomere; Animals; Repetitive Sequences, Nucleic Acid; Sex Chromosomes/genetics; Telomere/genetics; Vertebrates/genetics
18.
Int J Mol Sci ; 24(8)2023 Apr 08.
Article in English | MEDLINE | ID: mdl-37108096

ABSTRACT

A variety of plant species found in nature contain agrobacterial T-DNAs in their genomes which they transmit in a series of sexual generations. Such T-DNAs are called cellular T-DNAs (cT-DNAs). cT-DNAs have been discovered in dozens of plant genera, and are suggested to be used in phylogenetic studies, since they are well-defined and unrelated to other plant sequences. Their integration into a particular chromosomal site indicates a founder event and a clear start of a new clade. cT-DNA inserts do not disseminate in the genome after insertion. They can be large and old enough to generate a range of variants, thereby allowing the construction of detailed trees. Unusual cT-DNAs (containing the rolB/C-like gene) were found in our previous study in the genome data of two Vaccinium L. species. Here, we present a deeper study of these sequences in Vaccinium L. Molecular-genetic and bioinformatics methods were applied for sequencing, assembly, and analysis of the rolB/C-like gene. The rolB/C-like gene was discovered in 26 new Vaccinium species and Agapetes serpens (Wight) Sleumer. Most samples were found to contain full-size genes. It allowed us to develop approaches for the phasing of cT-DNA alleles and reconstruct a Vaccinium phylogenetic relationship. Intra- and interspecific polymorphism found in cT-DNA makes it possible to use it for phylogenetic and phylogeographic studies of the Vaccinium genus.


Subject(s)
Vaccinium; Phylogeny; Transgenes; Plants; Biodiversity
19.
BMC Bioinformatics ; 23(1): 502, 2022 Nov 23.
Article in English | MEDLINE | ID: mdl-36424541

ABSTRACT

As genotype databases increase in size, so too do the number of detectable segments of identity by descent (IBD): segments of the genome where two individuals share an identical copy of one of their two parental haplotypes, due to shared ancestry. We show that given a large enough genotype database, these segments of IBD collectively overlap entire chromosomes, including instances of IBD that span multiple chromosomes, and can be used to accurately separate the alleles inherited from each parent across the entire genome. The resulting phase is not an improvement over state-of-the-art local phasing methods, but provides accurate long-range phasing that indicates which of two haplotypes in different regions of the genome, including different chromosomes, was inherited from the same parent. We are able to separate the DNA inherited from each parent completely, across the entire genome, with 98% median accuracy in a test set of 30,000 individuals. We estimate the IBD data requirements for accurate genome-wide phasing, and we propose a method for estimating confidence in the resulting phase. We show that our methods do not require the genotypes of close family, and that they are robust to genotype errors and missing data. In fact, our method can impute missing data accurately and correct genotype errors.


Subject(s)
Genotype; Humans; Haplotypes; Alleles; Databases, Factual
20.
BMC Bioinformatics ; 23(1): 465, 2022 Nov 07.
Article in English | MEDLINE | ID: mdl-36344913

ABSTRACT

BACKGROUND: Whole genome sequencing using the long-read Oxford Nanopore Technologies (ONT) MinION sequencer provides a cost-effective option for structural variant (SV) detection in clinical applications. Despite the advantage of using long reads, however, accurate SV calling and phasing are still challenging. RESULTS: We introduce Duet, an SV detection tool optimized for SV calling and phasing using ONT data. The tool uses novel features integrated from both SV signatures and single-nucleotide polymorphism signatures, which can accurately distinguish an SV haplotype from a false signal. Duet was benchmarked against state-of-the-art tools on multiple ONT sequencing datasets of sequencing coverage ranging from 8× to 40×. At low sequencing coverage of 8×, Duet performs better than all other tools in SV calling, SV genotyping and SV phasing. When the sequencing coverage is higher (20× to 40×), the F1-score for SV phasing is further improved in comparison to the performance of other tools, while its performance in SV genotyping and SV calling remains higher than that of other tools. CONCLUSION: Duet can perform accurate SV calling, SV genotyping and SV phasing using low-coverage ONT data, making it very useful for low-coverage genomes. It also performs well when scaled to high-coverage genomes, making it adaptable to various clinical applications. Duet is open source and is available at https://github.com/yekaizhou/duet.


Subject(s)
Nanopore Sequencing; Polymorphism, Single Nucleotide; Sequence Analysis, DNA; High-Throughput Nucleotide Sequencing; Whole Genome Sequencing