Results 1 - 20 of 38
1.
Trends Genet ; 40(7): 601-612, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38777691

ABSTRACT

With broad genetic diversity and as a source of key agronomic traits, wild grape species (Vitis spp.) are crucial to enhance viticulture's climatic resilience and sustainability. This review discusses how recent breakthroughs in the genome assembly and analysis of wild grape species have led to discoveries on grape evolution, from wild species' adaptation to environmental stress to grape domestication. We detail how diploid chromosome-scale genomes from wild Vitis spp. have enabled the identification of candidate disease-resistance and flower sex determination genes and the creation of the first Vitis graph-based pangenome. Finally, we explore how wild grape genomics can impact grape research and viticulture, including aspects such as data sharing, the development of functional genomics tools, and the acceleration of genetic improvement.


Subjects
Plant Genome , Genomics , Vitis , Vitis/genetics , Genomics/methods , Plant Genome/genetics , Genetic Variation , Disease Resistance/genetics , Domestication , Molecular Evolution
2.
Am J Hum Genet ; 110(1): 161-165, 2023 01 05.
Article in English | MEDLINE | ID: mdl-36450278

ABSTRACT

The first release of UK Biobank whole-genome sequence data contains 150,119 genomes. We present an open-source pipeline for filtering, phasing, and indexing these genomes on the cloud-based UK Biobank Research Analysis Platform. This pipeline makes it possible to apply haplotype-based methods to UK Biobank whole-genome sequence data. The pipeline uses BCFtools for marker filtering, Beagle for genotype phasing, and Tabix for VCF indexing. We used the pipeline to phase 406 million single-nucleotide variants on chromosomes 1-22 and X at a cost of £2,309. The maximum time required to process a chromosome was 2.6 days. In order to assess phase accuracy, we modified the pipeline to exclude trio parents. We observed a switch error rate of 0.0016 on chromosome 20 in the White British trio offspring. If we exclude markers with nonmajor allele frequency < 0.1% after phasing, this switch error rate decreases by 80% to 0.00032.


Subjects
Biological Specimen Banks , Genome , Humans , Dogs , Animals , Genotype , Haplotypes/genetics , Single Nucleotide Polymorphism/genetics , United Kingdom , Algorithms , DNA Sequence Analysis/methods
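
The switch error rate reported in this abstract can be illustrated with a minimal sketch. It assumes one diploid sample, a trusted (for example trio-derived) truth phasing, and identical unphased genotypes in both inputs; the function name and the tuple encoding are illustrative and are not taken from the published pipeline.

```python
def switch_error_rate(truth, inferred):
    """Return (switches, opportunities, rate) for one diploid sample.

    Each argument is a list of (allele_a, allele_b) tuples, one per marker,
    giving the alleles on the first and second haplotype. The sketch assumes
    both inputs carry the same unphased genotype at every marker.
    """
    # Only markers that are heterozygous in the truth set carry phase.
    het = [(t, p) for t, p in zip(truth, inferred) if t[0] != t[1]]
    # True where the inferred haplotype order agrees with the truth.
    orientation = [t == p for t, p in het]
    # A switch occurs wherever the orientation changes between
    # consecutive heterozygous markers.
    switches = sum(1 for a, b in zip(orientation, orientation[1:]) if a != b)
    opportunities = max(len(het) - 1, 0)
    rate = switches / opportunities if opportunities else 0.0
    return switches, opportunities, rate
```

Because only changes in orientation are counted, the result does not depend on which haplotype a tool happens to list first.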
3.
Am J Hum Genet ; 108(10): 1880-1890, 2021 10 07.
Article in English | MEDLINE | ID: mdl-34478634

ABSTRACT

Haplotype phasing is the estimation of haplotypes from genotype data. We present a fast, accurate, and memory-efficient haplotype phasing method that scales to large-scale SNP array and sequence data. The method uses marker windowing and composite reference haplotypes to reduce memory usage and computation time. It incorporates a progressive phasing algorithm that identifies confidently phased heterozygotes in each iteration and fixes the phase of these heterozygotes in subsequent iterations. For data with many low-frequency variants, such as whole-genome sequence data, the method employs a two-stage phasing algorithm that phases high-frequency markers via progressive phasing in the first stage and phases low-frequency markers via genotype imputation in the second stage. This haplotype phasing method is implemented in the open-source Beagle 5.2 software package. We compare Beagle 5.2 and SHAPEIT 4.2.1 by using expanding subsets of 485,301 UK Biobank samples and 38,387 TOPMed samples. Both methods have very similar accuracy and computation time for UK Biobank SNP array data. However, for TOPMed sequence data, Beagle is more than 20 times faster than SHAPEIT, achieves similar accuracy, and scales to larger sample sizes.


Subjects
Asthma/genetics , Atrial Fibrillation/genetics , Statistical Data Interpretation , Human Genome , Haplotypes , Single Nucleotide Polymorphism , Software , Algorithms , Female , Genome-Wide Association Study , Genotype , Humans , Male
4.
Clin Genet ; 2024 May 08.
Article in English | MEDLINE | ID: mdl-38719617

ABSTRACT

Genetic maps are fundamental resources for linkage and association studies. A fine-scale genetic map can be constructed by inferring historical recombination events from the genome-wide structure of linkage disequilibrium (a non-random association of alleles among loci) by using population-scale sequencing data. We constructed a fine-scale genetic map and identified recombination hotspots from 10,092,551 bi-allelic high-quality autosomal markers segregating among 150 unrelated Japanese individuals whose genotypes were determined by high-coverage (30×) whole-genome sequencing, and the genotype quality was carefully controlled by using their parents' and offspring's genotypes. The pedigree information was also utilized for haplotype phasing. The resulting genome-wide recombination rate profiles were concordant with those of the worldwide population on a broad scale, and the resolution was much improved. We identified 9487 recombination hotspots and confirmed the enrichment of previously known motifs in the hotspots. Moreover, we demonstrated that the Japanese genetic map improved the haplotype phasing and genotype imputation accuracy for the Japanese population. The construction of a population-specific genetic map will help make genetics research more accurate.
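
Linkage disequilibrium between two bi-allelic loci is commonly summarized by the coefficient D and the squared correlation r². The sketch below computes both from phased haplotypes; the function name and the 0/1 allele coding are assumptions made for illustration, and it is not the method used in the study to infer recombination rates.

```python
def linkage_disequilibrium(haplotypes):
    """Compute D and r^2 between two bi-allelic loci.

    `haplotypes` is a non-empty list of (allele_at_locus_1, allele_at_locus_2)
    tuples, alleles coded 0/1, one tuple per phased chromosome.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n   # frequency of allele 1 at locus 1
    p_b = sum(h[1] for h in haplotypes) / n   # frequency of allele 1 at locus 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n  # joint frequency
    d = p_ab - p_a * p_b                      # deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0  # undefined for monomorphic loci
    return d, r2
```

When the two loci always carry the same allele, r² equals 1; when alleles associate at random, D and r² are both 0.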

5.
Brief Bioinform ; 22(4)2021 07 20.
Article in English | MEDLINE | ID: mdl-33236761

ABSTRACT

Haplotype phasing is a critical step for many genetic applications but incorrect estimates of phase can negatively impact downstream analyses. One proposed strategy to improve phasing accuracy is to combine multiple independent phasing estimates to overcome the limitations of any individual estimate. However, such a strategy is yet to be thoroughly explored. This study provides a comprehensive evaluation of consensus strategies for haplotype phasing. We explore the performance of different consensus paradigms, and the effect of specific constituent tools, across several datasets with different characteristics and their impact on the downstream task of genotype imputation. Based on the outputs of existing phasing tools, we explore two different strategies to construct haplotype consensus estimators: voting across outputs from multiple phasing tools and multiple outputs of a single non-deterministic tool. We find that the consensus approach from multiple tools reduces switch error (SE) by an average of 10% compared to any constituent tool when applied to European populations and has the highest accuracy regardless of population ethnicity, sample size, variant density or variant frequency. Furthermore, the consensus estimator improves the accuracy of the downstream task of genotype imputation carried out by the widely used Minimac3, pbwt and BEAGLE5 tools. Our results provide guidance on how to produce the most accurate phasing estimates and the trade-offs that a consensus approach may have. Our implementation of consensus haplotype phasing, consHap, is available freely at https://github.com/ziadbkh/consHap. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.


Subjects
Algorithms , Nucleic Acid Databases , Single Nucleotide Polymorphism , DNA Sequence Analysis , Haplotypes , Humans
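
The idea of voting across outputs from multiple phasing tools can be sketched as a majority vote on the phase relation between adjacent heterozygous sites. This is one plausible way to build such a consensus, written under stated assumptions (one sample, the same heterozygous sites in every output, 0/1 allele coding); it is not the consHap implementation, and the function name is hypothetical.

```python
def consensus_phase(tool_outputs):
    """Majority-vote consensus over adjacent heterozygous sites.

    `tool_outputs` is a non-empty list of equal-length lists; element j of
    each list is the allele (0 or 1) that a tool placed on its first
    haplotype at het site j. Returns the consensus first haplotype,
    anchored to allele 0 at the first site.
    """
    n_sites = len(tool_outputs[0])
    consensus = [0] if n_sites else []
    for j in range(1, n_sites):
        # Count tools that carry the same allele at sites j-1 and j on one
        # haplotype. This relation ignores which haplotype is listed first.
        same = sum(1 for h in tool_outputs if h[j - 1] == h[j])
        flip = 2 * same < len(tool_outputs)  # ties keep the previous allele
        prev = consensus[-1]
        consensus.append(1 - prev if flip else prev)
    return consensus
```

Voting on pairwise relations rather than on alleles directly avoids having to align the arbitrary haplotype labels of different tools; using an odd number of tools avoids ties.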
6.
Genomics ; 114(2): 110277, 2022 03.
Article in English | MEDLINE | ID: mdl-35104609

ABSTRACT

Sexual reproduction is a diverse and widespread process. In gonochoristic species, the differentiation of sexes occurs through diverse mechanisms, influenced by environmental and genetic factors. In most vertebrates, a master-switch gene is responsible for triggering a sex determination network. However, only a few genes have acquired master-switch functions, and this process is associated with the evolution of sex-chromosomes, which have a significant influence in evolution. Additionally, their highly repetitive regions impose challenges for high-quality sequencing, even using high-throughput, state-of-the-art techniques. Here, we review the mechanisms involved in sex determination and their role in the evolution of species, particularly vertebrates, focusing on sex chromosomes and the challenges involved in sequencing these genomic elements. We also address the improvements provided by the growth of sequencing projects, by generating a massive number of near-gapless, telomere-to-telomere, chromosome-level, phased assemblies, increasing the number and quality of sex-chromosome sequences available for further studies.


Subjects
Sex Chromosomes , Telomere , Animals , Nucleic Acid Repetitive Sequences , Sex Chromosomes/genetics , Telomere/genetics , Vertebrates/genetics
7.
Plant J ; 108(6): 1830-1848, 2021 12.
Article in English | MEDLINE | ID: mdl-34661327

ABSTRACT

Cassava (Manihot esculenta Crantz, 2n = 36) is a global food security crop. It has a highly heterozygous genome, high genetic load, and genotype-dependent asynchronous flowering. It is typically propagated by stem cuttings and any genetic variation between haplotypes, including large structural variations, is preserved by such clonal propagation. Traditional genome assembly approaches generate a collapsed haplotype representation of the genome. In highly heterozygous plants, this results in artifacts and an oversimplification of heterozygous regions. We used a combination of Pacific Biosciences (PacBio), Illumina, and Hi-C to resolve each haplotype of the genome of a farmer-preferred cassava line, TME7 (Oko-iyawo). PacBio reads were assembled using the FALCON suite. Phase switch errors were corrected using FALCON-Phase and Hi-C read data. The ultralong-range information from Hi-C sequencing was also used for scaffolding. Comparison of the two phases revealed >5000 large haplotype-specific structural variants affecting over 8 Mb, including insertions and deletions spanning thousands of base pairs. The potential of these variants to affect allele-specific expression was further explored. RNA-sequencing data from 11 different tissue types were mapped against the scaffolded haploid assembly and gene expression data are incorporated into our existing easy-to-use web-based interface to facilitate use by the broader plant science community. These two assemblies provide an excellent means to study the effects of heterozygosity, haplotype-specific structural variation, gene hemizygosity, and allele-specific gene expression contributing to important agricultural traits and further our understanding of the genetics and domestication of cassava.


Subjects
Plant Genome , Haplotypes , Manihot/genetics , Africa , DNA Transposable Elements , Diploidy , Plant Gene Expression Regulation , Genome Size , Heterozygote , Molecular Sequence Annotation , Synteny
8.
Int J Mol Sci ; 21(23)2020 Dec 01.
Article in English | MEDLINE | ID: mdl-33271988

ABSTRACT

The reconstruction of individual haplotypes can facilitate the interpretation of disease risks; however, high costs and technical challenges still hinder their assessment in clinical settings. Second-generation sequencing is the gold standard for variant discovery but, due to the production of short reads covering small genomic regions, allows only indirect haplotyping based on statistical methods. In contrast, third-generation methods such as the nanopore sequencing platform developed by Oxford Nanopore Technologies (ONT) generate long reads that can be used for direct haplotyping, with fewer drawbacks. However, robust standards for variant phasing in ONT-based target resequencing efforts are not yet available. In this study, we presented a streamlined proof-of-concept workflow for variant calling and phasing based on ONT data in a clinically relevant 12-kb region of the APOE locus, a hotspot for variants and haplotypes associated with aging-related diseases and longevity. Starting with sequencing data from simple amplicons of the target locus, we demonstrated that ONT data allow for reliable single-nucleotide variant (SNV) calling and phasing from as little as 60 reads, although the recognition of indels is less efficient. Even so, we identified the best combination of ONT read sets (600) and software (BWA/Minimap2 and HapCUT2) that enables full haplotype reconstruction when both SNVs and indels have been identified previously using a highly-accurate sequencing platform. In conclusion, we established a rapid and inexpensive workflow for variant phasing based on ONT long reads. This allowed for the analysis of multiple samples in parallel and can easily be implemented in routine clinical practice, including diagnostic testing.


Subjects
Genetic Testing , Genomics , Haplotypes , High-Throughput Nucleotide Sequencing , DNA Sequence Analysis , Apolipoproteins E/genetics , Chromosome Mapping , Clinical Decision-Making , Computational Biology/methods , Disease Management , Gene Amplification , Genetic Loci , Genetic Testing/methods , Genetic Variation , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Molecular Typing/methods , Single Nucleotide Polymorphism , DNA Sequence Analysis/methods , Software
9.
Mol Ecol ; 28(21): 4737-4754, 2019 11.
Article in English | MEDLINE | ID: mdl-31550391

ABSTRACT

For half a century population genetics studies have put type II restriction endonucleases to work. Now, coupled with massively-parallel, short-read sequencing, the family of RAD protocols that wields these enzymes has generated vast genetic knowledge from the natural world. Here, we describe the first software natively capable of using paired-end sequencing to derive short contigs from de novo RAD data. Stacks version 2 employs a de Bruijn graph assembler to build and connect contigs from forward and reverse reads for each de novo RAD locus, which it then uses as a reference for read alignments. The new architecture allows all the individuals in a metapopulation to be considered at the same time as each RAD locus is processed. This enables a Bayesian genotype caller to provide precise SNPs, and a robust algorithm to phase those SNPs into long haplotypes, generating RAD loci that are 400-800 bp in length. To prove its recall and precision, we tested the software with simulated data and compared reference-aligned and de novo analyses of three empirical data sets. Our study shows that the latest version of Stacks is highly accurate and outperforms other software in assembling and genotyping paired-end de novo data sets.


Subjects
Population Genetics/methods , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Algorithms , Bayes Theorem , Genotype , Humans , Metagenomics/methods , Phenotype , Single Nucleotide Polymorphism/genetics , Software
10.
BMC Genet ; 20(1): 57, 2019 07 17.
Article in English | MEDLINE | ID: mdl-31311514

ABSTRACT

BACKGROUND: Haplotype data contain more information than genotype data and provide possibilities such as imputing low-frequency variants, inferring points of recombination, detecting recurrent mutations, mapping linkage disequilibrium (LD), studying selection signatures, and estimating IBD probabilities. In addition, haplotype structure is used to assess genetic diversity and expected accuracy in genomic selection programs. Nevertheless, the quality and efficiency of phasing have rarely been the subject of thorough study and have been assessed mainly as a by-product in imputation quality studies. Moreover, phasing studies based on data from a poultry population are non-existent. The aim of this study was to evaluate the phasing quality of FImpute and Beagle, two of the most widely used phasing programs.
RESULTS: We simulated ten replicated samples of a layer population comprising 888 individuals from a real SNP dataset of 580 k and a pedigree of 12 generations. Chromosomes analyzed were 1, 7 and 20. We measured the percentage of SNPs that were phased equally between true and phased haplotypes (Eqp), the proportion of individuals completely correctly phased, the number of incorrectly phased SNPs or breakpoints (Bkp), and the length of inverted haplotype segments. Results were obtained for three different groups of individuals: those with no parents or offspring genotyped in the dataset, those with only one parent genotyped, and those with both parents genotyped. The phasing was performed with Beagle (v3.3 and v4.1) and FImpute v2.2 (with and without pedigree). Eqp values ranged from 88 to 100%, with the best results from haplotypes phased with Beagle v4.1 and FImpute with pedigree information and at least one parent genotyped. FImpute haplotypes showed a higher number of Bkp than Beagle. As a consequence, switched haplotype segments were longer for Beagle than for FImpute.
CONCLUSION: We concluded that, for the dataset used in this study, Beagle v4.1 or FImpute with pedigree information and at least one parent genotyped in the dataset were the best alternatives for obtaining high-quality phased haplotypes.


Subjects
Computational Biology/methods , Population Genetics , Genetic Models , Software , Chromosome Breakpoints , Genotype , Linkage Disequilibrium , Single Nucleotide Polymorphism
11.
J Assist Reprod Genet ; 36(4): 727-739, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30617673

ABSTRACT

PURPOSE: Pre-implantation genetic diagnosis (PGD) for molecular disorders requires the construction of parental haplotypes. Classically, haplotype resolution ("phasing") is obtained by genotyping multiple polymorphic markers in both parents and at least one additional relative. However, this process is time-consuming, and immediate family members are not always available. The recent availability of massive genomic data for many populations promises to eliminate the need to develop family-specific assays and to recruit additional family members. In this study, we aimed to validate population-assisted haplotype phasing for PGD.
METHODS: Targeted sequencing of CFTR gene variants and ~1700 flanking polymorphic SNPs (±2 Mb) was performed on 54 individuals from 12 PGD families of (a) full Ashkenazi (FA; n = 16), (b) mixed Ashkenazi (MA; n = 23 individuals with at least one Ashkenazi and one non-Ashkenazi grandparent), or (c) non-Ashkenazi (NA; n = 15) descent. Heterozygous genotype calls in each individual were phased using various whole genome reference panels and appropriate computational models. All computationally derived haplotype predictions were benchmarked against trio-based phasing.
RESULTS: Using the Ashkenazi reference panel, phasing of FA was highly accurate (99.4% ± 0.2% accuracy); phasing of MA was less accurate (95.4% ± 4.5% accuracy); and phasing accuracy for NA was predictably low (83.4% ± 6.6% accuracy). Strikingly, for founder mutation carriers, our haplotyping approach facilitated near-perfect phasing accuracy (99.9% ± 0.1% and 98.2% ± 2.8% accuracy for W1282X and delF508 carriers, respectively).
CONCLUSIONS: Our results demonstrate the feasibility of replacing classical haplotype phasing with population-based phasing with uncompromised accuracy.


Subjects
Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Genotype , Haplotypes/genetics , Preimplantation Diagnosis , Algorithms , Alleles , Female , Founder Effect , Heterozygote , Humans , Jews/genetics , Single Nucleotide Polymorphism/genetics , DNA Sequence Analysis
12.
Mol Biol Evol ; 30(9): 2187-96, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23728796

ABSTRACT

When we sequence a diploid individual, the output actually comprises two genomes: one from the paternal parent and the other from the maternal parent. In this study, we introduce a novel heuristic algorithm for distinguishing single-nucleotide polymorphisms (SNPs) from the two parents and phasing them into haplotypes. The algorithm is unique because it simultaneously performs SNP calling and haplotype phasing. This approach can exploit the linkage information of nearby SNPs, which facilitates the efficient removal of haplotypes that originate from incorrectly mapped short reads. Using simulated data we demonstrated that our approach increased the accuracy of SNP calls. The haplotype reconstruction performance depended largely on the density of SNPs. Using current next-generation sequence technology with a relatively short read length, reasonable performance is expected when this approach is applied to species with an average of five heterozygous sites per 1 kb. The algorithm was implemented as the program "linkSNPs."


Subjects
Algorithms , Diploidy , Drosophila melanogaster/genetics , Haplotypes , Single Nucleotide Polymorphism , DNA Sequence Analysis/methods , Software , Alleles , Animals , Base Sequence , Genetic Linkage , High-Throughput Nucleotide Sequencing , Molecular Sequence Data , Mutation Rate
13.
Methods Mol Biol ; 2809: 193-214, 2024.
Article in English | MEDLINE | ID: mdl-38907899

ABSTRACT

The outcome of hematopoietic stem cell transplantation (HSCT) and organ transplant is strongly affected by the matching of the HLA alleles of the donor and the recipient. However, donors and sometimes recipients are often typed at low resolution, with some alleles either missing or ambiguous. Thus, imputation methods are required to detect the most probable high-resolution HLA haplotypes consistent with a typing. Such imputation algorithms require predefined haplotype frequencies. As such, the phasing of the typing is required for both imputation and frequency generation. We have developed a new approach to HLA haplotype and genotype imputation, where first all candidate phases of a typing are made explicit, and then the ambiguity within each phase is solved. This ambiguity is solved through a graph structure of all partial haplotypes and the haplotypes consistent with them. This phasing approach was used to produce an imputation algorithm (GRIMM: Graph Imputation and Matching). GRIMM was then extended to combine information from multiple races, producing MR-GRIMM (Multi-Race GRIMM). When family information is available, the phasing of each family member can be restricted by the others. We propose GRAMM (GRaph-bAsed faMily iMputation) to phase alleles in family pedigree HLA typing data and in mother-cord blood unit pairs. Finally, we combined MR-GRIMM with an expectation-maximization (EM) algorithm to estimate haplotype frequencies sharing information between races to produce MR-GRIMME (MR-GRIMM EM). We have shown that these algorithms naturally combine information between races and family members. The accuracy of each of these algorithms is significantly better than that of current comparable methods. MR-GRIMM leads to high accuracy in matching predictions. GRAMM better imputes family members than either MR-GRIMM or any existing algorithm and has practically no phasing errors. MR-GRIMME obtains a higher likelihood than existing algorithms. MR-GRIMM, MR-GRIMME, and GRAMM are available as servers or through stand-alone versions on GitHub and PyPI, as detailed in the appropriate sections.


Subjects
Algorithms , HLA Antigens , Haplotypes , Histocompatibility Testing , Tissue Donors , Humans , HLA Antigens/genetics , Histocompatibility Testing/methods , Alleles , Software , Gene Frequency , Family , Genotype , Hematopoietic Stem Cell Transplantation
14.
G3 (Bethesda) ; 14(8)2024 Aug 07.
Article in English | MEDLINE | ID: mdl-38861413

ABSTRACT

The implementation of a new genomic assembly pipeline named only the best (otb) has effectively addressed various challenges associated with data management during the development and storage of genome assemblies. otb incorporates a setup layer, quality checks, templating, and the integration of Nextflow and Singularity. The primary objective of otb is to streamline the process of creating a HiFi/HiC genome, aiming to minimize the manual intervention required in the genome assembly process. The 2-lined spittlebug (Prosapia bicincta, Hemiptera: Cercopidae), a true bug insect herbivore, serves as a practical test case for evaluating otb. The 2-lined spittlebug is both a crucial agricultural pest and a genomically understudied insect belonging to the order Hemiptera. This insect is a significant threat to grasslands and pastures, causing plant wilting and phytotoxemia in infested plants. Its presence in tropical and subtropical regions around the world poses a long-term threat to the composition of plant communities in grassland landscapes, impacting rangelands and posing a substantial risk to cattle production.


Subjects
Insect Genome , Genomics , Animals , Genomics/methods , Hemiptera/genetics , Software
15.
bioRxiv ; 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38529488

ABSTRACT

The combination of ultra-long Oxford Nanopore (ONT) sequencing reads with long, accurate PacBio HiFi reads has enabled the completion of a human genome and spurred similar efforts to complete the genomes of many other species. However, this approach for complete, "telomere-to-telomere" genome assembly relies on multiple sequencing platforms, limiting its accessibility. ONT "Duplex" sequencing reads, where both strands of the DNA are read to improve quality, promise high per-base accuracy. To evaluate this new data type, we generated ONT Duplex data for three widely-studied genomes: human HG002, Solanum lycopersicum Heinz 1706 (tomato), and Zea mays B73 (maize). For the diploid, heterozygous HG002 genome, we also used "Pore-C" chromatin contact mapping to completely phase the haplotypes. We found the accuracy of Duplex data to be similar to HiFi sequencing, but with read lengths tens of kilobases longer, and the Pore-C data to be compatible with existing diploid assembly algorithms. This combination of read length and accuracy enables the construction of a high-quality initial assembly, which can then be further resolved using the ultra-long reads, and finally phased into chromosome-scale haplotypes with Pore-C. The resulting assemblies have a base accuracy exceeding 99.999% (Q50) and near-perfect continuity, with most chromosomes assembled as single contigs. We conclude that ONT sequencing is a viable alternative to HiFi sequencing for de novo genome assembly, and has the potential to provide a single-instrument solution for the reconstruction of complete genomes.
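
The correspondence between a base accuracy of 99.999% and Q50 follows from the standard Phred definition, Q = -10 · log10(error probability). The sketch below makes the conversion explicit; the function name is illustrative.

```python
import math


def phred_quality(accuracy):
    """Convert per-base accuracy (a fraction below 1) to a Phred Q score."""
    error = 1.0 - accuracy            # per-base error probability
    return -10.0 * math.log10(error)  # Phred scale: 10 points per tenfold drop


# An accuracy of 0.99999 means one error per 100,000 bases, about Q50.
q = phred_quality(0.99999)
```

Each additional 10 points on this scale corresponds to a tenfold lower error rate, so Q40 is 99.99% accuracy and Q50 is 99.999%.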

16.
Methods Mol Biol ; 2545: 429-458, 2023.
Article in English | MEDLINE | ID: mdl-36720827

ABSTRACT

Polyploidy has been observed throughout major eukaryotic clades and has played a vital role in the evolution of angiosperms. Recent polyploidizations often result in highly complex genome structures, posing challenges to genome assembly and phasing. Recent advances in sequencing technologies and genome assembly algorithms have enabled high-quality, near-complete chromosome-level assemblies of polyploid genomes. Advances in novel sequencing technologies include highly accurate single-molecule sequencing with HiFi reads, chromosome conformation capture with Hi-C technique, and linked reads sequencing. Additionally, new computational approaches have also significantly improved the precision and reliability of polyploid genome assembly and phasing, such as HiCanu, hifiasm, ALLHiC, and PolyGembler. Herein, we review recently published polyploid genomes and compare the various sequencing, assembly, and phasing approaches that are utilized in these genome studies. Finally, we anticipate that accurate and telomere-to-telomere chromosome-level assembly of polyploid genomes could ultimately become a routine procedure in the near future.


Subjects
Algorithms , Eukaryota , Humans , Reproducibility of Results , Eukaryotic Cells , Polyploidy
17.
Methods Mol Biol ; 2590: 149-159, 2023.
Article in English | MEDLINE | ID: mdl-36335498

ABSTRACT

Haplotype ("haploid genotype") phase is the combination of genotypes at sites of genetic variation along a chromosome [1]. We previously demonstrated that the complete chromosomal haplotype of diploid human genomes can be determined using molecular linkage from Hi-C sequencing and linked-reads sequencing [2]. In this chapter, we present a step-by-step guide to perform this analysis using mLinker, a software package for haplotype inference.


Subjects
Algorithms , Human Genome , Humans , Haplotypes/genetics , Genotype , Chromosomes , Single Nucleotide Polymorphism , DNA Sequence Analysis
18.
HLA ; 102(4): 477-488, 2023 10.
Article in English | MEDLINE | ID: mdl-37102220

ABSTRACT

Recently, haplo-identical transplantation with multiple HLA mismatches has become a viable option for stem cell transplants. Haplotype sharing detection requires the imputation of donor and recipient. We show that even in high-resolution typing when all alleles are known, there is a 15% error rate in haplotype phasing, and even more in low-resolution typings. Similarly, in related donors, the parents' haplotypes should be imputed to determine what haplotype each child inherited. We propose graph-based family imputation (GRAMM) to phase alleles in family pedigree HLA typing data, and in mother-cord blood unit pairs. We show that GRAMM has practically no phasing errors when pedigree data are available. We apply GRAMM to simulations with different typing resolutions as well as paired cord-mother typings, and show very high phasing accuracy, and improved allele imputation accuracy. We use GRAMM to detect recombination events and show that the rate of falsely detected recombination events (false-positive rate) in simulations is very low. We then apply recombination detection to typed families to estimate the recombination rate in Israeli and Australian population datasets. The estimated recombination rate has an upper bound of 10%-20% per family (1%-4% per individual).


Subjects
Tissue Donors , Child , Humans , Alleles , Australia , Haplotypes
19.
Methods Mol Biol ; 2590: 49-57, 2023.
Article in English | MEDLINE | ID: mdl-36335491

ABSTRACT

Haplotyping individual full-length transcripts can be important in the diagnosis and treatment of certain genetic diseases. Repeat expansions of simple tandem repeat sequences are the cause of over 40 neurological disorders. In many of these conditions, expanding a polymorphic repeat beyond a given threshold has been strongly associated with disease onset and severity. Given that most repeat expansions are inherited in an autosomal dominant pattern, repeat expansion disorders are typically characterized by a heterozygous expansion locus associated with a single haplotype. Precision genetic medicines can be used to selectively target expansion-containing sequences in a haplotype-specific manner. However, repeat expansion lengths often exceed the capacity of next-generation sequencing (NGS) reads. Therefore, accurate length and haplotype determination of repeat expansions requires special considerations and the development of custom methods. Here we highlight a method for targeted haplotype phasing of the HTT gene, which can be adopted for use with other full-length transcripts and in other repeat expansion disorders.


Subjects
High-Throughput Nucleotide Sequencing , Tandem Repeat Sequences , Haplotypes , Heterozygote , DNA Sequence Analysis
20.
J Comput Biol ; 29(2): 195-211, 2022 02.
Article in English | MEDLINE | ID: mdl-35041529

ABSTRACT

Resolving haplotypes in polyploid genomes using phase information from sequencing reads is an important and challenging problem. We introduce two new mathematical formulations of polyploid haplotype phasing: (1) the min-sum max tree partition problem, which is a more flexible graphical metric compared with the standard minimum error correction (MEC) model in the polyploid setting, and (2) the uniform probabilistic error minimization model, which is a probabilistic analogue of the MEC model. We incorporate both formulations into a long-read based polyploid haplotype phasing method called flopp. We show that flopp compares favorably with state-of-the-art algorithms: it is up to 30 times faster with 2 times fewer switch errors on 6× ploidy simulated data. Further, we show using real nanopore data that flopp can quickly reveal reasonable haplotype structures from the autotetraploid Solanum tuberosum (potato).


Subjects
Algorithms , Haplotypes , Polyploidy , Computational Biology , Computer Simulation , Genetic Databases/statistics & numerical data , Plant Genome , Genetic Models , Statistical Models , Multigene Family , Single Nucleotide Polymorphism , DNA Sequence Analysis/statistics & numerical data , Software , Solanum tuberosum/genetics
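
The standard minimum error correction (MEC) model mentioned in this abstract scores a candidate set of haplotypes by the number of read alleles that would need to be changed for every read to match some haplotype. The sketch below computes that score for a given phasing at any ploidy; the data layout and function name are assumptions for illustration, and it does not reproduce flopp or its two new formulations.

```python
def mec_score(reads, haplotypes):
    """Minimum error correction (MEC) score of a candidate phasing.

    `reads` is a list of dicts mapping site index -> observed allele (0/1).
    `haplotypes` is a list of k equal-length allele lists (k = ploidy).
    Each read is charged its mismatch count against its closest haplotype.
    """
    total = 0
    for read in reads:
        total += min(
            sum(1 for site, allele in read.items() if hap[site] != allele)
            for hap in haplotypes
        )
    return total
```

A phasing method based on this model searches for the haplotypes that minimize the score; the function above only evaluates the objective for one candidate.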