Search | Biblioteca Virtual em Saúde (Virtual Health Library)

1.

Distinct Classes of Complex Structural Variation Uncovered across Thousands of Cancer Genome Graphs.

Hadi, Kevin; Yao, Xiaotong; Behr, Julie M; Deshpande, Aditya; Xanthopoulakis, Charalampos; Tian, Huasong; Kudman, Sarah; Rosiene, Joel; Darmofal, Madison; DeRose, Joseph; Mortensen, Rick; Adney, Emily M; Shaiber, Alon; Gajic, Zoran; Sigouros, Michael; Eng, Kenneth; Wala, Jeremiah A; Wrzeszczynski, Kazimierz O; Arora, Kanika; Shah, Minita; Emde, Anne-Katrin; Felice, Vanessa; Frank, Mayu O; Darnell, Robert B; Ghandi, Mahmoud; Huang, Franklin; Dewhurst, Sally; Maciejowski, John; de Lange, Titia; Setton, Jeremy; Riaz, Nadeem; Reis-Filho, Jorge S; Powell, Simon; Knowles, David A; Reznik, Ed; Mishra, Bud; Beroukhim, Rameen; Zody, Michael C; Robine, Nicolas; Oman, Kenji M; Sanchez, Carissa A; Kuhner, Mary K; Smith, Lucian P; Galipeau, Patricia C; Paulson, Thomas G; Reid, Brian J; Li, Xiaohong; Wilkes, David; Sboner, Andrea; Mosquera, Juan Miguel.

Cell ; 183(1): 197-210.e32, 2020 10 01.

Article in English | MEDLINE | ID: mdl-33007263

ABSTRACT

Cancer genomes often harbor hundreds of somatic DNA rearrangement junctions, many of which cannot be easily classified into simple (e.g., deletion) or complex (e.g., chromothripsis) structural variant classes. Applying a novel genome graph computational paradigm to analyze the topology of junction copy number (JCN) across 2,778 tumor whole-genome sequences, we uncovered three novel complex rearrangement phenomena: pyrgo, rigma, and tyfonas. Pyrgo are "towers" of low-JCN duplications associated with early-replicating regions, superenhancers, and breast or ovarian cancers. Rigma comprise "chasms" of low-JCN deletions enriched in late-replicating fragile sites and gastrointestinal carcinomas. Tyfonas are "typhoons" of high-JCN junctions and fold-back inversions associated with expressed protein-coding fusions, breakend hypermutation, and acral, but not cutaneous, melanomas. Clustering of tumors according to genome graph-derived features identified subgroups associated with DNA repair defects and poor prognosis.

Subjects

Genomic Structural Variation/genetics , Genomics/methods , Neoplasms/genetics , Chromosome Inversion/genetics , Chromothripsis , DNA Copy Number Variations/genetics , Gene Rearrangement/genetics , Genome, Human/genetics , Humans , Mutation/genetics , Whole Genome Sequencing/methods

2.

Megabase Length Hypermutation Accompanies Human Structural Variation at 17p11.2.

Beck, Christine R; Carvalho, Claudia M B; Akdemir, Zeynep C; Sedlazeck, Fritz J; Song, Xiaofei; Meng, Qingchang; Hu, Jianhong; Doddapaneni, Harsha; Chong, Zechen; Chen, Edward S; Thornton, Philip C; Liu, Pengfei; Yuan, Bo; Withers, Marjorie; Jhangiani, Shalini N; Kalra, Divya; Walker, Kimberly; English, Adam C; Han, Yi; Chen, Ken; Muzny, Donna M; Ira, Grzegorz; Shaw, Chad A; Gibbs, Richard A; Hastings, P J; Lupski, James R.

Cell ; 176(6): 1310-1324.e10, 2019 03 07.

Article in English | MEDLINE | ID: mdl-30827684

ABSTRACT

DNA rearrangements resulting in human genome structural variants (SVs) are caused by diverse mutational mechanisms. We used long- and short-read sequencing technologies to investigate end products of de novo chromosome 17p11.2 rearrangements and query the molecular mechanisms underlying both recurrent and non-recurrent events. Evidence for an increased rate of clustered single-nucleotide variant (SNV) mutation in cis with non-recurrent rearrangements was found. Indel and SNV formation are associated with both copy-number gains and losses of 17p11.2, occur up to ~1 Mb away from the breakpoint junctions, and favor C > G transversion substitutions; results suggest that single-stranded DNA is formed during the genesis of the SV and provide compelling support for a microhomology-mediated break-induced replication (MMBIR) mechanism for SV formation. Our data show an additional mutational burden of MMBIR consisting of hypermutation confined to the locus and manifesting as SNVs and indels predominantly within genes.

Subjects

Chromosomes, Human, Pair 17 , Mutation , Abnormalities, Multiple/genetics , Chromosome Breakpoints , Chromosome Disorders/genetics , Chromosome Duplication/genetics , DNA Copy Number Variations , DNA Repair/genetics , DNA Replication , Gene Rearrangement , Genome, Human , Genomic Structural Variation , Humans , INDEL Mutation , Models, Genetic , Polymorphism, Single Nucleotide , Recombination, Genetic , Sequence Analysis, DNA/methods , Smith-Magenis Syndrome/genetics

3.

The wild side of grape genomics.

Cantu, Dario; Massonnet, Mélanie; Cochetel, Noé.

Trends Genet ; 40(7): 601-612, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38777691

ABSTRACT

With broad genetic diversity and as a source of key agronomic traits, wild grape species (Vitis spp.) are crucial to enhance viticulture's climatic resilience and sustainability. This review discusses how recent breakthroughs in the genome assembly and analysis of wild grape species have led to discoveries on grape evolution, from wild species' adaptation to environmental stress to grape domestication. We detail how diploid chromosome-scale genomes from wild Vitis spp. have enabled the identification of candidate disease-resistance and flower sex determination genes and the creation of the first Vitis graph-based pangenome. Finally, we explore how wild grape genomics can impact grape research and viticulture, including aspects such as data sharing, the development of functional genomics tools, and the acceleration of genetic improvement.

Subjects

Genome, Plant , Genomics , Vitis , Vitis/genetics , Genomics/methods , Genome, Plant/genetics , Genetic Variation , Disease Resistance/genetics , Domestication , Evolution, Molecular

4.

Statistical phasing of 150,119 sequenced genomes in the UK Biobank.

Browning, Brian L; Browning, Sharon R.

Am J Hum Genet ; 110(1): 161-165, 2023 01 05.

Article in English | MEDLINE | ID: mdl-36450278

ABSTRACT

The first release of UK Biobank whole-genome sequence data contains 150,119 genomes. We present an open-source pipeline for filtering, phasing, and indexing these genomes on the cloud-based UK Biobank Research Analysis Platform. This pipeline makes it possible to apply haplotype-based methods to UK Biobank whole-genome sequence data. The pipeline uses BCFtools for marker filtering, Beagle for genotype phasing, and Tabix for VCF indexing. We used the pipeline to phase 406 million single-nucleotide variants on chromosomes 1-22 and X at a cost of £2,309. The maximum time required to process a chromosome was 2.6 days. In order to assess phase accuracy, we modified the pipeline to exclude trio parents. We observed a switch error rate of 0.0016 on chromosome 20 in the White British trio offspring. If we exclude markers with nonmajor allele frequency < 0.1% after phasing, this switch error rate decreases by 80% to 0.00032.

Subjects

Biological Specimen Banks , Genome , Humans , Dogs , Animals , Genotype , Haplotypes/genetics , Polymorphism, Single Nucleotide/genetics , United Kingdom , Algorithms , Sequence Analysis, DNA/methods
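
The switch error rate quoted in the abstract above can be illustrated with a short Python sketch. This is only a minimal illustration and not the authors' pipeline: the function name `switch_error_rate` and the 0/1 encoding of the allele carried on one haplotype at each heterozygous site are assumptions made here, and the example data are invented.

```python
def switch_error_rate(truth, inferred):
    """Fraction of consecutive heterozygous-site pairs whose relative phase is wrong.

    Both arguments list the allele (0 or 1) carried on one haplotype at each
    heterozygous site. Hypothetical helper, not part of Beagle or BCFtools.
    """
    if len(truth) != len(inferred):
        raise ValueError("truth and inferred must cover the same sites")
    if len(truth) < 2:
        return 0.0  # no pair of sites, so no opportunity for a switch
    # True where the inferred haplotype agrees with the truth at that site.
    agree = [t == p for t, p in zip(truth, inferred)]
    # A switch happens whenever agreement flips between neighbouring sites.
    switches = sum(1 for a, b in zip(agree, agree[1:]) if a != b)
    return switches / (len(truth) - 1)


if __name__ == "__main__":
    truth = [0, 1, 0, 1, 1, 0]
    inferred = [0, 1, 1, 0, 0, 0]  # two switches among five site pairs
    print(switch_error_rate(truth, inferred))
```

Because only changes in agreement are counted, a haplotype reported in the opposite orientation (every allele flipped) scores zero switches, which matches the usual convention that the labels of the two haplotypes are arbitrary.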

5.

miniSNV: accurate and fast single nucleotide variant calling from nanopore sequencing data.

Cui, Miao; Liu, Yadong; Yu, Xian; Guo, Hongzhe; Jiang, Tao; Wang, Yadong; Liu, Bo.

Brief Bioinform ; 25(6)2024 Sep 23.

Article in English | MEDLINE | ID: mdl-39331016

ABSTRACT

Nanopore sequencing technology offers longer read lengths and can potentially address limitations of short-read sequencing, including long-range haplotype phasing and accurate variant calling. However, there is still room for improvement in terms of the performance of single nucleotide variant (SNV) identification and computing resource usage for the state-of-the-art approaches. In this work, we introduce miniSNV, a lightweight SNV calling algorithm that simultaneously achieves high performance and yield. miniSNV utilizes known common variants in populations as variation backgrounds and leverages read pileup, read-based phasing, and consensus generation to identify and genotype SNVs for Oxford Nanopore Technologies (ONT) long reads. Benchmarks on real and simulated ONT data under various error profiles demonstrate that miniSNV has superior sensitivity and comparable accuracy on SNV detection and runs faster with outstanding scalability and lower memory than most state-of-the-art variant callers. miniSNV is available from https://github.com/CuiMiao-HIT/miniSNV.

Subjects

Algorithms , Nanopore Sequencing , Polymorphism, Single Nucleotide , Nanopore Sequencing/methods , Software , Humans , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods

6.

Haxe as a Swiss knife for bioinformatic applications: the SeqPHASE case story.

Spöri, Yann; Flot, Jean-François.

Brief Bioinform ; 25(5)2024 Jul 25.

Article in English | MEDLINE | ID: mdl-39193916

ABSTRACT

Haxe is a general purpose, object-oriented programming language supporting syntactic macros. The Haxe compiler is well known for its ability to translate the source code of Haxe programs into the source code of a variety of other programming languages including Java, C++, JavaScript, and Python. Although Haxe is increasingly used for a variety of purposes, including games, it has not yet attracted much attention from bioinformaticians. This is surprising, as Haxe allows generating different versions of the same program (e.g. a graphical user interface version in JavaScript running in a web browser for beginners and a command-line version in C++ or Python for increased performance) while maintaining a single code base, a feature that should be of interest for many bioinformatic applications. To demonstrate the usefulness of Haxe in bioinformatics, we present here the case story of the program SeqPHASE, written originally in Perl (with a CGI version running on a server) and published in 2010. As Perl+CGI is no longer desirable for security reasons, we decided to rewrite the SeqPHASE program in Haxe and to host it at GitHub Pages (https://eeg-ebe.github.io/SeqPHASE), thereby alleviating the need to configure and maintain a dedicated server. Using SeqPHASE as an example, we discuss the advantages and disadvantages of Haxe's source code conversion functionality when it comes to implementing bioinformatic software.

Subjects

Computational Biology , Programming Languages , Software , Computational Biology/methods

7.

Subgenome phasing for complex allopolyploidy: case-based benchmarking and recommendations.

Zhang, Ren-Gang; Shang, Hong-Yun; Jia, Kai-Hua; Ma, Yong-Peng.

Brief Bioinform ; 25(1)2023 11 22.

Article in English | MEDLINE | ID: mdl-38189536

ABSTRACT

Accurate subgenome phasing is crucial for understanding the origin, evolution and adaptive potential of polyploid genomes. SubPhaser and WGDI are two common software tools for subgenome phasing in allopolyploids, particularly in scenarios lacking known diploid progenitors. Triggered by a recent debate over the subgenomic origins of the cultivated octoploid strawberry, we examined four well-documented complex allopolyploidy cases as benchmarks, to evaluate and compare the accuracy of the two tools. Our analysis demonstrates that the subgenomic structure phased by both tools is in line with prior research, effectively tracing complex allopolyploid evolutionary trajectories despite the limitations of each tool. Furthermore, using these validated methodologies, we revisited the controversial issue regarding the progenitors of the octoploid strawberry. The results of both methodologies reaffirm Fragaria vesca and Fragaria iinumae as progenitors of the octoploid strawberry. Finally, we propose recommendations for enhancing the accuracy of subgenome phasing in future studies, recognizing the potential of integrated tools for advanced complex allopolyploidy research and offering a new roadmap for robust subgenome-based phylogenetic analysis.

Subjects

Benchmarking , Fragaria , Phylogeny , Fragaria/genetics , Polyploidy , Software

8.

An Unstable Singularity Underlies Stochastic Phasing of the Circadian Clock in Individual Cyanobacterial Cells.

Gan, Siting; O'Shea, Erin K.

Mol Cell ; 67(4): 659-672.e12, 2017 Aug 17.

Article in English | MEDLINE | ID: mdl-28803778

ABSTRACT

The endogenous circadian clock synchronizes with environmental time by appropriately resetting its phase in response to external cues. Of note, some resetting stimuli induce attenuated oscillations of clock output, which has been observed at the population-level in several organisms and in studies of individual humans. To investigate what is happening in individual cellular clocks, we studied the unicellular cyanobacterium S. elongatus. By measuring its phase-resetting responses to temperature changes, we found that population-level arrhythmicity occurs when certain perturbations cause stochastic phases of oscillations in individual cells. Combining modeling with experiments, we related stochastic phasing to the dynamical structure of the cyanobacterial clock as an oscillator and explored the physiological relevance of the oscillator structure for accurately timed rhythmicity in changing environmental conditions. Our findings and approach can be applied to other biological oscillators.

Subjects

Bacterial Proteins/metabolism , Circadian Clocks , Circadian Rhythm Signaling Peptides and Proteins/metabolism , Circadian Rhythm , Models, Biological , Synechococcus/metabolism , Temperature , Adaptation, Physiological , Bacterial Proteins/genetics , Circadian Rhythm Signaling Peptides and Proteins/genetics , Computer Simulation , Microscopy, Fluorescence , Signal Transduction , Single-Cell Analysis , Stochastic Processes , Synechococcus/genetics , Time Factors , Time-Lapse Imaging

9.

GCphase: an SNP phasing method using a graph partition and error correction algorithm.

Luo, Junwei; Wang, Jiayi; Zhai, Haixia; Wang, Junfeng.

BMC Bioinformatics ; 25(1): 267, 2024 Aug 19.

Article in English | MEDLINE | ID: mdl-39160480

ABSTRACT

BACKGROUND: The utilization of long reads for single nucleotide polymorphism (SNP) phasing has become popular, providing substantial support for research on human diseases and genetic studies in animals and plants. However, due to the complexity of the linkage relationships between SNP loci and sequencing errors in the reads, recent methods still cannot yield satisfactory results. RESULTS: In this study, we present a graph-based algorithm, GCphase, which utilizes the minimum cut algorithm to perform phasing. First, based on alignment between long reads and the reference genome, GCphase filters out ambiguous SNP sites and useless read information. Second, GCphase constructs a graph in which a vertex represents alleles of an SNP locus and each edge represents the presence of read support; moreover, GCphase adopts a graph minimum-cut algorithm to phase the SNPs. Next, GCphase uses two error correction steps to refine the phasing results obtained from the previous step, effectively reducing the error rate. Finally, GCphase obtains the phase block. GCphase was compared to three other methods, WhatsHap, HapCUT2, and LongPhase, on the Nanopore and PacBio long-read datasets. The code is available from https://github.com/baimawjy/GCphase. CONCLUSIONS: Experimental results show that GCphase under different sequencing depths of different data has the least number of switch errors and the highest accuracy compared with other methods.

Subjects

Algorithms , Polymorphism, Single Nucleotide , Polymorphism, Single Nucleotide/genetics , Humans , Sequence Analysis, DNA/methods , Software , High-Throughput Nucleotide Sequencing/methods

10.

NanoImprint: A DNA methylation tool for clinical interpretation and diagnosis of common imprinting disorders using nanopore long-read sequencing.

Bækgaard, Caroline Hey; Lester, Emilie Boye; Møller-Larsen, Steffen; Lauridsen, Mathilde Faurholdt; Larsen, Martin Jakob.

Ann Hum Genet ; 88(5): 392-398, 2024 Sep.

Article in English | MEDLINE | ID: mdl-38690755

ABSTRACT

INTRODUCTION: Long-read whole genome sequencing, such as Oxford Nanopore Technology, is increasingly being introduced in clinical settings. With its ability to simultaneously call sequence variation and DNA modifications including 5-methylcytosine, nanopore is a promising technology to improve diagnostics of imprinting disorders. METHODS: Currently, no tools to analyze DNA methylation patterns at known clinically relevant imprinted regions are available. Here we present NanoImprint, which generates an easily interpretable report, based on long-read nanopore sequencing, to use for identifying clinically relevant abnormalities in methylation levels at 14 imprinted regions and diagnosis of common imprinting disorders. RESULTS AND CONCLUSION: NanoImprint outputs a summary table and visualization plots that display methylation frequency (%) and chromosomal positions for all regions, with phased data color-coded for the two alleles. We demonstrate the utility of NanoImprint using three imprinting disorder samples from patients with Beckwith-Wiedemann syndrome (BWS), Angelman syndrome (AS) and Prader-Willi syndrome (PWS). The NanoImprint script is available from https://github.com/carolinehey/NanoImprint.

Subjects

Angelman Syndrome , Beckwith-Wiedemann Syndrome , DNA Methylation , Nanopore Sequencing , Prader-Willi Syndrome , Humans , Angelman Syndrome/genetics , Angelman Syndrome/diagnosis , Beckwith-Wiedemann Syndrome/genetics , Beckwith-Wiedemann Syndrome/diagnosis , Nanopore Sequencing/methods , Nanopores , Prader-Willi Syndrome/genetics , Prader-Willi Syndrome/diagnosis , Sequence Analysis, DNA/methods

11.

Fast two-stage phasing of large-scale sequence data.

Browning, Brian L; Tian, Xiaowen; Zhou, Ying; Browning, Sharon R.

Am J Hum Genet ; 108(10): 1880-1890, 2021 10 07.

Article in English | MEDLINE | ID: mdl-34478634

ABSTRACT

Haplotype phasing is the estimation of haplotypes from genotype data. We present a fast, accurate, and memory-efficient haplotype phasing method that scales to large-scale SNP array and sequence data. The method uses marker windowing and composite reference haplotypes to reduce memory usage and computation time. It incorporates a progressive phasing algorithm that identifies confidently phased heterozygotes in each iteration and fixes the phase of these heterozygotes in subsequent iterations. For data with many low-frequency variants, such as whole-genome sequence data, the method employs a two-stage phasing algorithm that phases high-frequency markers via progressive phasing in the first stage and phases low-frequency markers via genotype imputation in the second stage. This haplotype phasing method is implemented in the open-source Beagle 5.2 software package. We compare Beagle 5.2 and SHAPEIT 4.2.1 by using expanding subsets of 485,301 UK Biobank samples and 38,387 TOPMed samples. Both methods have very similar accuracy and computation time for UK Biobank SNP array data. However, for TOPMed sequence data, Beagle is more than 20 times faster than SHAPEIT, achieves similar accuracy, and scales to larger sample sizes.

Subjects

Asthma/genetics , Atrial Fibrillation/genetics , Data Interpretation, Statistical , Genome, Human , Haplotypes , Polymorphism, Single Nucleotide , Software , Algorithms , Female , Genome-Wide Association Study , Genotype , Humans , Male

12.

A fine-scale genetic map of the Japanese population.

Takayama, Jun; Makino, Satoshi; Funayama, Takamitsu; Ueki, Masao; Narita, Akira; Murakami, Keiko; Orui, Masatsugu; Ishikuro, Mami; Obara, Taku; Kuriyama, Shinichi; Yamamoto, Masayuki; Tamiya, Gen.

Clin Genet ; 106(3): 284-292, 2024 Sep.

Article in English | MEDLINE | ID: mdl-38719617

ABSTRACT

Genetic maps are fundamental resources for linkage and association studies. A fine-scale genetic map can be constructed by inferring historical recombination events from the genome-wide structure of linkage disequilibrium (a non-random association of alleles among loci) by using population-scale sequencing data. We constructed a fine-scale genetic map and identified recombination hotspots from 10 092 551 bi-allelic high-quality autosomal markers segregating among 150 unrelated Japanese individuals whose genotypes were determined by high-coverage (30×) whole-genome sequencing, and the genotype quality was carefully controlled by using their parents' and offspring's genotypes. The pedigree information was also utilized for haplotype phasing. The resulting genome-wide recombination rate profiles were concordant with those of the worldwide population on a broad scale, and the resolution was much improved. We identified 9487 recombination hotspots and confirmed the enrichment of previously known motifs in the hotspots. Moreover, we demonstrated that the Japanese genetic map improved the haplotype phasing and genotype imputation accuracy for the Japanese population. The construction of a population-specific genetic map will help make genetics research more accurate.

Subjects

Chromosome Mapping , East Asian People , Linkage Disequilibrium , Recombination, Genetic , Humans , Alleles , East Asian People/genetics , Genetic Linkage , Genetics, Population , Genome, Human , Genome-Wide Association Study , Genotype , Haplotypes , Japan , Pedigree , Polymorphism, Single Nucleotide , Whole Genome Sequencing
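
Linkage disequilibrium, described in the abstract above as a non-random association of alleles among loci, is commonly summarized by the coefficient D and the squared correlation r². The Python sketch below computes both for two bi-allelic loci from phased haplotypes. It is a textbook illustration rather than the method used by the authors; the function name `ld_stats`, the 0/1 allele encoding, and the example data are assumptions made here.

```python
def ld_stats(haplotypes):
    """Return (D, r_squared) for two bi-allelic loci.

    `haplotypes` is a list of (allele_at_locus_A, allele_at_locus_B) pairs,
    each allele coded 0 or 1. Hypothetical helper for illustration only.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at locus B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # deviation from independent assortment
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


if __name__ == "__main__":
    # Alleles always travel together: complete disequilibrium.
    print(ld_stats([(1, 1), (1, 1), (0, 0), (0, 0)]))
    # Every allele combination equally common: no disequilibrium.
    print(ld_stats([(1, 1), (1, 0), (0, 1), (0, 0)]))
```

The calculation needs phased haplotypes as input, which is one reason the phasing step mentioned in the abstract matters for building such a map.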

13.

Haplotyping Using Long-Range PCR and Nanopore Sequencing to Phase Variants: Lessons Learned From the ABCA4 Locus.

McClinton, Benjamin; Watson, Christopher M; Crinnion, Laura A; McKibbin, Martin; Ali, Manir; Inglehearn, Chris F; Toomes, Carmel.

Lab Invest ; 103(8): 100160, 2023 08.

Article in English | MEDLINE | ID: mdl-37088464

ABSTRACT

Short-read next-generation sequencing has revolutionized our ability to identify variants underlying inherited diseases; however, it does not allow the phasing of variants to clarify their diagnostic interpretation. The advent of widespread, increasingly accurate long-read sequencing has opened up new applications not currently available through short-read next-generation sequencing. One such use is the ability to phase variants to clarify their diagnostic interpretation and to investigate the increasingly prevalent role of cis-acting variants in the pathogenesis of the inherited disease, so-called complex alleles. Complex alleles are becoming an increasingly prevalent part of the study of genes associated with inherited diseases, for example, in ABCA4-related diseases. We sought to establish a cost-effective method to phase contiguous segments of the 130-kb ABCA4 locus by long-read sequencing of overlapping amplification products. Using the comprehensively characterized CEPH sample, NA12878, we verified the accuracy and robustness of our assay. However, in-field assessment of its utility using clinical test cases was hampered by the paucity and distribution of identified variants and by PCR chimerism, particularly where the number of PCR cycles was high. Despite this, we were able to construct robust phase blocks of up to 94.9 kb, representing 73% of the ABCA4 locus. We conclude that, although haplotype analysis of variants located within discrete amplification products was robust and informative, the stitching together of larger phase blocks using overlapping single-molecule reads remained practically challenging.

Subjects

Nanopore Sequencing , Haplotypes/genetics , Alleles , Polymerase Chain Reaction , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods

14.

Fast, automated, continuous energy scans for experimental phasing at the BioMAX beamline.

Gorgisyan, Ishkhan; Bell, Paul; Cascella, Michele; Eguiraun, Mikel; Freitas, Áureo; Lidon-Simon, Julio; Nan, Jie; Takahashi, Carla; Tarawneh, Hamed; Ursby, Thomas; Gonzalez, Ana.

J Synchrotron Radiat ; 30(Pt 5): 885-894, 2023 Sep 01.

Article in English | MEDLINE | ID: mdl-37526994

ABSTRACT

In X-ray macromolecular crystallography (MX), single-wavelength anomalous dispersion (SAD) and multi-wavelength anomalous dispersion (MAD) techniques are commonly used for obtaining experimental phases. For an MX synchrotron beamline to support SAD and MAD techniques it is a prerequisite to have a reliable, fast and well automated energy scan routine. This work reports on a continuous energy scan procedure newly implemented at the BioMAX MX beamline at MAX IV Laboratory. The continuous energy scan is fully automated, capable of measuring accurate fluorescence counts over the absorption edge of interest while minimizing the sample exposure to X-rays, and is about a factor of five faster compared with a conventional step scan previously operational at BioMAX. The implementation of the continuous energy scan facilitates the prompt access to the anomalous scattering data, required for the SAD and MAD experiments.

15.

Evaluation of consensus strategies for haplotype phasing.

Al Bkhetan, Ziad; Chana, Gursharan; Ramamohanarao, Kotagiri; Verspoor, Karin; Goudey, Benjamin.

Brief Bioinform ; 22(4)2021 07 20.

Article in English | MEDLINE | ID: mdl-33236761

ABSTRACT

Haplotype phasing is a critical step for many genetic applications but incorrect estimates of phase can negatively impact downstream analyses. One proposed strategy to improve phasing accuracy is to combine multiple independent phasing estimates to overcome the limitations of any individual estimate. However, such a strategy is yet to be thoroughly explored. This study provides a comprehensive evaluation of consensus strategies for haplotype phasing. We explore the performance of different consensus paradigms, and the effect of specific constituent tools, across several datasets with different characteristics and their impact on the downstream task of genotype imputation. Based on the outputs of existing phasing tools, we explore two different strategies to construct haplotype consensus estimators: voting across outputs from multiple phasing tools and multiple outputs of a single non-deterministic tool. We find that the consensus approach from multiple tools reduces switch error (SE) by an average of 10% compared to any constituent tool when applied to European populations and has the highest accuracy regardless of population ethnicity, sample size, variant density or variant frequency. Furthermore, the consensus estimator improves the accuracy of the downstream task of genotype imputation carried out by the widely used Minimac3, pbwt and BEAGLE5 tools. Our results provide guidance on how to produce the most accurate phasing estimates and the trade-offs that a consensus approach may have. Our implementation of consensus haplotype phasing, consHap, is available freely at https://github.com/ziadbkh/consHap. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.

Subjects

Algorithms , Databases, Nucleic Acid , Polymorphism, Single Nucleotide , Sequence Analysis, DNA , Haplotypes , Humans
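
The idea of voting across the outputs of several phasing tools, as described in the abstract above, can be sketched in Python. This is a deliberately simplified illustration and should not be read as the consHap algorithm: the function name `consensus_phase`, the 0/1 encoding of the allele on one haplotype at each heterozygous site, and the choice to vote on the relative phase of neighbouring sites are all assumptions made here.

```python
def consensus_phase(calls):
    """Majority-vote consensus over several phasings of the same sample.

    `calls` holds one list per tool, giving the allele (0 or 1) that tool
    placed on its first haplotype at each heterozygous site. Hypothetical
    helper; an odd number of tools avoids ties (ties count as a flip here).
    """
    if not calls:
        raise ValueError("at least one phasing is required")
    n_sites = len(calls[0])
    if any(len(c) != n_sites for c in calls):
        raise ValueError("all phasings must cover the same sites")
    if n_sites == 0:
        return []
    consensus = [0]  # orientation is arbitrary, so fix the first site
    for i in range(1, n_sites):
        # Vote on whether neighbouring sites carry the same allele on one
        # haplotype; this does not depend on how each tool labels haplotypes.
        same_votes = sum(1 for c in calls if c[i] == c[i - 1])
        same = 2 * same_votes > len(calls)
        consensus.append(consensus[-1] if same else 1 - consensus[-1])
    return consensus


if __name__ == "__main__":
    tool_a = [0, 1, 1, 0]
    tool_b = [1, 0, 0, 1]  # same phasing as tool_a, opposite haplotype label
    tool_c = [0, 1, 0, 1]  # disagrees about the middle pair of sites
    print(consensus_phase([tool_a, tool_b, tool_c]))
```

Voting on the relationship between adjacent sites, rather than on the alleles themselves, sidesteps the problem that different tools may report the two haplotypes in opposite order.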

16.

PERHAPS: Paired-End short Reads-based HAPlotyping from next-generation Sequencing data.

Huang, Jie; Pallotti, Stefano; Zhou, Qianling; Kleber, Marcus; Xin, Xiaomeng; King, Daniel A; Napolioni, Valerio.

Brief Bioinform ; 22(4)2021 07 20.

Article in English | MEDLINE | ID: mdl-33285565

ABSTRACT

The identification of rare haplotypes may greatly expand our knowledge of the genetic architecture of both complex and monogenic traits. To this aim, we developed PERHAPS (Paired-End short Reads-based HAPlotyping from next-generation Sequencing data), a new and simple approach to directly call haplotypes from short-read, paired-end Next Generation Sequencing (NGS) data. To benchmark this method, we considered the APOE classic polymorphism (*1/*2/*3/*4), since it represents one of the best examples of functional polymorphism arising from the haplotype combination of two Single Nucleotide Polymorphisms (SNPs). We leveraged the big Whole Exome Sequencing (WES) and SNP-array data obtained from the multi-ethnic UK BioBank (UKBB, N=48,855). By applying PERHAPS, based on piecing together the paired-end reads according to their FASTQ-labels, we extracted the haplotype data, along with their frequencies and the individual diplotype. Concordance rates between WES directly called diplotypes and the ones generated through statistical pre-phasing and imputation of SNP-array data are extremely high (>99%), either when stratifying the sample by SNP-array genotyping batch or self-reported ethnic group. Hardy-Weinberg Equilibrium tests and the comparison of obtained haplotype frequencies with the ones available from the 1000 Genomes Project further supported the reliability of PERHAPS. Notably, we were able to determine the existence of the rare APOE*1 haplotype in two unrelated African subjects from UKBB, supporting its presence at appreciable frequency (approximately 0.5%) in the African Yoruba population. Despite acknowledging some technical shortcomings, PERHAPS represents a novel and simple approach that will partly overcome the limitations in direct haplotype calling from short read-based sequencing.

Subjects

Algorithms , Genome, Human , Haplotypes , High-Throughput Nucleotide Sequencing , Polymorphism, Single Nucleotide , Apolipoproteins E/genetics , Human Genome Project , Humans
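
The abstract above notes that the APOE *1/*2/*3/*4 polymorphism arises from the haplotype combination of two SNPs. The Python sketch below shows why phase matters for this locus. It is not the PERHAPS implementation: the SNP identifiers rs429358 and rs7412 and the allele-to-haplotype table come from standard APOE nomenclature rather than from the abstract, and the names `APOE_HAPLOTYPES` and `diplotype` are hypothetical.

```python
# Haplotype table keyed by (rs429358 allele, rs7412 allele).
# Taken from standard APOE nomenclature, not from the abstract above.
APOE_HAPLOTYPES = {
    ("C", "T"): "*1",
    ("T", "T"): "*2",
    ("T", "C"): "*3",
    ("C", "C"): "*4",
}


def diplotype(hap_a, hap_b):
    """Name the APOE diplotype for two phased (rs429358, rs7412) haplotypes."""
    return "/".join(sorted(APOE_HAPLOTYPES[h] for h in (hap_a, hap_b)))


if __name__ == "__main__":
    # Both individuals are heterozygous at both SNPs, so their unphased
    # genotypes are identical; only the phase tells them apart.
    print(diplotype(("T", "T"), ("C", "C")))
    print(diplotype(("C", "T"), ("T", "C")))
```

A double heterozygote is therefore ambiguous between *2/*4 and the much rarer *1/*3 unless the two SNPs are observed together on the same read or read pair, or are phased statistically.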

17.

Evaluating species boundaries using coalescent delimitation in pine-killing Monochamus (Coleoptera: Cerambycidae) sawyer beetles.

Gorring, Patrick S; Farrell, Brian D.

Mol Phylogenet Evol ; 184: 107777, 2023 07.

Article in English | MEDLINE | ID: mdl-36990304

ABSTRACT

Plant-feeding beetle species are diverse and often individually highly variable. Accurate classifications can be difficult to establish yet are essential for study of evolutionary patterns and processes. Molecular data are key to further characterizing morphologically difficult groups and defining genus and species boundaries. Monochamus Dejean species are ecologically and economically significant, and in coniferous forests they vector the nematode that causes Pine Wilt Disease. This study uses nuclear and mitochondrial genes to test the monophyly and relationships of Monochamus and applies coalescent methods to further delimit the conifer-feeding species. Monochamus has also included approximately 120 Old World species associated with diverse angiosperm tree species. We sample from these additional morphologically diverse species to determine their placement in the Lamiini. Through supermatrix and coalescent methods, the higher-level relationships of Monochamus show that conifer-feeders are a monophyletic group that includes the type species and has split into Nearctic and Palearctic clades. Molecular dating indicates a single dispersal of conifer-feeders to North America over the second Bering Land Bridge circa 5.3 Ma. All other Monochamus sampled fall in different parts of the Lamiini tree. Small-bodied angiosperm-feeding Monochamus group with the monotypic genus Microgoes Casey. The African Monochamus subgenera sampled are distantly related to the conifer-feeding clade. The multispecies coalescent delimitation methods BPP and STACEY delimit 17 conifer-feeding Monochamus species for a total of 18 species, and support the retention of all current species. An interrogation with nuclear gene allele phasing reveals that unphased data can be unreliable for accurate delimitations and divergence times. The delimited species are discussed with integrative evidence, highlighting real-world challenges in recognizing the completion of speciation.

Subjects

Coleoptera , Nematoda , Pinus , Animals , Phylogeny , North America , Trees

18.

The revised reference genome of the leopard gecko (Eublepharis macularius) provides insight into the considerations of genome phasing and assembly.

Pinto, Brendan J; Gamble, Tony; Smith, Chase H; Keating, Shannon E; Havird, Justin C; Chiari, Ylenia.

J Hered ; 114(5): 513-520, 2023 08 23.

Article in English | MEDLINE | ID: mdl-36869788

Abstract

Genomic resources across squamate reptiles (lizards and snakes) have lagged behind other vertebrate systems and high-quality reference genomes remain scarce. Of the 23 chromosome-scale reference genomes across the order, only 12 of the ~60 squamate families are represented. Within geckos (infraorder Gekkota), a species-rich clade of lizards, chromosome-level genomes are exceptionally sparse, representing only two of the seven extant families. Using the latest advances in genome sequencing and assembly methods, we generated one of the highest-quality squamate genomes to date for the leopard gecko, Eublepharis macularius (Eublepharidae). We compared this assembly to the previous, short-read-only E. macularius reference genome published in 2016 and examined potential factors within the assembly influencing contiguity of genome assemblies using PacBio HiFi data. Briefly, the read N50 of the PacBio HiFi reads generated for this study was equal to the contig N50 of the previous E. macularius reference genome at 20.4 kilobases. The HiFi reads were assembled into a total of 132 contigs, which were further scaffolded using HiC data into 75 total sequences representing all 19 chromosomes. We found that 9 of the 19 chromosomal scaffolds were assembled as a near-single contig, whereas the other 10 chromosomes were each scaffolded together from multiple contigs. We qualitatively identified that the percent repeat content within a chromosome broadly affects its assembly contiguity prior to scaffolding. This genome assembly signifies a new age for squamate genomics where high-quality reference genomes rivaling some of the best vertebrate genome assemblies can be generated for a fraction of previous cost estimates. This new E. macularius reference assembly is available on NCBI at JAOPLA010000000.

Subjects

Genome; Lizards; Humans; Animals; Genomics/methods; Chromosome Mapping/methods; Chromosomes; Lizards/genetics

19.

Timing and structure of the Younger Dryas event and its underlying climate dynamics.

Cheng, Hai; Zhang, Haiwei; Spötl, Christoph; Baker, Jonathan; Sinha, Ashish; Li, Hanying; Bartolomé, Miguel; Moreno, Ana; Kathayat, Gayatri; Zhao, Jingyao; Dong, Xiyu; Li, Youwei; Ning, Youfeng; Jia, Xue; Zong, Baoyun; Ait Brahim, Yassine; Pérez-Mejías, Carlos; Cai, Yanjun; Novello, Valdir F; Cruz, Francisco W; Severinghaus, Jeffrey P; An, Zhisheng; Edwards, R Lawrence.

Proc Natl Acad Sci U S A ; 117(38): 23408-23417, 2020 09 22.

Article in English | MEDLINE | ID: mdl-32900942

Abstract

The Younger Dryas (YD), arguably the most widely studied millennial-scale extreme climate event, was characterized by diverse hydroclimate shifts globally and severe cooling at high northern latitudes that abruptly punctuated the warming trend from the last glacial to the present interglacial. To date, a precise understanding of its trigger, propagation, and termination remains elusive. Here, we present speleothem oxygen-isotope data that, in concert with other proxy records, allow us to quantify the timing of the YD onset and termination at an unprecedented subcentennial temporal precision across the North Atlantic, Asian Monsoon-Westerlies, and South American Monsoon regions. Our analysis suggests that the onsets of YD in the North Atlantic (12,870 ± 30 B.P.) and the Asian Monsoon-Westerlies region are essentially synchronous within a few decades and lead the onset in Antarctica, implying a north-to-south climate signal propagation via both atmospheric (decadal-time scale) and oceanic (centennial-time scale) processes, similar to the Dansgaard-Oeschger events during the last glacial period. In contrast, the YD termination may have started first in Antarctica at ~11,900 B.P., or perhaps even earlier in the western tropical Pacific, followed by the North Atlantic between ~11,700 ± 40 and 11,610 ± 40 B.P. These observations suggest that the initial YD termination might have originated in the Southern Hemisphere and/or the tropical Pacific, indicating a Southern Hemisphere/tropics to North Atlantic-Asian Monsoon-Westerlies directionality of climatic recovery.

20.

Decoding sex: Elucidating sex determination and how high-quality genome assemblies are untangling the evolutionary dynamics of sex chromosomes.

Ramos, Luana; Antunes, Agostinho.

Genomics ; 114(2): 110277, 2022 03.

Article in English | MEDLINE | ID: mdl-35104609

Abstract

Sexual reproduction is a diverse and widespread process. In gonochoristic species, the differentiation of sexes occurs through diverse mechanisms, influenced by environmental and genetic factors. In most vertebrates, a master-switch gene is responsible for triggering a sex determination network. However, only a few genes have acquired master-switch functions, and this process is associated with the evolution of sex-chromosomes, which have a significant influence in evolution. Additionally, their highly repetitive regions impose challenges for high-quality sequencing, even using high-throughput, state-of-the-art techniques. Here, we review the mechanisms involved in sex determination and their role in the evolution of species, particularly vertebrates, focusing on sex chromosomes and the challenges involved in sequencing these genomic elements. We also address the improvements provided by the growth of sequencing projects, by generating a massive number of near-gapless, telomere-to-telomere, chromosome-level, phased assemblies, increasing the number and quality of sex-chromosome sequences available for further studies.

Subjects

Sex Chromosomes; Telomere; Animals; Repetitive Sequences, Nucleic Acid; Sex Chromosomes/genetics; Telomere/genetics; Vertebrates/genetics
