ABSTRACT
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals1. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
Subjects
Human Genome, Genomics, Humans, Diploidy, Human Genome/genetics, Haplotypes/genetics, DNA Sequence Analysis, Genomics/standards, Reference Standards, Cohort Studies, Alleles, Genetic Variation
ABSTRACT
The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society1,2. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals3,4. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome5. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity6. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
Subjects
Chromosome Mapping, Diploidy, Human Genome, Genomics, Humans, Chromosome Mapping/standards, Human Genome/genetics, Haplotypes/genetics, High-Throughput Nucleotide Sequencing/methods, High-Throughput Nucleotide Sequencing/standards, DNA Sequence Analysis/methods, DNA Sequence Analysis/standards, Reference Standards, Genomics/methods, Genomics/standards, Human Chromosomes/genetics, Genetic Variation/genetics
ABSTRACT
Despite advances in long-read sequencing technologies, constructing a near telomere-to-telomere assembly is still computationally demanding. Here we present hifiasm (UL), an efficient de novo assembly algorithm combining multiple sequencing technologies to scale up population-wide near telomere-to-telomere assemblies. Applied to 22 human and two plant genomes, our algorithm produces better diploid assemblies at a cost an order of magnitude lower than that of existing methods, and it also works with polyploid genomes.
Subjects
Algorithms, Diploidy, Polyploidy, Telomere, Humans, Telomere/genetics, Plant Genome, Human Genome, DNA Sequence Analysis/methods, High-Throughput Nucleotide Sequencing/methods
ABSTRACT
Satellite DNA are long tandemly repeating sequences in a genome and may be organized as high-order repeats (HORs). They are enriched in centromeres and are challenging to assemble. Existing algorithms for identifying satellite repeats either require the complete assembly of satellites or only work for simple repeat structures without HORs. Here we describe Satellite Repeat Finder (SRF), a new algorithm for reconstructing satellite repeat units and HORs from accurate reads or assemblies without prior knowledge on repeat structures. Applying SRF to real sequence data, we show that SRF could reconstruct known satellites in human and well-studied model organisms. We also find satellite repeats are pervasive in various other species, accounting for up to 12% of their genome contents but are often underrepresented in assemblies. With the rapid progress in genome sequencing, SRF will help the annotation of new genomes and the study of satellite DNA evolution even if such repeats are not fully assembled.
ABSTRACT
De novo assembly of metagenome samples is a common approach to the study of microbial communities. Current metagenome assemblers developed for short sequence reads or noisy long reads were not optimized for accurate long reads. We thus developed hifiasm-meta, a metagenome assembler that exploits the high accuracy of recent data. Evaluated on seven empirical datasets, hifiasm-meta reconstructed tens to hundreds of complete circular bacterial genomes per dataset, consistently outperforming other metagenome assemblers.
Subjects
Metagenome, Microbiota, Bacterial Genome, High-Throughput Nucleotide Sequencing, Microbiota/genetics, DNA Sequence Analysis, Software
ABSTRACT
Haplotype-resolved de novo assembly is the ultimate solution to the study of sequence variations in a genome. However, existing algorithms either collapse heterozygous alleles into one consensus copy or fail to cleanly separate the haplotypes to produce high-quality phased assemblies. Here we describe hifiasm, a de novo assembler that takes advantage of long high-fidelity sequence reads to faithfully represent the haplotype information in a phased assembly graph. Unlike other graph-based assemblers that only aim to maintain the contiguity of one haplotype, hifiasm strives to preserve the contiguity of all haplotypes. This feature enables the development of a graph trio binning algorithm that greatly advances over standard trio binning. On three human and five nonhuman datasets, including California redwood with a ~30-Gb hexaploid genome, we show that hifiasm frequently delivers better assemblies than existing tools and consistently outperforms others on haplotype-resolved assembly.
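The "standard trio binning" that the abstract uses as its baseline can be illustrated with a toy sketch: reads from the child are assigned to a parental haplotype by counting k-mers that occur in only one parent. This is not hifiasm's graph trio binning, and the function names, the value of k, and the toy sequences are assumptions chosen for illustration. Reverse complements, sequencing errors, and k-mer count thresholds are ignored.

```python
from typing import Iterable, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(
    paternal_reads: Iterable[str], maternal_reads: Iterable[str], k: int
) -> Tuple[Set[str], Set[str]]:
    """Collect k-mers seen in exactly one parent (hypothetical helper)."""
    pat: Set[str] = set()
    for read in paternal_reads:
        pat |= kmers(read, k)
    mat: Set[str] = set()
    for read in maternal_reads:
        mat |= kmers(read, k)
    return pat - mat, mat - pat


def bin_read(read: str, pat_only: Set[str], mat_only: Set[str], k: int) -> str:
    """Assign a child read to the parent whose private k-mers it shares most."""
    read_kmers = kmers(read, k)
    pat_hits = len(read_kmers & pat_only)
    mat_hits = len(read_kmers & mat_only)
    if pat_hits > mat_hits:
        return "paternal"
    if mat_hits > pat_hits:
        return "maternal"
    return "ambiguous"  # no informative k-mers, or a tie


# Toy parents that differ at a single base (A vs T at position 4).
K = 4
PAT_ONLY, MAT_ONLY = parent_specific_kmers(["ACGTACGTAA"], ["ACGTTCGTAA"], K)
```

Reads that cover no parent-specific k-mer stay "ambiguous" in this scheme; in a real assembler they would still have to be placed by some other means.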
Subjects
Genome, Haplotypes, DNA Sequence Analysis/methods, Algorithms
ABSTRACT
BACKGROUND AND OBJECTIVE: Leptin-deficient obesity is associated with various systemic diseases including diabetes and low bone mass phenotype. However, the periodontal status of leptin-deficient obese individuals is still unclear. In this study, we aimed to analyze the periodontal status, alveolar bone phenotype, and oral microbiome status in leptin-deficient obese mice (ob/ob mice). METHODS: This study used 12-week-old wild-type and ob/ob male mice. The alveolar bone phenotype and periodontal status in the maxilla were analyzed by micro-CT and histological analysis. Osteoclasts in alveolar bone were visualized by TRAP staining. Expressions of inflammatory markers (MMP-9, IL-1β, and TGF-β1) and osteoclastogenic markers (RANKL and OPG) in periodontium were analyzed by immunohistochemistry and RT-qPCR. The oral microbiome was analyzed by 16S rDNA sequencing. RESULTS: The CEJ-ABC distance in maxillary molars (M1-M3) of ob/ob mice was significantly greater than that of wild-type mice. The alveolar bone BV/TV ratio was reduced in ob/ob mice compared with wild-type mice. Higher numbers of osteoclasts were observed in ob/ob mice in the alveolar bone adjacent to the molar root. Epithelial hyperplasia in the gingiva and disordered periodontal ligaments were observed in ob/ob mice. The RANKL/OPG expression ratio was increased in ob/ob mice compared with wild-type mice. Expressions of the inflammatory markers MMP-9, IL-1β, and TGF-β1 were increased in ob/ob mice compared with wild-type mice. Oral microbiome analysis showed that the beneficial bacteria Akkermansia and Ruminococcaceae_UCG_014 were more abundant in wild-type mice, while the inflammation-related Flavobacterium was more abundant in ob/ob mice.
CONCLUSION: ob/ob mice showed higher expressions of inflammatory factors, increased alveolar bone loss, lower abundance of beneficial bacteria, and higher abundance of inflammatory bacteria in the oral cavity, suggesting leptin-deficient obesity as a risk factor for periodontitis development in ob/ob mice.
Subjects
Alveolar Bone Loss, Microbiota, Periodontitis, Mice, Male, Animals, Transforming Growth Factor beta1, Matrix Metalloproteinase 9, Leptin, Periodontitis/metabolism, Alveolar Bone Loss/pathology, Inbred Strain Mice, Phenotype, Obesity/complications, Inbred C57BL Mice
ABSTRACT
MOTIVATION: Oxford Nanopore Technologies sequencing devices support adaptive sequencing, in which undesired reads can be ejected from a pore in real time. This feature allows targeted sequencing aided by computational methods for mapping partial reads, rather than complex library preparation protocols. However, existing mapping methods either require a computationally expensive base-calling procedure before using aligners to map partial reads or work well only on small genomes. RESULTS: In this work, we present a new streaming method that can map nanopore raw signals for real-time selective sequencing. Rather than converting read signals to bases, we propose to convert reference genomes to signals and fully operate in the signal space. Our method features a new way to index reference genomes using k-d trees, a novel seed selection strategy and a seed chaining algorithm tailored toward the current signal characteristics. We implemented the method as a tool Sigmap. Then we evaluated it on both simulated and real data and compared it to the state-of-the-art nanopore raw signal mapper Uncalled. Our results show that Sigmap yields comparable performance on mapping yeast simulated raw signals, and better mapping accuracy on mapping yeast real raw signals with a 4.4× speedup. Moreover, our method performed well on mapping raw signals to genomes of size >100 Mbp and correctly mapped 11.49% more real raw signals of green algae, which leads to a significantly higher F1-score (0.9354 versus 0.8660). AVAILABILITY AND IMPLEMENTATION: Sigmap code is accessible at https://github.com/haowenz/sigmap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Nanopores, Algorithms, Genome, High-Throughput Nucleotide Sequencing, DNA Sequence Analysis, Software
ABSTRACT
Whether a graft-versus-graft (GVG) response in patients undergoing allogeneic hematopoietic stem cell transplantation (HSCT) is associated with an enhanced graft-versus-leukemia (GVL) effect remains highly controversial. Furthermore, it is unknown if the GVG response overwhelms the impact of refractory acute leukemia. We aimed to compare the characteristics and therapeutic outcomes between patients undergoing a modified haploidentical cord blood (cord-haplo) HSCT protocol (n = 97) and those undergoing haploidentical HSCT (n = 42) for refractory acute leukemia. A reliable and stable predominant haploidentical donor chimerism was established. The 2-year relapse rate was more favorable in patients undergoing cord-haplo HSCT than in those undergoing haploidentical HSCT (25.9% versus 53.2%; P = .007), as was progression-free survival (PFS; 35.5% versus 17.9%; P = .049). Meanwhile, nonrelapse mortality at 2 years was not significantly different (38.0% versus 24.6%; P = .367). We also found that a higher number of mutual haploidentical donor-mismatched antigens, a concept similar to HLA mismatching, was associated with better disease control. Multivariate analysis identified cord-haplo HSCT as an independent significant predictor of reduced relapse (hazard ratio [HR], .44; P = .028) and improved PFS (HR, .58; P = .033), as was chronic graft-versus-host disease (GVHD) (relapse: HR, .42; P = .013; PFS: HR, .63; P = .052). However, the incidences of neutrophil and platelet engraftment, GVHD, and virus reactivation were comparable in the 2 groups. This study demonstrates that cord-haplo HSCT significantly enhances the GVL effect and improves PFS, providing a reliable and efficient therapeutic platform for patients with refractory acute leukemia.
Subjects
Cord Blood Stem Cell Transplantation, Leukemia, Lymphocyte Depletion, Acute Disease, Adolescent, Adult, Allografts, Child, Preschool Child, Chronic Disease, Disease-Free Survival, Female, Graft vs Host Disease/metabolism, Graft vs Host Disease/mortality, Graft vs Host Disease/prevention & control, HLA Antigens/metabolism, Histocompatibility Testing, Humans, Leukemia/metabolism, Leukemia/mortality, Leukemia/therapy, Male, Middle Aged, Recurrence, Retrospective Studies, Survival Rate, Time Factors
ABSTRACT
Motivation: As a fundamental task in bioinformatics, searching for massive numbers of short patterns over a long text has been accelerated by various compressed full-text indexes. These indexes are able to provide similar searching functionalities to classical indexes, e.g. suffix trees and suffix arrays, while requiring less space. For genomic data, a well-known family of compressed full-text indexes, called FM-indexes, presents unmatched performance in practice. One major drawback of FM-indexes is that their locating operations, which report all occurrence positions of patterns in a given text, are not efficient, especially for patterns with many occurrences. Results: In this paper, we introduce a novel locating algorithm, FMtree, to quickly retrieve all occurrence positions of any pattern via FM-indexes. When searching for a pattern over a given text, FMtree organizes the search space of the locating operation into a conceptual multiway tree. As a result, multiple occurrence positions of this pattern can be retrieved simultaneously by traversing the multiway tree. Compared with existing locating algorithms, our tree-based algorithm eliminates large numbers of redundant operations and has better data locality. Experimental results show that FMtree is usually one order of magnitude faster than the state-of-the-art algorithms, while remaining memory-efficient. Availability and implementation: FMtree is freely available at https://github.com/chhylp123/FMtree. Contact: xuyun@ustc.edu.cn. Supplementary information: Supplementary data are available at Bioinformatics online.
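To make the locating bottleneck concrete, here is a minimal sketch of a textbook FM-index with backward search and the conventional one-occurrence-at-a-time locate over a sampled suffix array. It is not the FMtree algorithm; it shows the baseline behaviour the abstract describes, where each occurrence is resolved by its own walk of LF-mapping steps. The function names, the naive construction, and the sampling rate are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

FMIndex = Tuple[List[str], Dict[str, int], Dict[str, List[int]], Dict[int, int]]


def build_fm_index(text: str, sample_rate: int = 4) -> FMIndex:
    """Build a toy FM-index (naive suffix sorting; fine for short texts only)."""
    text += "$"  # sentinel, assumed smaller than every other character
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = [text[i - 1] for i in sa]  # i == 0 wraps around to the sentinel
    alphabet = sorted(set(text))
    # c_table[c] = number of characters in text strictly smaller than c
    c_table: Dict[str, int] = {}
    total = 0
    for c in alphabet:
        c_table[c] = total
        total += text.count(c)
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    # Keep only suffix-array entries whose text position is a multiple of sample_rate.
    sampled = {row: pos for row, pos in enumerate(sa) if pos % sample_rate == 0}
    return bwt, c_table, occ, sampled


def backward_search(index: FMIndex, pattern: str) -> Tuple[int, int]:
    """Return the half-open suffix-array interval [lo, hi) matching pattern."""
    bwt, c_table, occ, _ = index
    lo, hi = 0, len(bwt)
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0, 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def locate(index: FMIndex, pattern: str) -> List[int]:
    """Resolve each occurrence separately by LF-mapping to a sampled row."""
    bwt, c_table, occ, sampled = index
    lo, hi = backward_search(index, pattern)
    positions = []
    for row in range(lo, hi):
        steps = 0
        while row not in sampled:  # each LF step moves one position left in the text
            ch = bwt[row]
            row = c_table[ch] + occ[ch][row]
            steps += 1
        positions.append(sampled[row] + steps)
    return sorted(positions)


INDEX = build_fm_index("abracadabra")
```

In this baseline, a pattern with many occurrences triggers many independent walks, each touching scattered parts of the index; that per-occurrence cost is the inefficiency the abstract attributes to FM-index locating.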
Subjects
Genome, DNA Sequence Analysis/methods, Software, Algorithms, Animals, Genomics/methods, Humans, Mice
ABSTRACT
BACKGROUND: As next-generation sequencing (NGS) technologies produce hundreds of millions of reads every day, mapping NGS reads to a given reference genome efficiently is a tremendous computational challenge. However, existing all-mappers, which aim at finding all mapping locations of each read, are very time consuming. The majority of existing all-mappers consist of 2 main parts, filtration and verification. This work significantly reduces verification time, which is the dominant part of the running time. RESULTS: An efficient all-mapper, BitMapper, is developed based on a new vectorized bit-vector algorithm, which simultaneously calculates the edit distance of one read to multiple locations in a given reference genome. Experimental results on both simulated and real data sets show that BitMapper is from several times to an order of magnitude faster than the current state-of-the-art all-mappers, while achieving higher sensitivity, i.e., better quality solutions. CONCLUSIONS: We present BitMapper, which is designed to return all mapping locations of raw reads containing indels as well as mismatches. BitMapper is implemented in C under a GPL license. Binaries are freely available at http://home.ustc.edu.cn/%7Echhy.
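For readers unfamiliar with bit-vector edit distance, the following is a minimal sketch of the classical bit-parallel formulation (in the style of Myers and Hyyrö) for the global edit distance between one read and one candidate sequence. It is not BitMapper's vectorized algorithm, which according to the abstract handles multiple reference locations simultaneously. The function name is an assumption, and Python's arbitrary-precision integers stand in for machine words.

```python
def bitvector_edit_distance(pattern: str, text: str) -> int:
    """Global (Levenshtein) edit distance computed one text character per step.

    Each column of the dynamic-programming matrix is encoded as two bit masks
    holding the +1 and -1 vertical differences between adjacent cells.
    """
    m = len(pattern)
    if m == 0:
        return len(text)
    mask = (1 << m) - 1   # keep only the m bits that represent pattern rows
    high = 1 << (m - 1)   # bit of the last row, where the score is tracked
    # peq[c] has bit i set when pattern[i] == c
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    pv, mv = mask, 0      # first column: every vertical difference is +1
    score = m             # distance between the pattern and an empty text
    for ch in text:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # rows whose horizontal difference is +1
        mh = pv & xh                    # rows whose horizontal difference is -1
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Shift in +1 at row 0 because the top boundary row grows by one per column.
        ph = ((ph << 1) | 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
    return score
```

Each text character costs a constant number of word operations instead of one operation per matrix cell, which is why this family of algorithms suits the verification step of a read mapper.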
Subjects
Algorithms, Computing Methodologies, Genome, High-Throughput Nucleotide Sequencing/methods, Software, Animals, Arabidopsis/genetics, Caenorhabditis elegans/genetics, Humans, INDEL Mutation
ABSTRACT
Diverse sets of complete human genomes are required to construct a pangenome reference and to understand the extent of complex structural variation. Here, we sequence 65 diverse human genomes and build 130 haplotype-resolved assemblies (130 Mbp median continuity), closing 92% of all previous assembly gaps1,2 and reaching telomere-to-telomere (T2T) status for 39% of the chromosomes. We highlight complete sequence continuity of complex loci, including the major histocompatibility complex (MHC), SMN1/SMN2, NBPF8, and AMY1/AMY2, and fully resolve 1,852 complex structural variants (SVs). In addition, we completely assemble and validate 1,246 human centromeres. We find up to 30-fold variation in α-satellite high-order repeat (HOR) array length and characterize the pattern of mobile element insertions into α-satellite HOR arrays. While most centromeres predict a single site of kinetochore attachment, epigenetic analysis suggests the presence of two hypomethylated regions for 7% of centromeres. Combining our data with the draft pangenome reference1 significantly enhances genotyping accuracy from short-read data, enabling whole-genome inference3 to a median quality value (QV) of 45. Using this approach, 26,115 SVs per sample are detected, substantially increasing the number of SVs now amenable to downstream disease association studies.
ABSTRACT
Satellite DNA are long tandemly repeating sequences in a genome and may be organized as high-order repeats (HORs). They are enriched in centromeres and are challenging to assemble. Existing algorithms for identifying satellite repeats either require the complete assembly of satellites or only work for simple repeat structures without HORs. Here we describe Satellite Repeat Finder (SRF), a new algorithm for reconstructing satellite repeat units and HORs from accurate reads or assemblies without prior knowledge on repeat structures. Applying SRF to real sequence data, we showed that SRF could reconstruct known satellites in human and well-studied model organisms. We also found satellite repeats are pervasive in various other species, accounting for up to 12% of their genome contents but are often underrepresented in assemblies. With the rapid progress on genome sequencing, SRF will help the annotation of new genomes and the study of satellite DNA evolution even if such repeats are not fully assembled.
ABSTRACT
Despite recent advances in the length and the accuracy of long-read data, building haplotype-resolved genome assemblies from telomere to telomere still requires considerable computational resources. In this study, we present an efficient de novo assembly algorithm that combines multiple sequencing technologies to scale up population-wide telomere-to-telomere assemblies. By utilizing twenty-two human and two plant genomes, we demonstrate that our algorithm is around an order of magnitude cheaper than existing methods, while producing better diploid and haploid assemblies. Notably, our algorithm is the only feasible solution to the haplotype-resolved assembly of polyploid genomes.
ABSTRACT
In this study, a novel nonfragile deep reinforcement learning (DRL) method was proposed to realize the finite-time control of switched unmanned flight vehicles. Control accuracy, robustness, and intelligence were enhanced in the proposed control scheme by combining conventional robust control and DRL characteristics. In the proposed control strategy, the tracking controller consists of a dynamics-based controller and a learning-based controller. The conventional robust control approach for the nominal system was used for realizing a dynamics-based baseline tracking controller. The learning-based controller based on DRL was developed to compensate model uncertainties and enhance transient control accuracy. The multiple Lyapunov function approach and mode-dependent average dwell time approach were combined to analyze the finite-time stability of flight vehicles with asynchronous switching. The linear matrix inequalities technique was used to determine the solutions of dynamics-based controllers. Online optimization was formulated as a Markov decision process. The adaptive deep deterministic policy gradient algorithm was adopted to improve efficiency and convergence. In this algorithm, the actor-critic structure was used and adaptive hyperparameters were introduced. Unlike the conventional DRL algorithm, nonfragile control theory and adaptive reward function were used in the proposed algorithm to achieve excellent stability and training efficiency. We demonstrated the effectiveness of the presented algorithm through comparative simulations.
ABSTRACT
The large molecular weight of polysaccharides limits their absorption and utilization by organisms, affecting their biological activities. In this study, we purified α-1,6-galactan from Cantharellus cibarius Fr. (chanterelle) and reduced its molecular weight from approximately 20 kDa to 5 kDa (named CCP) to increase its solubility and absorption. In APP/PS1 Alzheimer's disease (AD) model mice, CCP alleviated both spatial and non-spatial memory loss, as confirmed by the Morris water maze, step-down, step-through, and novel object recognition tests, and dampened the deposition of amyloid-β plaques, as assessed by immunohistochemical analysis. Proteomic analysis suggested that the neuroprotective effects of CCP are related to anti-neuroinflammation. Immunofluorescence analysis and western blotting confirmed that CCP attenuated AD-like symptoms partly by inhibiting neuroinflammation, which was related to the blocking of complement component 3. Our study provides theoretical support and experimental evidence for the future application of chanterelle-extracted polysaccharides in AD treatment, promoting the modern development of traditional medicines originating from natural polysaccharides.
Subjects
Alzheimer Disease, Mice, Animals, Alzheimer Disease/drug therapy, Galactans, Neuroprotection, Molecular Weight, Proteomics, Amyloid beta-Peptides, Polysaccharides/pharmacology, Polysaccharides/therapeutic use, Polysaccharides/chemistry, Animal Disease Models
ABSTRACT
InSe layered semiconductors with high mobility have advantages over transition-metal dichalcogenides in certain device applications. Understanding the dynamics of carriers, especially around the major bandgaps, is not only of fundamental interest but also important for improving the performance of devices. We investigated ultrafast carrier dynamics in exfoliated InSe near the bandgap and found that the presence of photocarriers led to shrinkage in the optical bandgap. In addition, we observed that the carrier recombination rate increased when the thickness of the InSe nanoflakes was reduced and the process was dominated by surface recombination. For the same flakes, the recombination rate became lower after the freshly exfoliated InSe was exposed to air and oxidized. Using a free carrier diffusion model, layer-dependent surface recombination velocities were obtained. Our investigation reveals that the surface condition and the thickness of few-layer InSe play important roles in carrier lifetimes.
ABSTRACT
Mitochondrial Hsp60 (mtHsp60) plays a crucial role in maintaining the proper folding of proteins in the mitochondria. mtHsp60 self-assembles into a ring-shaped heptamer, which can further form a double-ring tetradecamer in the presence of ATP and mtHsp10. However, mtHsp60 tends to dissociate in vitro, unlike its prokaryotic homologue, GroEL. The molecular structure of dissociated mtHsp60 and the mechanism behind its dissociation remain unclear. In this study, we demonstrated that Epinephelus coioides mtHsp60 (EcHsp60) can form a dimer that lacks ATPase activity. The crystal structure of this dimer reveals symmetrical subunit interactions and a rearranged equatorial domain. The α4 helix of each subunit extends and interacts with its adjacent subunit, leading to the disruption of the ATP-binding pocket. Furthermore, an RLK motif in the apical domain contributes to stabilizing the dimeric complex. These structural and biochemical findings provide new insights into the conformational transitions and functional regulation of this ancient chaperonin.
Subjects
Chaperonins, Escherichia coli, Escherichia coli/metabolism, Chaperonins/chemistry, Chaperonins/metabolism, Adenosine Triphosphate/metabolism, Mitochondria/metabolism
ABSTRACT
MicroRNA-155 (miR155) is overexpressed in various inflammatory diseases and cancer, in which bone resorption and osteolysis are frequently observed. However, the role of miR155 in osteogenesis and bone mass phenotype is still unknown. Here, we report a low bone mass phenotype in the long bone of Mir155-Tg mice compared with wild-type mice. In contrast, Mir155-KO mice showed a high bone mass phenotype and a protective effect against inflammation-induced bone loss. Mir155-KO mice showed robust bone regeneration in the ectopic and orthotopic models, but Mir155-Tg mice showed compromised bone regeneration compared with wild-type mice. Similarly, compared with that of wild-type mice, the osteogenic differentiation potential of bone marrow stromal stem cells (BMSCs) was robust in Mir155-KO mice and compromised in Mir155-Tg mice. Moreover, Mir155 knockdown in BMSCs from wild-type mice resulted in higher osteogenic differentiation potential, supporting the results from Mir155-KO mice. TargetScan analysis predicted sphingosine 1-phosphate receptor-1 (S1pr1) as a target gene of Mir155, which was further confirmed by luciferase assay and Mir155 knockdown. S1pr1 overexpression in BMSCs robustly promoted osteogenic differentiation without affecting cell viability and proliferation. Furthermore, osteoclastogenic differentiation of Mir155-Tg bone marrow-derived macrophages was inhibited compared with that of wild-type mice. Thus, Mir155 showed a catabolic effect on osteogenesis and bone mass phenotype via interaction with the S1pr1 gene, suggesting inhibition of Mir155 as a potential strategy for bone regeneration and bone defect healing.
Subjects
MicroRNAs, Osteogenesis, Mice, Animals, Bone and Bones/metabolism, Bone Density, Cell Differentiation, Bone Marrow Cells/metabolism, Cultured Cells, MicroRNAs/genetics, MicroRNAs/metabolism
ABSTRACT
Indium selenide (InSe) is an emerging van der Waals material with the potential to serve in high-performance electronic and optoelectronic devices. One of the advantages of layered materials is their application to flexible devices. How strain alters the electronic and optical properties is, thus, an important issue. In this work, we experimentally measured the strain dependence of the angle-resolved second harmonic generation (SHG) pattern of few-layer InSe. We used the exfoliation method to fabricate InSe flakes and measured the SHG images of the flakes at different azimuthal angles. We found that the SHG intensity of InSe decreased as the compressive strain increased. Through first-principles electronic structure calculations, we investigated the strain dependence of the SHG susceptibilities and the corresponding angle-resolved SHG pattern. The experimental data could be fitted well by the calculated results using only one fitting parameter. The first-principles method demonstrated in this work can be used to quantitatively model the strain-induced angle-resolved SHG patterns in 2D materials. Our results are useful for exploring the physical properties of flexible devices based on 2D materials.