Results 1 - 20 of 83
1.
aBIOTECH ; 5(3): 298-308, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39279850

ABSTRACT

MicroRNAs (miRNAs) and short RNA fragments (18-25 nt) are crucial biomarkers in biological research and disease diagnostics. However, their accurate and rapid detection remains a challenge, largely due to their low abundance, short length, and sequence similarities. In this study, we report on a highly sensitive, one-step RNA O-circle amplification (ROA) assay for rapid and accurate miRNA detection. The ROA assay commences with the hybridization of a circular probe with the test RNA, followed by a linear rolling circle amplification (RCA) using dUTP. This amplification process is facilitated by U-nick reactions, which lead to an exponential amplification for readout. Under optimized conditions, assays can be completed within an hour, producing an amplification yield up to the microgram level, with a detection limit as low as 0.15 fmol (6 pM). Notably, the ROA assay requires only one step, and the results can be easily read visually, making it user-friendly. This ROA assay has proven effective in detecting various miRNAs and phage ssRNA. Overall, the ROA assay offers a user-friendly, rapid, and accurate solution for miRNA detection. Supplementary Information: The online version contains supplementary material available at 10.1007/s42994-024-00140-0.
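The two detection-limit figures quoted above are related through the reaction volume. The quick Python check below assumes that both figures describe the same sample (concentration = amount / volume) and computes the volume they imply. That volume is derived here and is not stated in the abstract.

```python
# Detection limit quoted in the abstract.
amount_mol = 0.15e-15      # 0.15 fmol, expressed in mol
concentration_M = 6e-12    # 6 pM, expressed in mol/L

# Assumption: both figures describe the same reaction, so volume = amount / concentration.
volume_L = amount_mol / concentration_M
volume_uL = volume_L * 1e6  # litres -> microlitres
print(f"Implied reaction volume: {volume_uL:.1f} uL")
```

Under that assumption the two numbers are consistent with a reaction volume of about 25 µL.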

2.
Trends Plant Sci ; 2024 Sep 03.
Article in English | MEDLINE | ID: mdl-39232945

ABSTRACT

Plant pathogens usually secrete effectors to suppress the host immune response, resulting in effector-triggered susceptibility (ETS). Plants use nucleotide-binding leucine-rich repeat receptors (NLRs) to detect specific effectors and elicit effector-triggered immunity (ETI). Two recent papers (Liu et al. and Zhang et al.) have made promising progress in controlling rice blast by modulating ETS and ETI.

3.
Nat Genet ; 56(9): 1975-1984, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39138385

ABSTRACT

Cultivated peanut (Arachis hypogaea L.) is a widely grown oilseed crop worldwide; however, the events leading to its origin and diversification are not fully understood. Here by combining chloroplast and whole-genome sequence data from a large germplasm collection, we show that the two subspecies of A. hypogaea (hypogaea and fastigiata) likely arose from distinct allopolyploidization and domestication events. Peanut genetic clusters were then differentiated in relation to dissemination routes and breeding efforts. A combination of linkage mapping and genome-wide association studies allowed us to characterize genes and genomic regions related to main peanut morpho-agronomic traits, namely flowering pattern, inner tegument color, growth habit, pod/seed weight and oil content. Together, our findings shed light on the evolutionary history and phenotypic diversification of peanuts and might be of broad interest to plant breeders.


Subjects
Arachis , Chloroplasts , Molecular Evolution , Plant Genome , Genome-Wide Association Study , Phenotype , Whole Genome Sequencing , Arachis/genetics , Chloroplasts/genetics , Chromosome Mapping , Phylogeny , Domestication , Plant Breeding/methods
4.
Article in English | MEDLINE | ID: mdl-39209796

ABSTRACT

Increasing the accuracy of nucleotide sequence alignment is an essential issue in genomics research. Although classic dynamic programming (DP) algorithms (e.g., Smith-Waterman and Needleman-Wunsch) are guaranteed to produce the optimal result, their time complexity hinders their application to large-scale sequence alignment. Optimization efforts that aim to accelerate the alignment process generally come from three perspectives: redesigning data structures [e.g., diagonal or striped Single Instruction Multiple Data (SIMD) implementations], increasing the degree of parallelism in SIMD operations (e.g., difference recurrence relation), or reducing the search space (e.g., banded DP). However, no existing method combines all three to build an ultra-fast algorithm. In this study, we developed the Banded Striped Aligner (BSAlign) library, which delivers accurate alignment results at ultra-fast speed by knitting together a series of novel methods that take advantage of all three perspectives, with highlights such as active F-loop in striped vectorization and striped move in banded DP. We applied our new acceleration design to both regular and edit-distance pairwise alignment. BSAlign achieved a 2-fold speed-up over other SIMD-based implementations for regular pairwise alignment, and a 1.5-fold to 4-fold speed-up for edit distance-based implementations on long reads. BSAlign is implemented in the C programming language and is available at https://github.com/ruanjue/bsalign.


Subjects
Algorithms , Sequence Alignment , Software , Sequence Alignment/methods , Sequence Alignment/statistics & numerical data , DNA Sequence Analysis/methods , Gene Library , Computational Biology/methods , Base Sequence/genetics
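Banded DP, one of the three acceleration strategies listed in the BSAlign abstract, restricts the DP matrix to a diagonal band around the main diagonal. The sketch below illustrates only that idea, in plain Python with unit edit costs (an assumption); it is not BSAlign's striped SIMD implementation, and the function name `banded_edit_distance` is hypothetical.

```python
import math


def banded_edit_distance(a: str, b: str, band: int) -> float:
    """Global edit distance (unit costs) computed only inside a diagonal band.

    Cells with |i - j| > band are treated as unreachable (infinite cost), so the
    work is O(len(a) * band) instead of O(len(a) * len(b)). The result is exact
    whenever an optimal alignment stays inside the band; otherwise it is an
    upper bound on the true distance.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        return math.inf  # the end cell (n, m) lies outside the band
    inf = math.inf
    # Row 0: reaching column j costs j insertions.
    prev = {j: j for j in range(0, min(m, band) + 1)}
    for i in range(1, n + 1):
        cur = {}
        lo, hi = max(0, i - band), min(m, i + band)
        for j in range(lo, hi + 1):
            if j == 0:
                cur[j] = i  # column 0: i deletions
                continue
            best = prev.get(j - 1, inf) + (a[i - 1] != b[j - 1])  # match / mismatch
            best = min(best, prev.get(j, inf) + 1)      # deletion of a[i - 1]
            best = min(best, cur.get(j - 1, inf) + 1)   # insertion of b[j - 1]
            cur[j] = best
        prev = cur
    return prev[m]
```

For example, `banded_edit_distance("GATTACA", "GACTACA", 2)` returns 1. A narrower band does less work but can miss the optimal path when the sequences differ by long indels, which is why the band width is a tuning parameter.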
5.
Nat Commun ; 15(1): 5573, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38956036

ABSTRACT

Recent advancements in genome assembly have greatly improved the prospects for comprehensive annotation of Transposable Elements (TEs). However, existing methods for TE annotation using genome assemblies suffer from limited accuracy and robustness, requiring extensive manual editing. In addition, the currently available gold-standard TE databases are not comprehensive, even for extensively studied species, highlighting the critical need for an automated TE detection method to supplement existing repositories. In this study, we introduce HiTE, a fast and accurate dynamic boundary adjustment approach designed to detect full-length TEs. The experimental results demonstrate that HiTE outperforms RepeatModeler2, the state-of-the-art tool, across various species. Furthermore, HiTE has identified numerous novel transposons with well-defined structures containing protein-coding domains, some of which are directly inserted within crucial genes, leading to direct alterations in gene expression. A Nextflow version of HiTE is also available, with enhanced parallelism, reproducibility, and portability.


Subjects
DNA Transposable Elements , Molecular Sequence Annotation , DNA Transposable Elements/genetics , Molecular Sequence Annotation/methods , Animals , Software , Humans , Reproducibility of Results , Computational Biology/methods , Genetic Databases , Algorithms , Genome/genetics
6.
Nat Commun ; 15(1): 5644, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38969648

ABSTRACT

Long-read sequencing, exemplified by PacBio, has revolutionized genomics by overcoming challenges such as repetitive sequences. However, the high DNA requirement (>1 µg) is prohibitive for small organisms. We develop a low-input (100 ng), low-cost, and amplification-free library-generation method for PacBio sequencing (LILAP) using Tn5-based tagmentation and DNA circularization within one tube. We test LILAP with two Drosophila melanogaster individuals, and generate near-complete genomes, surpassing preexisting single-fly genomes. By analyzing variations in these two genomes, we characterize mutational processes: complex transpositions (transposon insertions together with extra duplications and/or deletions) prefer regions characterized by non-B DNA structures, and gene conversion of transposons occurs on both DNA and RNA levels. Concurrently, we generate two complete assemblies for the endosymbiotic bacterium Wolbachia in these flies and similarly detect transposon conversion. Thus, LILAP promises a broad PacBio sequencing adoption for not only mutational studies of flies and their symbionts but also explorations of other small organisms or precious samples.


Subjects
DNA Transposable Elements , Drosophila melanogaster , Insect Genome , Mutation , Wolbachia , Animals , Drosophila melanogaster/genetics , DNA Transposable Elements/genetics , Wolbachia/genetics , Insect Genome/genetics , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Genomics/methods , Gene Conversion
7.
Adv Sci (Weinh) ; 11(30): e2402951, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38874370

ABSTRACT

Composite DNA letters, by merging all four DNA nucleotides in specified ratios, offer a pathway to substantially increase the logical density of DNA digital storage (DDS) systems. However, these letters are susceptible to nucleotide errors and sampling bias, leading to a high letter error rate, which complicates precise data retrieval and augments reading expenses. To address this, Derrick-cp is introduced as an innovative soft-decision decoding algorithm tailored for DDS utilizing composite letters. Derrick-cp capitalizes on the distinctive error sensitivities among letters to accurately predict and rectify letter errors, thus enhancing the error-correcting performance of Reed-Solomon codes beyond traditional hard-decision decoding limits. Through comparative analyses in the existing dataset and simulated experiments, Derrick-cp's superiority is validated, notably halving the sequencing depth requirement and slashing costs by up to 22% against conventional hard-decision strategies. This advancement signals Derrick-cp's significant role in elevating both the precision and cost-efficiency of composite letter-based DDS.


Subjects
Algorithms , DNA , DNA/genetics , Information Storage and Retrieval/methods
8.
Nat Commun ; 15(1): 3126, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38605047

ABSTRACT

Long reads that cover more variants per read raise opportunities for accurate haplotype construction, whereas the genotype errors of single nucleotide polymorphisms pose great computational challenges for haplotyping tools. Here we introduce KSNP, an efficient haplotype construction tool based on the de Bruijn graph (DBG). KSNP leverages the ability of DBG in handling high-throughput erroneous reads to tackle the challenges. Compared to other notable tools in this field, KSNP achieves at least 5-fold speedup while producing comparable haplotype results. The time required for assembling human haplotypes is reduced to nearly the data-in time.


Subjects
Algorithms , Single Nucleotide Polymorphism , Humans , Haplotypes/genetics , DNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing/methods , Software
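The KSNP abstract does not describe how its graph encodes SNP alleles. As a generic illustration of the underlying data structure only, the sketch below builds a de Bruijn graph from reads, with (k-1)-mers as nodes and every k-mer occurrence adding a counted edge, and then applies coverage-based pruning, one common way DBG methods suppress sequencing errors. The function names and the `min_count` threshold are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn_graph(reads, k):
    """Return {(prefix, suffix): count} where each k-mer in a read adds one
    edge from its (k-1)-mer prefix to its (k-1)-mer suffix."""
    edges = defaultdict(int)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return dict(edges)


def prune_low_coverage(edges, min_count):
    """Drop edges supported by fewer than min_count k-mer occurrences;
    such edges are more likely to come from sequencing errors."""
    return {edge: count for edge, count in edges.items() if count >= min_count}
```

With reads `["ACGTA", "ACGTA", "ACCTA"]`, `k = 3`, and `min_count = 2`, only the three edges spelled by the twice-observed read survive pruning, while the singleton path through `CC` is removed.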
9.
Gigascience ; 13, 2024 01 02.
Article in English | MEDLINE | ID: mdl-38626722

ABSTRACT

BACKGROUND: Most currently available reference genomes lack the sequence map of sex-limited (such as Y and W) chromosomes, which results in incomplete assemblies that hinder further research on sex chromosomes. Recent advancements in long-read sequencing and population sequencing have provided the opportunity to assemble sex-limited chromosomes without the traditional complicated experimental efforts. FINDINGS: We introduce the first computational method, Sorting long Reads of Y or other sex-limited chromosome (SRY), which achieves improved assembly results compared to flow sorting. Specifically, SRY performs better than flow sorting in the heterochromatic region and comparably in other regions. Furthermore, SRY enhances the capabilities of hybrid assembly software, resulting in improved continuity and accuracy. CONCLUSIONS: Our method enables truly complete genome assembly and facilitates downstream research on sex-limited chromosomes.


Subjects
Genome , Sex Chromosomes , Sex Chromosomes/genetics , DNA Sequence Analysis/methods , High-Throughput Nucleotide Sequencing/methods
10.
Genome Biol ; 25(1): 107, 2024 04 26.
Article in English | MEDLINE | ID: mdl-38671502

ABSTRACT

Long-read sequencing data, particularly those derived from the Oxford Nanopore sequencing platform, tend to exhibit high error rates. Here, we present NextDenovo, an efficient error correction and assembly tool for noisy long reads, which achieves a high level of accuracy in genome assembly. We apply NextDenovo to assemble 35 diverse human genomes from around the world using Nanopore long-read data. These genomes allow us to identify the landscape of segmental duplication and gene copy number variation in modern human populations. The use of NextDenovo should pave the way for population-scale long-read assembly using Nanopore long-read data.


Subjects
DNA Copy Number Variations , Human Genome , Humans , High-Throughput Nucleotide Sequencing/methods , Software , Nanopore Sequencing/methods , DNA Sequence Analysis/methods , Genomics/methods
11.
Bioinformatics ; 40(3), 2024 03 04.
Article in English | MEDLINE | ID: mdl-38377404

ABSTRACT

MOTIVATION: Seeding is a rate-limiting stage in sequence alignment for next-generation sequencing reads. Existing optimization algorithms typically use hardware and machine-learning techniques to accelerate seeding. However, an efficient solution offered by specialized next-generation sequencing compressors has so far been largely overlooked. In addition to achieving remarkable compression ratios by reordering reads, these compressors provide valuable insights for downstream alignment: they reveal repetitive computations that account for more than 50% of the seeding procedure in the commonly used short-read aligner BWA-MEM at typical sequencing coverage. Nevertheless, this redundancy information has not been fully exploited. RESULTS: In this study, we present a compressive seeding algorithm, named CompSeed, to fill the gap. CompSeed, in collaboration with existing reordering-based compression tools, finishes the BWA-MEM seeding process in about half the time by caching all intermediate seeding results in compact trie structures to directly answer repetitive queries that frequently cause random memory accesses. Furthermore, CompSeed performs better as sequencing coverage increases, because it focuses solely on the small informative portion of sequencing reads after compression. This innovative strategy highlights the potential of integrating sequence compression and alignment to tackle the ever-growing volume of sequencing data. AVAILABILITY AND IMPLEMENTATION: CompSeed is available at https://github.com/i-xiaohu/CompSeed.


Subjects
Data Compression , Software , DNA Sequence Analysis/methods , Algorithms , Data Compression/methods , Computers , High-Throughput Nucleotide Sequencing/methods
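The caching idea in the CompSeed abstract can be illustrated with a simplified Python sketch. It memoizes a hypothetical per-read seeding function in a trie keyed by the read sequence, so repeated reads (which a reordering compressor places next to each other) are answered without recomputation. Unlike CompSeed, which caches intermediate seeding results inside BWA-MEM, this sketch caches only whole-read results; `SeedCache`, `toy_seed`, and `REFERENCE` are assumptions, not CompSeed's API.

```python
_MISSING = object()  # sentinel so that a seeding result of None can still be cached


class TrieNode:
    __slots__ = ("children", "result")

    def __init__(self):
        self.children = {}      # base -> child TrieNode
        self.result = _MISSING  # cached result for the read ending at this node


class SeedCache:
    """Memoize a (hypothetical) per-read seeding function in a trie keyed by
    read sequence; reads that share a prefix also share trie nodes."""

    def __init__(self, seed_fn):
        self.root = TrieNode()
        self.seed_fn = seed_fn
        self.hits = 0
        self.misses = 0

    def query(self, read):
        node = self.root
        for base in read:
            child = node.children.get(base)
            if child is None:
                child = node.children[base] = TrieNode()
            node = child
        if node.result is _MISSING:
            self.misses += 1
            node.result = self.seed_fn(read)  # compute once per distinct read
        else:
            self.hits += 1                    # repeated read: answered from the trie
        return node.result


# Toy stand-in for seeding (hypothetical): positions of the read's first 3-mer in a reference.
REFERENCE = "ACGTACGTTT"


def toy_seed(read):
    prefix = read[:3]
    return [i for i in range(len(REFERENCE) - 2) if REFERENCE[i:i + 3] == prefix]
```

Querying `"ACGTT"`, `"ACGTT"`, `"CGTAC"` in that order runs `toy_seed` twice and answers the second `"ACGTT"` from the cache, so `hits == 1` and `misses == 2`. The more redundant the reads (higher coverage), the larger the fraction of queries that hit the cache.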
12.
BMC Genomics ; 25(1): 197, 2024 Feb 19.
Article in English | MEDLINE | ID: mdl-38373887

ABSTRACT

BACKGROUND: In cold and temperate zones, seasonal reproduction plays a crucial role in the survival and reproductive success of species. The photoperiod influences reproductive processes in seasonal breeders through the hypothalamic-pituitary-gonadal (HPG) axis, in which the mediobasal hypothalamus (MBH) serves as the central region responsible for transmitting light information to the endocrine system. However, the cis-regulatory elements and transcriptional activation mechanisms related to the seasonal activation of the reproductive axis in the MBH remain largely unclear. In this study, an artificial photoperiod program was used to induce HPG axis activation in male quails, and we compared changes in chromatin accessibility during the seasonal activation of the HPG axis. RESULTS: Alterations in chromatin accessibility occurred in the MBH and stabilized at LD7 during the activation of the HPG axis. Most open chromatin regions (OCRs) were enriched in introns and distal intergenic regions. Differentially accessible regions (DARs) that lost accessibility under long-day conditions were enriched for binding motifs of the RFX, NKX, and MEF families of transcription factors, while DARs that gained accessibility were enriched for binding motifs of the nuclear receptor (NR) superfamily and the BZIP family. Retinoic acid signaling and GTPase-mediated signal transduction are involved in adaptation to long days and in maintaining HPG axis activation. According to our footprint analysis, three clock-output genes (TEF, DBP, and HLF) and THRA were the first responders to long days, at LD3. THRB, NR3C2, AR, and NR3C1 are key players in the initiation and maintenance of HPG axis activation; they appeared at LD7 and tended to remain stable under long-day conditions.
By integrating chromatin accessibility and transcriptome data, we identified three genes involved in thyroid hormone signaling (DIO2, SLC16A2, and PDE6H) that showed differential chromatin accessibility and expression levels during the seasonal activation of the HPG axis. TRPA1, a target of THRB identified by DAP-seq, was sensitive to photoactivation and exhibited differential expression between short- and long-day conditions. CONCLUSION: Our data suggest that trans effects were the main factors affecting gene expression during the seasonal activation of the HPG axis. This study could lead to further research on the seasonal reproductive behavior of birds, particularly the role of the MBH in controlling it.


Subjects
Chromatin , Quail , Animals , Male , Seasons , Quail/genetics , Chromatin/genetics , Chromatin/metabolism , Hypothalamus/metabolism , Reproduction/genetics , Photoperiod
13.
Natl Sci Rev ; 11(2): nwad229, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38213525

ABSTRACT

Error-correcting codes (ECCs) employed in state-of-the-art DNA digital storage (DDS) systems suffer from a trade-off between error-correcting capability and the proportion of redundancy. To address this issue, in this study, we introduce a soft-decision decoding approach into DDS by proposing a DNA-specific error prediction model and a series of novel strategies. We demonstrate the effectiveness of our approach through a proof-of-concept DDS system based on the Reed-Solomon (RS) code, named Derrick. Derrick shows significant improvement in error-correcting capability without additional redundancy in both in vitro and in silico experiments, using various sequencing technologies such as Illumina, PacBio and Oxford Nanopore Technology (ONT). Notably, in vitro experiments using ONT sequencing at a depth of 7× reveal that Derrick, compared with the traditional hard-decision decoding strategy, doubles the error-correcting capability of the RS code, decreases the proportion of matrices with decoding failure by 229-fold, and increases the potential maximum storage volume by an impressive 32,388-fold. Also, Derrick surpasses 'state-of-the-art' DDS systems when both information density and the minimum sequencing depth required for complete information recovery are considered. Crucially, the soft-decision decoding strategy and key steps of Derrick are generalizable to the decoding algorithms of other ECCs.
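Derrick's DNA-specific error prediction model is not detailed in this abstract. The sketch below only illustrates the standard Reed-Solomon errors-and-erasures bound, which explains why soft information that flags likely error positions can double the number of correctable symbols: an RS(n, k) code corrects t errors and e erasures whenever 2t + e ≤ n − k. The parameters RS(255, 223) are an arbitrary textbook example, not values from the study, and the function names are hypothetical.

```python
def rs_correctable(n: int, k: int, errors: int, erasures: int) -> bool:
    """Errors-and-erasures bound for an RS(n, k) code: `errors` symbol errors at
    unknown positions plus `erasures` at known (flagged) positions are
    decodable if 2 * errors + erasures <= n - k."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return 2 * errors + erasures <= n - k


def max_correctable(n: int, k: int, positions_flagged: bool) -> int:
    """Largest number of symbol errors that stays within the bound.

    positions_flagged=False models hard-decision decoding (positions unknown);
    positions_flagged=True models the ideal case in which soft information
    flags every erroneous symbol, so each one is treated as an erasure."""
    count = 0
    while True:
        candidate = count + 1
        if positions_flagged:
            ok = rs_correctable(n, k, 0, candidate)
        else:
            ok = rs_correctable(n, k, candidate, 0)
        if not ok:
            return count
        count = candidate
```

For RS(255, 223), which has 32 parity symbols, `max_correctable(255, 223, False)` gives 16 while `max_correctable(255, 223, True)` gives 32. In practice soft information flags only some error positions, so real gains fall between these two extremes.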

14.
Genome Biol ; 24(1): 277, 2023 Dec 04.
Article in English | MEDLINE | ID: mdl-38049885

ABSTRACT

BACKGROUND: Recent state-of-the-art sequencing technologies enable the investigation of challenging regions in the human genome and expand the scope of variant benchmarking datasets. Herein, we sequence a Chinese Quartet, comprising two monozygotic twin daughters and their biological parents, using four short- and long-read sequencing platforms (Illumina, BGI, PacBio, and Oxford Nanopore Technology). RESULTS: The long reads from the monozygotic twin daughters are phased into paternal and maternal haplotypes using the parent-child genetic map and for each haplotype. We also use long reads to generate haplotype-resolved whole-genome assemblies with completeness and continuity exceeding that of GRCh38. Using this Quartet, we comprehensively catalogue the human variant landscape, generating a dataset of 3,962,453 SNVs, 886,648 indels (< 50 bp), 9726 large deletions (≥ 50 bp), 15,600 large insertions (≥ 50 bp), 40 inversions, 31 complex structural variants, and 68 de novo mutations shared between the monozygotic twin daughters. Variants underrepresented in previous benchmarks owing to their complexity, including those located in long repeat regions, complex structural variants, and de novo mutations, are systematically examined in this study. CONCLUSIONS: In summary, this study provides high-quality haplotype-resolved assemblies and a comprehensive set of benchmarking resources for two Chinese monozygotic twin samples that, relative to existing benchmarks, offer expanded genomic coverage and insight into complex variant categories.


Subjects
Benchmarking , East Asian People , Monozygotic Twins , Humans , East Asian People/genetics , Genomics , Haplotypes , High-Throughput Nucleotide Sequencing , DNA Sequence Analysis , Monozygotic Twins/genetics , Twin Studies as Topic
15.
Proc Natl Acad Sci U S A ; 120(42): e2305208120, 2023 10 17.
Article in English | MEDLINE | ID: mdl-37816049

ABSTRACT

Polyploidization is important to the evolution of plants. Subgenome dominance is a distinct phenomenon associated with most allopolyploids: a gene on the dominant subgenome tends to be expressed at higher RNA levels in all organs than its syntenic paralogue (homoeolog). The mechanism underlying the formation of subgenome dominance remains unknown, but there is evidence that differences in transposon/DNA methylation density near the genes of the parents are causally involved. The subgenome with lower transposon and methylation density near genes is positively associated with subgenome dominance. Here, we generated eight generations of allotetraploid progenies from the merging of the parental genomes of Brassica rapa and Brassica oleracea. We found that the differences in transposon/methylation density near genes between the parental genomes (rapa:oleracea) existed in the wide hybrid and persisted in the neotetraploids (the synthetic Brassica napus), but these neotetraploids expressed none of the expected subgenome dominance. This absence of B. rapa vs. B. oleracea subgenome dominance is particularly significant because, while there is no negative relationship between transposon/methylation level and subgenome dominance in the neotetraploids, the more ancient parental subgenomes for all Brassica did show differences in transposon/methylation densities near genes and did express, in the same samples of cells, the biased gene expression diagnostic of subgenome dominance. We conclude that subgenome differences in methylated transposons near genes are not sufficient to initiate the biased gene expression that defines subgenome dominance. Our result was unexpected, and we suggest a "nuclear chimera" model to explain our data.


Subjects
Brassica napus , Brassica rapa , Brassica , Brassica/genetics , Plant Genome/genetics , Brassica rapa/genetics , Brassica napus/genetics , DNA Methylation/genetics , Polyploidy
16.
Nucleic Acids Res ; 51(20): 10924-10933, 2023 11 10.
Article in English | MEDLINE | ID: mdl-37843097

ABSTRACT

Detailed knowledge of the genetic variations in diverse crop populations forms the basis for genetic crop improvement and gene functional studies. In the present study, we analyzed a large rice population with a total of 10 548 accessions to construct a rice super-population variation map (RSPVM), consisting of 54 378 986 single nucleotide polymorphisms, 11 119 947 insertion/deletion mutations and 184 736 presence/absence variations. Assessment of variation detection efficiency for different population sizes revealed a sharp increase in all types of variation as the population size increased, followed by gradual saturation once the population size reached 10 000. Variant frequency analysis indicated that ∼90% of the obtained variants were rare and would therefore likely be difficult to detect in a relatively small population. Among the rare variants, only 2.7% were predicted to be deleterious. Population structure, genetic diversity and gene functional polymorphism of this large population were evaluated based on different subsets of RSPVM, demonstrating the great potential of RSPVM for use in downstream applications. Our study provides both a rich genetic basis for understanding natural rice variations and a powerful tool for exploiting the great potential of rare variants in future rice research, including population genetics and functional genomics.


Subjects
Genetic Variation , Oryza , Population Genetics , Genomics , Oryza/genetics , Single Nucleotide Polymorphism
17.
J Integr Plant Biol ; 65(10): 2320-2335, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37688324

ABSTRACT

Diterpenoid alkaloids (DAs) have often been used in clinical practice due to their analgesic and anti-inflammatory properties. Natural DAs are prevalent in the family Ranunculaceae, notably in the genus Aconitum. Nevertheless, the evolutionary origin of the biosynthetic pathway responsible for DA production remains unknown. In this study, we assembled a high-quality, pseudochromosome-level genome (5.76 Gb) of the DA-rich species Aconitum vilmorinianum (A. vilmorinianum). Comparative genomic analysis revealed an A. vilmorinianum-specific whole-genome duplication event, which may have aided the evolution of the DA biosynthesis pathway. We identified several genes involved in DA biosynthesis via integrated genomic, transcriptomic, and metabolomic analyses. These included genes encoding ent-kaurene oxidases and aminotransferases, which facilitate the activation of diterpenes and the insertion of nitrogen atoms into diterpene skeletons, thereby mediating the transformation of diterpenes into DAs. The divergence periods of these genes in A. vilmorinianum were further assessed, showing that two major types of genes were involved in the establishment of the DA biosynthesis pathway. Our integrated analysis offers fresh insights into the evolutionary origin of DAs in A. vilmorinianum as well as suggestions for engineering the biosynthetic pathways to obtain desired DAs.


Subjects
Aconitum , Alkaloids , Diterpenes , Aconitum/genetics , Aconitum/metabolism , Multiomics , Diterpenes/metabolism , Alkaloids/metabolism , Transcriptome/genetics , Plant Roots
19.
Cell Res ; 33(10): 745-761, 2023 10.
Article in English | MEDLINE | ID: mdl-37452091

ABSTRACT

Since the release of the complete human genome, the priority of human genomic studies has been shifting towards closing gaps in ethnic diversity. Here, we present a fully phased and well-annotated diploid human genome from a Han Chinese male individual (CN1), in which the assemblies of both haploids achieve the telomere-to-telomere (T2T) level. Comparison of this diploid genome with the CHM13 haploid T2T genome revealed significant variations in the centromere. Outside the centromere, we discovered 11,413 structural variations, including numerous novel ones. We also detected thousands of CN1 alleles that have accumulated high substitution rates and a few that have been under positive selection in the East Asian population. Further, we found that CN1 outperforms CHM13 as a reference genome in mapping and variant calling for the East Asian population owing to the distinct structural variants of the two references. Comparison of SNP calling for a large cohort of 8869 Chinese genomes using CN1 and CHM13, respectively, as the reference showed that reference bias profoundly impacts rare SNP calling, with nearly 2 million rare SNPs miscalled depending on the reference genome. Finally, using CN1 as the reference, we discovered 5.80 Mb and 4.21 Mb of putative introgression sequences from Neanderthals and Denisovans, respectively, including many East Asian-specific ones undetected when CHM13 was used as the reference. Our analyses reveal the advantages of using CN1 as a reference for population genomic and paleogenomic studies. This complete genome will serve as an alternative reference for future genomic studies on the East Asian population.


Subjects
Diploidy , East Asian People , Human Genome , Telomere , Humans , Male , Asian People/genetics , East Asian People/ethnology , East Asian People/genetics , Human Genome/genetics , Genomics , Telomere/genetics
20.
J Genet Genomics ; 2023 May 26.
Article in English | MEDLINE | ID: mdl-37245652

ABSTRACT

With the rapid development of sequencing technologies, especially the maturity of third-generation sequencing technologies, there has been a significant increase in the number and quality of published genome assemblies. The emergence of these high-quality genomes has raised the requirements for genome evaluation. Although numerous computational methods have been developed to evaluate assembly quality from various perspectives, the selective use of these methods can be arbitrary and makes fair comparison of assembly quality inconvenient. To address this issue, we have developed the Genome Assembly Evaluating Pipeline (GAEP), which provides a comprehensive assessment pipeline for evaluating genome quality from multiple perspectives, including continuity, completeness, and correctness. Additionally, GAEP includes new functions for detecting misassemblies and evaluating assembly redundancy, which performed well in our testing. GAEP is publicly available at https://github.com/zy-optimistic/GAEP under the GPL3.0 License. With GAEP, users can quickly obtain accurate and reliable evaluation results, facilitating the comparison and selection of high-quality genome assemblies.
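The GAEP abstract names continuity as one evaluation perspective without listing specific metrics. As an illustration of one widely used contiguity statistic, N50, the sketch below computes it from a list of contig lengths; whether and how GAEP reports N50 is not stated in the abstract, and the function name is hypothetical.

```python
def n50(contig_lengths):
    """Return N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length  # cumulative length of the longest contigs so far
        if running >= half:
            return length
```

For contig lengths `[100, 80, 50, 30, 20, 10]` (total 290), the running sum first reaches half the total (145) at the 80-length contig, so N50 is 80. N50 captures contiguity only; completeness and correctness, the other two perspectives named in the abstract, require separate checks.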
