Search | Secretaria de Estado da Saúde

1.

Hybrid assembly of ultra-long Nanopore reads augmented with 10x-Genomics contigs: Demonstrated with a human genome.

Ma, Zhanshan Sam; Li, Lianwei; Ye, Chengxi; Peng, Minsheng; Zhang, Ya-Ping.

Genomics ; 111(6): 1896-1901, 2019 12.

Article in English | MEDLINE | ID: mdl-30594583

ABSTRACT

Third-generation sequencing (3GS) technologies generate ultra-long reads (up to 1 Mb), which makes it possible to eliminate gaps and effectively resolve repeats in genome assembly. However, 3GS technologies suffer from high base-level error rates (15%-40%) and high sequencing costs. To address these issues, the hybrid assembly strategy, which utilizes both 3GS reads and inexpensive NGS (next-generation sequencing) short reads, was invented. Here, we use 10×-Genomics® technology, which integrates a novel bar-coding strategy with Illumina® NGS and has the advantage of revealing long-range sequence information, to replace common NGS short reads in the hybrid assembly of long erroneous 3GS reads. We demonstrate the feasibility of integrating the 3GS and 10×-Genomics technologies into a new strategy for hybrid de novo genome assembly by utilizing DBG2OLC and Sparc, software packages previously developed by the authors for regular hybrid assembly. Using a human genome as an example, we show that with only 7× coverage of ultra-long Nanopore® reads, augmented with 10× reads, our approach achieved nearly the same level of quality as non-hybrid assembly with 35× coverage of Nanopore reads. Compared with the assembly using 10×-Genomics reads alone, our assembly is gapless at a slightly higher cost. These results suggest that our new hybrid assembly with ultra-long 3GS reads augmented with 10×-Genomics reads offers a low-cost (less than ¼ the cost of the non-hybrid assembly) and computationally lightweight (only 109 calendar hours with a peak memory usage of 61 GB on a dual-CPU office workstation) solution for extending the wide applications of the 3GS technologies.

Subjects

Genome, Human , High-Throughput Nucleotide Sequencing/methods , Software , Contig Mapping/methods , Genomics , Humans

2.

BlindCall: ultra-fast base-calling of high-throughput sequencing data by blind deconvolution.

Ye, Chengxi; Hsiao, Chiaowen; Corrada Bravo, Héctor.

Bioinformatics ; 30(9): 1214-9, 2014 May 01.

Article in English | MEDLINE | ID: mdl-24413520

ABSTRACT

MOTIVATION: Base-calling of sequencing data produced by high-throughput sequencing platforms is a fundamental process in current bioinformatics analysis. However, existing third-party probabilistic or machine-learning methods that significantly improve the accuracy of base-calls on these platforms are impractical for production use due to their computational inefficiency. RESULTS: We directly formulated base-calling as a blind deconvolution problem and implemented BlindCall as an efficient solver for this inverse problem. BlindCall produced base-calls with accuracy comparable to state-of-the-art probabilistic methods while processing data 10 times faster in most cases. The computational complexity of BlindCall scales linearly with read length, making it better suited for new long-read sequencing technologies.

Subjects

High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Algorithms , Humans , Probability , Reproducibility of Results , Software , Time Factors

3.

Exploiting sparseness in de novo genome assembly.

Ye, Chengxi; Ma, Zhanshan Sam; Cannon, Charles H; Pop, Mihai; Yu, Douglas W.

BMC Bioinformatics ; 13 Suppl 6: S1, 2012 Apr 19.

Article in English | MEDLINE | ID: mdl-22537038

ABSTRACT

BACKGROUND: The very large memory requirements for the construction of assembly graphs for de novo genome assembly limit current algorithms to super-computing environments. METHODS: In this paper, we demonstrate that constructing a sparse assembly graph, which stores only a small fraction of the observed k-mers as nodes together with the links between these nodes, allows the de novo assembly of even moderately sized genomes (~500 Mbp) on a typical laptop computer. RESULTS: We implement this sparse graph concept in a proof-of-principle software package, SparseAssembler, utilizing a new sparse k-mer graph structure evolved from the de Bruijn graph. We test SparseAssembler with both simulated and real data, achieving ~90% memory savings while retaining high assembly accuracy, without sacrificing speed in comparison with existing de novo assemblers.
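
The central memory-saving idea, keeping only a fraction of the k-mers as nodes and storing the intervening bases on the links, can be illustrated with a small Python sketch. It is a toy under stated assumptions, not SparseAssembler's code: the hash-based sampling rule, the function names `is_sampled` and `build_sparse_graph`, and the sampling parameter `g` are hypothetical choices made for illustration.

```python
import random
import zlib
from collections import defaultdict


def is_sampled(kmer: str, g: int) -> bool:
    # Hypothetical sampling rule: keep a k-mer if its CRC32 hash is divisible by g.
    # The decision depends only on the k-mer, so every read containing it makes
    # the same keep/skip choice and overlapping reads share nodes.
    return zlib.crc32(kmer.encode()) % g == 0


def build_sparse_graph(reads, k, g):
    """Toy sparse k-mer graph: nodes are sampled k-mers; each edge stores the
    bases that extend one sampled k-mer to the next sampled k-mer in a read."""
    nodes = set()
    edges = defaultdict(int)  # (from_kmer, to_kmer, extension) -> read support
    for read in reads:
        prev, prev_pos = None, None
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if not is_sampled(kmer, g):
                continue  # unsampled k-mers are never stored
            nodes.add(kmer)
            if prev is not None:
                # prev + extension spells read[prev_pos : i + k]
                extension = read[prev_pos + k:i + k]
                edges[(prev, kmer, extension)] += 1
            prev, prev_pos = kmer, i
    return nodes, dict(edges)


if __name__ == "__main__":
    rng = random.Random(0)
    genome = "".join(rng.choice("ACGT") for _ in range(5000))
    reads = [genome[s:s + 100] for s in range(0, len(genome) - 99, 20)]
    k, g = 21, 10
    all_kmers = {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}
    nodes, edges = build_sparse_graph(reads, k, g)
    print(f"{len(all_kmers)} distinct k-mers, {len(nodes)} stored as nodes, {len(edges)} edges")
```

Because the keep/skip decision depends only on the k-mer itself, roughly 1/g of the distinct k-mers are stored, and the skipped sequence survives as edge labels, so contigs can still be spelled by walking the graph.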

Subjects

Computer Storage Devices , Genome , Software , Algorithms , Escherichia coli/genetics , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA

4.

Strategies for Controlling or Releasing the Influence Due to the Volume Expansion of Silicon inside Si-C Composite Anode for High-Performance Lithium-Ion Batteries.

Zhang, Xian; Weng, Jingzheng; Ye, Chengxi; Liu, Mengru; Wang, Chenyu; Wu, Shuru; Tong, Qingsong; Zhu, Mengqi; Gao, Feng.

Materials (Basel) ; 15(12)2022 Jun 16.

Article in English | MEDLINE | ID: mdl-35744323

ABSTRACT

Silicon is currently considered one of the most promising anode materials because of its high capacity, abundant reserves, environmental friendliness, and low working potential. However, the huge volume changes of silicon anode materials can pulverize the material particles, causing the shedding of active material and the continual rupturing of the solid electrolyte interphase (SEI) film, which leads to a short cycle life and rapid capacity decay. This hinders the practical application of silicon anode materials. Combining silicon with carbon may remedy this defect: in silicon/carbon composite anode materials, silicon provides ultra-high capacity and carbon acts as a buffer that relieves the volume expansion of silicon, thereby increasing the usability of silicon-based anode materials. To ensure the future utilization of silicon as an anode material in lithium-ion batteries, this review considers how the formation of carbon layers, cavities, and chemical bonds dampens the volume expansion of silicon particles. Silicon-carbon composites are classified herein into coated core-shell, hollow core-shell, porous, and embedded structures. These structures can adequately accommodate the Si volume expansion, buffer the mechanical stress, and improve interface/surface stability, with the potential for performance enhancement. Finally, a perspective on future studies of Si-C anodes is offered. In the future, the rational design of high-capacity Si-C anodes for better lithium-ion batteries will narrow the gap between theoretical research and practical applications.

5.

From asymmetrical to balanced genomic diversification during rediploidization: Subgenomic evolution in allotetraploid fish.

Luo, Jing; Chai, Jing; Wen, Yanling; Tao, Min; Lin, Guoliang; Liu, Xiaochuan; Ren, Li; Chen, Zeyu; Wu, Shigang; Li, Shengnan; Wang, Yude; Qin, Qinbo; Wang, Shi; Gao, Yun; Huang, Feng; Wang, Lu; Ai, Cheng; Wang, Xiaobo; Li, Lianwei; Ye, Chengxi; Yang, Huimin; Luo, Mi; Chen, Jie; Hu, Hong; Yuan, Liujiao; Zhong, Li; Wang, Jing; Xu, Jian; Du, Zhenglin; Ma, Zhanshan Sam; Murphy, Robert W; Meyer, Axel; Gui, Jianfang; Xu, Peng; Ruan, Jue; Chen, Z Jeffrey; Liu, Shaojun; Lu, Xuemei; Zhang, Ya-Ping.

Sci Adv ; 6(22): eaaz7677, 2020 05.

Article in English | MEDLINE | ID: mdl-32766441

ABSTRACT

A persistent enigma is the rarity of polyploidy in animals compared with its prevalence in plants. Although animal polyploids are thought to experience deleterious genomic chaos during initial polyploidization and subsequent rediploidization, this hypothesis has not been tested. We provide an improved reference-quality de novo genome for the allotetraploid goldfish, whose origin dates to ~15 million years ago. Comprehensive analyses identify changes in subgenomic evolution from asymmetrical oscillation in goldfish and common carp to diverse stabilization and balanced gene expression during continuous rediploidization. The homoeologs are coexpressed in most pathways, and their expression dominance shifts temporally during embryogenesis. Homoeolog expression correlates negatively with alteration of DNA methylation. The results show that allotetraploid cyprinids have a unique strategy for balancing subgenomic stabilization and diversification. The rediploidization process in these fishes provides intriguing insights into genome evolution and function in allopolyploid vertebrates.

Subjects

Carps , Polyploidy , Animals , Evolution, Molecular , Genome , Genomics , Goldfish/genetics

6.

Publisher Correction: The sea lamprey germline genome provides insights into programmed genome rearrangement and vertebrate evolution.

Smith, Jeramiah J; Timoshevskaya, Nataliya; Ye, Chengxi; Holt, Carson; Keinath, Melissa C; Parker, Hugo J; Cook, Malcolm E; Hess, Jon E; Narum, Shawn R; Lamanna, Francesco; Kaessmann, Henrik; Timoshevskiy, Vladimir A; Waterbury, Courtney K M; Saraceno, Cody; Wiedemann, Leanne M; Robb, Sofia M C; Baker, Carl; Eichler, Evan E; Hockman, Dorit; Sauka-Spengler, Tatjana; Yandell, Mark; Krumlauf, Robb; Elgar, Greg; Amemiya, Chris T.

Nat Genet ; 50(11): 1617, 2018 11.

Article in English | MEDLINE | ID: mdl-30224652

ABSTRACT

When published, this article did not initially appear open access. This error has been corrected, and the open access status of the paper is noted in all versions of the paper. Additionally, affiliation 16 denoting equal contribution was missing from author Robb Krumlauf in the PDF originally published. This error has also been corrected.

7.

The sea lamprey germline genome provides insights into programmed genome rearrangement and vertebrate evolution.

Smith, Jeramiah J; Timoshevskaya, Nataliya; Ye, Chengxi; Holt, Carson; Keinath, Melissa C; Parker, Hugo J; Cook, Malcolm E; Hess, Jon E; Narum, Shawn R; Lamanna, Francesco; Kaessmann, Henrik; Timoshevskiy, Vladimir A; Waterbury, Courtney K M; Saraceno, Cody; Wiedemann, Leanne M; Robb, Sofia M C; Baker, Carl; Eichler, Evan E; Hockman, Dorit; Sauka-Spengler, Tatjana; Yandell, Mark; Krumlauf, Robb; Elgar, Greg; Amemiya, Chris T.

Nat Genet ; 50(2): 270-277, 2018 02.

Article in English | MEDLINE | ID: mdl-29358652

ABSTRACT

The sea lamprey (Petromyzon marinus) serves as a comparative model for reconstructing vertebrate evolution. To enable more informed analyses, we developed a new assembly of the lamprey germline genome that integrates several complementary data sets. Analysis of this highly contiguous (chromosome-scale) assembly shows that both chromosomal and whole-genome duplications have played significant roles in the evolution of ancestral vertebrate and lamprey genomes, including chromosomes that carry the six lamprey HOX clusters. The assembly also contains several hundred genes that are reproducibly eliminated from somatic cells during early development in lamprey. Comparative analyses show that gnathostome (mouse) homologs of these genes are frequently marked by polycomb repressive complexes (PRCs) in embryonic stem cells, suggesting overlaps in the regulatory logic of somatic DNA elimination and bivalent states that are regulated by early embryonic PRCs. This new assembly will enhance diverse studies that are informed by lampreys' unique biology and evolutionary/comparative perspective.

Subjects

Cellular Reprogramming/genetics , Evolution, Molecular , Genome , Germ Cells/metabolism , Mutagenesis/physiology , Petromyzon/genetics , Vertebrates/genetics , Animals , Chromatin Assembly and Disassembly/genetics , Vertebrates/classification

8.

Publisher Correction: The sea lamprey germline genome provides insights into programmed genome rearrangement and vertebrate evolution.

Smith, Jeramiah J; Timoshevskaya, Nataliya; Ye, Chengxi; Holt, Carson; Keinath, Melissa C; Parker, Hugo J; Cook, Malcolm E; Hess, Jon E; Narum, Shawn R; Lamanna, Francesco; Kaessmann, Henrik; Timoshevskiy, Vladimir A; Waterbury, Courtney K M; Saraceno, Cody; Wiedemann, Leanne M; Robb, Sofia M C; Baker, Carl; Eichler, Evan E; Hockman, Dorit; Sauka-Spengler, Tatjana; Yandell, Mark; Krumlauf, Robb; Elgar, Greg; Amemiya, Chris T.

Nat Genet ; 50(5): 768, 2018 05.

Article in English | MEDLINE | ID: mdl-29674745

ABSTRACT

In the version of this article initially published, the present addresses for authors Dorit Hockman and Chris Amemiya were switched. The error has been corrected in the HTML and PDF versions of the article.

9.

Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads.

Ye, Chengxi; Ma, Zhanshan Sam.

PeerJ ; 4: e2016, 2016.

Article in English | MEDLINE | ID: mdl-27330851

ABSTRACT

Motivation. Third-generation sequencing (3GS) technology generates long sequences of thousands of bases. However, its current error rates are estimated in the range of 15-40%, significantly higher than those of the prevalent next-generation sequencing (NGS) technologies (less than 1%). Fundamental bioinformatics tasks such as de novo genome assembly and variant calling require high-quality sequences that need to be extracted from these long but erroneous 3GS sequences. Results. We describe a versatile and efficient linear-complexity consensus algorithm, Sparc, to facilitate de novo genome assembly. Sparc builds a sparse k-mer graph using a collection of sequences from a targeted genomic region. The heaviest path, which approximates the most likely genome sequence, is searched through a sparsity-induced reweighted graph and reported as the consensus sequence. Sparc supports using NGS and 3GS data together, which leads to significant improvements in both cost efficiency and computational efficiency. Experiments show that our algorithm can efficiently provide high-quality consensus sequences using both PacBio and Oxford Nanopore sequencing technologies. With only 30× PacBio data, Sparc can reach a consensus with an error rate below 0.5%. With the more challenging Oxford Nanopore data, Sparc can also achieve a similar error rate when combined with NGS data. Compared with existing approaches, Sparc calculates the consensus with higher accuracy and uses approximately 80% less memory and time. Availability. The source code is available for download at https://github.com/yechengxi/Sparc.
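
The heaviest-path idea can be sketched in Python for a plain (non-sparse) k-mer graph. This is a simplified illustration, not Sparc's implementation: it omits the sparse graph and the sparsity-induced reweighting, assumes edge weights are simply read counts, assumes the graph is acyclic, and uses hypothetical function names (`build_kmer_graph`, `heaviest_path_consensus`).

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def build_kmer_graph(reads, k):
    """Edge weight = number of reads that traverse the edge u -> v."""
    weights = defaultdict(int)
    preds = defaultdict(set)  # node -> set of predecessor nodes
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        for u, v in zip(kmers, kmers[1:]):
            weights[(u, v)] += 1
            preds[v].add(u)
            preds.setdefault(u, set())  # register source nodes as well
    return weights, preds


def heaviest_path_consensus(reads, k):
    weights, preds = build_kmer_graph(reads, k)
    best, back = {}, {}
    # static_order() yields predecessors before successors and raises
    # graphlib.CycleError if repeated k-mers make the graph cyclic.
    for v in TopologicalSorter(preds).static_order():
        best[v], back[v] = 0, None
        for u in preds[v]:
            score = best[u] + weights[(u, v)]
            if score > best[v]:
                best[v], back[v] = score, u
    # Walk back from the highest-scoring node to recover the heaviest path.
    node = max(best, key=best.get)
    path = []
    while node is not None:
        path.append(node)
        node = back[node]
    path.reverse()
    # Spell the path: first k-mer, then the last base of each following k-mer.
    return path[0] + "".join(kmer[-1] for kmer in path[1:])


if __name__ == "__main__":
    truth = "ACGTTGCAAGGCTTACCGAT"
    reads = [truth, truth, truth,
             "ACGTTGCAATGCTTACCGAT",  # substitution at position 9
             "ACGTTGCAAGGCTTGCCGAT"]  # substitution at position 14
    print(heaviest_path_consensus(reads, k=5))  # prints ACGTTGCAAGGCTTACCGAT
```

In this toy input each substitution opens a side branch whose edges carry weight 1, while the edges of the true path carry weight 4 or 5, so the heaviest path spells the error-free sequence.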

10.

DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies.

Ye, Chengxi; Hill, Christopher M; Wu, Shigang; Ruan, Jue; Ma, Zhanshan Sam.

Sci Rep ; 6: 31900, 2016 08 30.

Article in English | MEDLINE | ID: mdl-27573208

ABSTRACT

The highly anticipated transition from next-generation sequencing (NGS) to third-generation sequencing (3GS) has been difficult, primarily due to high error rates and excessive sequencing cost. The high error rates make the assembly of long erroneous reads of large genomes challenging because existing software solutions are often overwhelmed by error-correction tasks. Here we report a hybrid assembly approach that simultaneously utilizes NGS and 3GS data to address both issues. We gain advantages from three general and basic design principles: (i) Compact representation of the long reads leads to efficient alignments. (ii) Base-level errors can be skipped; structural errors need to be detected and corrected. (iii) Structurally correct 3GS reads are assembled and polished. In our implementation, preassembled NGS contigs are used to derive the compact representation of the long reads, motivating an algorithmic conversion from a de Bruijn graph to an overlap graph, the two major assembly paradigms. Moreover, since NGS and 3GS data can compensate for each other, our hybrid assembly approach reduces the sequencing requirements of both. Experiments show that our software is able to assemble mammalian-sized genomes orders of magnitude more quickly than existing methods, with modest memory consumption, while saving about half of the sequencing cost.
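
The compact representation of long reads can be sketched as follows: index the k-mers that occur in exactly one NGS contig, then rewrite a long read as the ordered list of contigs it hits. This is a hypothetical simplification for illustration, not DBG2OLC's code; it ignores reverse complements, anchor positions, and the later overlap and consensus steps, and the names `build_unique_kmer_index` and `compact_representation` are assumptions.

```python
import random


def build_unique_kmer_index(contigs, k):
    """Map each k-mer that occurs in exactly one contig to that contig's ID."""
    owner, ambiguous = {}, set()
    for cid, seq in contigs.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in owner and owner[kmer] != cid:
                ambiguous.add(kmer)  # shared between contigs: not informative
            owner.setdefault(kmer, cid)
    return {kmer: cid for kmer, cid in owner.items() if kmer not in ambiguous}


def compact_representation(long_read, index, k):
    """Replace a long read by the ordered list of contig IDs it hits,
    collapsing consecutive hits to the same contig."""
    anchors = []
    for i in range(len(long_read) - k + 1):
        cid = index.get(long_read[i:i + k])
        if cid is not None and (not anchors or anchors[-1] != cid):
            anchors.append(cid)
    return anchors


if __name__ == "__main__":
    rng = random.Random(1)
    contigs = {f"c{j}": "".join(rng.choice("ACGT") for _ in range(80)) for j in (1, 2, 3)}
    # Hypothetical long read: contig c3, an unassembled spacer, then contig c1.
    spacer = "".join(rng.choice("ACGT") for _ in range(30))
    read = contigs["c3"] + spacer + contigs["c1"]
    index = build_unique_kmer_index(contigs, k=17)
    print(compact_representation(read, index, k=17))
```

Base-level errors inside a contig region are simply skipped as long as some exact k-mer hits survive, which reflects design principle (ii): the compact list records structure, not individual bases.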

Subjects

Computational Biology/methods , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods

11.

Network analysis suggests a potentially 'evil' alliance of opportunistic pathogens inhibited by a cooperative network in human milk bacterial communities.

Sam Ma, Zhanshan; Guan, Qiong; Ye, Chengxi; Zhang, Chengchen; Foster, James A; Forney, Larry J.

Sci Rep ; 5: 8275, 2015 Feb 05.

Article in English | MEDLINE | ID: mdl-25651890

ABSTRACT

The critical importance of human milk to infants, and even to human civilization, has been well established. Yet our understanding of the milk microbiome has been limited to cataloguing OTUs and computing community diversity. To the best of our knowledge, there has been no report on the bacterial interactions within the milk microbiome. To bridge this gap, we reconstructed a milk bacterial community network based on the data of Hunt et al. Our analysis revealed that the milk microbiome network consists of two disconnected sub-networks. One sub-network is a fully connected complete graph with seven genera as nodes, and all of its pairwise interactions among the bacteria are facilitative or cooperative. In contrast, the interactions in the other sub-network of eight nodes are mixed but predominantly cooperative. Somewhat surprisingly, the only 'non-cooperative' nodes in the second sub-network are the mutually cooperative Staphylococcus and Corynebacterium, which include some opportunistic pathogens. This potentially 'evil' alliance between Staphylococcus and Corynebacterium could be inhibited by the remaining nodes of the second sub-network, which cooperate with one another. We postulate that the 'confrontation' between the 'evil' alliance and the 'benign' alliance, and the shifting balance between them, may be responsible for dysbiosis of the milk microbiome that permits mastitis.

Subjects

Bacteria , Food Microbiology , Microbiota , Milk, Human/microbiology , Models, Theoretical , Bacteria/classification , Bacteria/genetics , Humans , Phylogeny , RNA, Ribosomal, 16S/genetics

12.

Reference-free comparative genomics of 174 chloroplasts.

Kua, Chai-Shian; Ruan, Jue; Harting, John; Ye, Cheng-Xi; Helmus, Matthew R; Yu, Jun; Cannon, Charles H.

PLoS One ; 7(11): e48995, 2012.

Article in English | MEDLINE | ID: mdl-23185288

ABSTRACT

Direct analysis of unassembled genomic data could greatly increase the power of short-read DNA sequencing technologies and allow comparative genomics of organisms without a completed reference available. Here, we compare 174 chloroplasts by analyzing the taxonomic distribution of short kmers across genomes [1]. We then assemble de novo contigs centered on informative variation. The localized de novo contigs can be separated into two major classes: tip (unique to a single genome) and group (shared by a subset of genomes). Prior to assembly, we found that ~18% of the chloroplast was duplicated in the inverted repeat (IR) region across a four-fold difference in genome sizes, from a highly reduced parasitic orchid [2] to a massive algal chloroplast [3], including gnetophytes [4] and cycads [5]. The conservation of this ratio between single-copy and duplicated sequence was basal among green plants, independent of photosynthesis and of the mechanism of genome size change, and different in gymnosperms and lower plants. Major lineages in the angiosperm clade differed in the pattern of shared kmers and de novo contigs. For example, parasitic plants demonstrated an expected accelerated overall rate of evolution, while the hemi-parasitic genomes contained a great deal more novel sequence than holo-parasitic plants, suggesting different mechanisms at different stages of genomic contraction. Additionally, the legumes are diverging more quickly and in different ways than other major families. Small duplicated fragments of the rrn23 genes were deeply conserved among seed plants, including several species without the IR regions, indicating a crucial functional role of this duplication.
Localized de novo assembly of informative kmers greatly reduces the complexity of large comparative analyses by confining the analysis to a small partition of data and genomes relevant to the specific question, allowing direct analysis of next-gen sequence data from previously unstudied genomes and rapid discovery of informative candidate regions.
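
The kmer classes described above can be illustrated with a short Python sketch that records which genomes contain each kmer and labels it accordingly. The function name `classify_kmers`, the extra label `"all"` for kmers present in every genome, and the use of forward-strand kmers only are assumptions for illustration; this is not the authors' pipeline, which goes on to assemble localized contigs around such kmers.

```python
from collections import defaultdict


def classify_kmers(genomes, k):
    """Label each k-mer by how many genomes contain it:
    'tip'   -> exactly one genome,
    'group' -> more than one genome but not all of them,
    'all'   -> every genome (hypothetical extra label)."""
    carriers = defaultdict(set)  # k-mer -> names of genomes containing it
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            carriers[seq[i:i + k]].add(name)
    n = len(genomes)
    labels = {}
    for kmer, names in carriers.items():
        if len(names) == 1:
            labels[kmer] = "tip"
        elif len(names) < n:
            labels[kmer] = "group"
        else:
            labels[kmer] = "all"
    return labels


if __name__ == "__main__":
    genomes = {"g1": "AAAACG", "g2": "AAAACT", "g3": "AAAAT"}
    print(classify_kmers(genomes, 4))
    # {'AAAA': 'all', 'AAAC': 'group', 'AACG': 'tip', 'AACT': 'tip', 'AAAT': 'tip'}
```

Restricting later analysis to the tip and group kmers is what confines the comparison to a small, informative partition of the data.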

Subjects

Chloroplasts/genetics , Genome, Plant/genetics , Genomics , Conserved Sequence/genetics , Contig Mapping , Genome Size , Plants/classification , Plants/genetics , Polymorphism, Genetic , Reference Standards
