Search | VHL Infectious and Parasitic Diseases

1.

Recurrent evolution and selection shape structural diversity at the amylase locus.

Bolognini, Davide; Halgren, Alma; Lou, Runyang Nicolas; Raveane, Alessandro; Rocha, Joana L; Guarracino, Andrea; Soranzo, Nicole; Chin, Chen-Shan; Garrison, Erik; Sudmant, Peter H.

Nature ; 634(8034): 617-625, 2024 Oct.

Article in English | MEDLINE | ID: mdl-39232174

ABSTRACT

The adoption of agriculture triggered a rapid shift towards starch-rich diets in human populations [1]. Amylase genes facilitate starch digestion, and increased amylase copy number has been observed in some modern human populations with high-starch intake [2], although evidence of recent selection is lacking [3,4]. Here, using 94 long-read haplotype-resolved assemblies and short-read data from approximately 5,600 contemporary and ancient humans, we resolve the diversity and evolutionary history of structural variation at the amylase locus. We find that amylase genes have higher copy numbers in agricultural populations than in fishing, hunting and pastoral populations. We identify 28 distinct amylase structural architectures and demonstrate that nearly identical structures have arisen recurrently on different haplotype backgrounds throughout recent human history. AMY1 and AMY2A genes each underwent multiple duplication/deletion events with mutation rates up to more than 10,000-fold the single-nucleotide polymorphism mutation rate, whereas AMY2B gene duplications share a single origin. Using a pangenome-based approach, we infer structural haplotypes across thousands of humans identifying extensively duplicated haplotypes at higher frequency in modern agricultural populations. Leveraging 533 ancient human genomes, we find that duplication-containing haplotypes (with more gene copies than the ancestral haplotype) have rapidly increased in frequency over the past 12,000 years in West Eurasians, suggestive of positive selection. Together, our study highlights the potential effects of the agricultural revolution on human genomes and the importance of structural variation in human adaptation.

Subjects

Agriculture, Amylases, Molecular Evolution, Gene Dosage, Human Genome, Haplotypes, Genetic Selection, Humans, Agriculture/history, Agriculture/statistics & numerical data, Amylases/genetics, Amylases/chemistry, Gene Dosage/genetics, Gene Duplication/genetics, Genetic Loci/genetics, Human Genome/genetics, Haplotypes/genetics, Ancient History, Mutation Rate, Single Nucleotide Polymorphism/genetics, Hunting/statistics & numerical data, Gene Deletion, Ancient DNA/analysis

2.

The complete sequence and comparative analysis of ape sex chromosomes.

Makova, Kateryna D; Pickett, Brandon D; Harris, Robert S; Hartley, Gabrielle A; Cechova, Monika; Pal, Karol; Nurk, Sergey; Yoo, DongAhn; Li, Qiuhui; Hebbar, Prajna; McGrath, Barbara C; Antonacci, Francesca; Aubel, Margaux; Biddanda, Arjun; Borchers, Matthew; Bornberg-Bauer, Erich; Bouffard, Gerard G; Brooks, Shelise Y; Carbone, Lucia; Carrel, Laura; Carroll, Andrew; Chang, Pi-Chuan; Chin, Chen-Shan; Cook, Daniel E; Craig, Sarah J C; de Gennaro, Luciana; Diekhans, Mark; Dutra, Amalia; Garcia, Gage H; Grady, Patrick G S; Green, Richard E; Haddad, Diana; Hallast, Pille; Harvey, William T; Hickey, Glenn; Hillis, David A; Hoyt, Savannah J; Jeong, Hyeonsoo; Kamali, Kaivan; Pond, Sergei L Kosakovsky; LaPolice, Troy M; Lee, Charles; Lewis, Alexandra P; Loh, Yong-Hwee E; Masterson, Patrick; McGarvey, Kelly M; McCoy, Rajiv C; Medvedev, Paul; Miga, Karen H; Munson, Katherine M.

Nature ; 630(8016): 401-411, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38811727

ABSTRACT

Apes possess two sex chromosomes: the male-specific Y chromosome and the X chromosome, which is present in both males and females. The Y chromosome is crucial for male reproduction, with deletions being linked to infertility [1]. The X chromosome is vital for reproduction and cognition [2]. Variation in mating patterns and brain function among apes suggests corresponding differences in their sex chromosomes. However, owing to their repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the methodology developed for the telomere-to-telomere (T2T) human genome, we produced gapless assemblies of the X and Y chromosomes for five great apes (bonobo (Pan paniscus), chimpanzee (Pan troglodytes), western lowland gorilla (Gorilla gorilla gorilla), Bornean orangutan (Pongo pygmaeus) and Sumatran orangutan (Pongo abelii)) and a lesser ape (the siamang gibbon (Symphalangus syndactylus)), and untangled the intricacies of their evolution. Compared with the X chromosomes, the ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements, owing to the accumulation of lineage-specific ampliconic regions, palindromes, transposable elements and satellites. Many Y chromosome genes expand in multi-copy families and some evolve under purifying selection. Thus, the Y chromosome exhibits dynamic evolution, whereas the X chromosome is more stable. Mapping short-read sequencing data to these assemblies revealed diversity and selection patterns on sex chromosomes of more than 100 individual great apes. These reference assemblies are expected to inform human evolution and conservation genetics of non-human apes, all of which are endangered species.

Subjects

Hominidae, X Chromosome, Y Chromosome, Animals, Female, Male, Gorilla gorilla/genetics, Hominidae/genetics, Hominidae/classification, Hylobatidae/genetics, Pan paniscus/genetics, Pan troglodytes/genetics, Phylogeny, Pongo abelii/genetics, Pongo pygmaeus/genetics, Telomere/genetics, X Chromosome/genetics, Y Chromosome/genetics, Molecular Evolution, DNA Copy Number Variations/genetics, Humans, Endangered Species, Reference Standards

3.

The complete sequence of a human Y chromosome.

Rhie, Arang; Nurk, Sergey; Cechova, Monika; Hoyt, Savannah J; Taylor, Dylan J; Altemose, Nicolas; Hook, Paul W; Koren, Sergey; Rautiainen, Mikko; Alexandrov, Ivan A; Allen, Jamie; Asri, Mobin; Bzikadze, Andrey V; Chen, Nae-Chyun; Chin, Chen-Shan; Diekhans, Mark; Flicek, Paul; Formenti, Giulio; Fungtammasan, Arkarachai; Garcia Giron, Carlos; Garrison, Erik; Gershman, Ariel; Gerton, Jennifer L; Grady, Patrick G S; Guarracino, Andrea; Haggerty, Leanne; Halabian, Reza; Hansen, Nancy F; Harris, Robert; Hartley, Gabrielle A; Harvey, William T; Haukness, Marina; Heinz, Jakob; Hourlier, Thibaut; Hubley, Robert M; Hunt, Sarah E; Hwang, Stephen; Jain, Miten; Kesharwani, Rupesh K; Lewis, Alexandra P; Li, Heng; Logsdon, Glennis A; Lucas, Julian K; Makalowski, Wojciech; Markovic, Christopher; Martin, Fergal J; Mc Cartney, Ann M; McCoy, Rajiv C; McDaniel, Jennifer; McNulty, Brandy M.

Nature ; 621(7978): 344-354, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37612512

ABSTRACT

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.

Subjects

Human Y Chromosomes, Genomics, DNA Sequence Analysis, Humans, Base Sequence, Human Y Chromosomes/genetics, Satellite DNA/genetics, Genetic Variation/genetics, Population Genetics, Genomics/methods, Genomics/standards, Heterochromatin/genetics, Multigene Family/genetics, Reference Standards, Genomic Segmental Duplications/genetics, DNA Sequence Analysis/standards, Tandem Repeat Sequences/genetics, Telomere/genetics

4.

Semi-automated assembly of high-quality diploid human reference genomes.

Jarvis, Erich D; Formenti, Giulio; Rhie, Arang; Guarracino, Andrea; Yang, Chentao; Wood, Jonathan; Tracey, Alan; Thibaud-Nissen, Francoise; Vollger, Mitchell R; Porubsky, David; Cheng, Haoyu; Asri, Mobin; Logsdon, Glennis A; Carnevali, Paolo; Chaisson, Mark J P; Chin, Chen-Shan; Cody, Sarah; Collins, Joanna; Ebert, Peter; Escalona, Merly; Fedrigo, Olivier; Fulton, Robert S; Fulton, Lucinda L; Garg, Shilpa; Gerton, Jennifer L; Ghurye, Jay; Granat, Anastasiya; Green, Richard E; Harvey, William; Hasenfeld, Patrick; Hastie, Alex; Haukness, Marina; Jaeger, Erich B; Jain, Miten; Kirsche, Melanie; Kolmogorov, Mikhail; Korbel, Jan O; Koren, Sergey; Korlach, Jonas; Lee, Joyce; Li, Daofeng; Lindsay, Tina; Lucas, Julian; Luo, Feng; Marschall, Tobias; Mitchell, Matthew W; McDaniel, Jennifer; Nie, Fan; Olsen, Hugh E; Olson, Nathan D.

Nature ; 611(7936): 519-531, 2022 Nov.

Article in English | MEDLINE | ID: mdl-36261518

ABSTRACT

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.

Subjects

Chromosome Mapping, Diploidy, Human Genome, Genomics, Humans, Chromosome Mapping/standards, Human Genome/genetics, Haplotypes/genetics, High-Throughput Nucleotide Sequencing/methods, High-Throughput Nucleotide Sequencing/standards, DNA Sequence Analysis/methods, DNA Sequence Analysis/standards, Reference Standards, Genomics/methods, Genomics/standards, Human Chromosomes/genetics, Genetic Variation/genetics

5.

Multiscale analysis of pangenomes enables improved representation of genomic diversity for repetitive and clinically relevant genes.

Chin, Chen-Shan; Behera, Sairam; Khalak, Asif; Sedlazeck, Fritz J; Sudmant, Peter H; Wagner, Justin; Zook, Justin M.

Nat Methods ; 20(8): 1213-1221, 2023 Aug.

Article in English | MEDLINE | ID: mdl-37365340

ABSTRACT

Advancements in sequencing technologies and assembly methods enable the regular production of high-quality genome assemblies characterizing complex regions. However, challenges remain in efficiently interpreting variation at various scales, from smaller tandem repeats to megabase rearrangements, across many human genomes. We present a PanGenome Research Tool Kit (PGR-TK) enabling analyses of complex pangenome structural and haplotype variation at multiple scales. We apply the graph decomposition methods in PGR-TK to the class II major histocompatibility complex demonstrating the importance of the human pangenome for analyzing complicated regions. Moreover, we investigate the Y-chromosome genes, DAZ1/DAZ2/DAZ3/DAZ4, of which structural variants have been linked to male infertility, and X-chromosome genes OPN1LW and OPN1MW linked to eye disorders. We further showcase PGR-TK across 395 complex repetitive medically important genes. This highlights the power of PGR-TK to resolve complex variation in regions of the genome that were previously too complex to analyze.

Subjects

Human Genome, Genomics, Male, Humans, Major Histocompatibility Complex

6.

Improved maize reference genome with single-molecule technologies.

Jiao, Yinping; Peluso, Paul; Shi, Jinghua; Liang, Tiffany; Stitzer, Michelle C; Wang, Bo; Campbell, Michael S; Stein, Joshua C; Wei, Xuehong; Chin, Chen-Shan; Guill, Katherine; Regulski, Michael; Kumari, Sunita; Olson, Andrew; Gent, Jonathan; Schneider, Kevin L; Wolfgruber, Thomas K; May, Michael R; Springer, Nathan M; Antoniou, Eric; McCombie, W Richard; Presting, Gernot G; McMullen, Michael; Ross-Ibarra, Jeffrey; Dawe, R Kelly; Hastie, Alex; Rank, David R; Ware, Doreen.

Nature ; 546(7659): 524-527, 2017 Jun 22.

Article in English | MEDLINE | ID: mdl-28605751

ABSTRACT

Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate the determination of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions. Here we report the assembly and annotation of a reference genome of maize, a genetic and agricultural model species, using single-molecule real-time sequencing and high-resolution optical mapping. Relative to the previous reference genome, our assembly features a 52-fold increase in contig length and notable improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by single-molecule real-time sequencing. In addition, comparative optical mapping of two other inbred maize lines revealed a prevalence of deletions in regions of low gene density and maize lineage-specific genes.

Subjects

Plant Genome/genetics, High-Throughput Nucleotide Sequencing/methods, Single Molecule Imaging/methods, Zea mays/genetics, Centromere/genetics, Plant Chromosomes/genetics, Contig Mapping, Agricultural Crops/genetics, DNA Transposable Elements/genetics, Intergenic DNA/genetics, Plant Genes/genetics, Molecular Sequence Annotation, Optics and Photonics, Phylogeny, Messenger RNA/analysis, Messenger RNA/genetics, Reference Standards, Sorghum/genetics

7.

Ribbon: intuitive visualization for complex genomic variation.

Nattestad, Maria; Aboukhalil, Robert; Chin, Chen-Shan; Schatz, Michael C.

Bioinformatics ; 37(3): 413-415, 2021 Apr 20.

Article in English | MEDLINE | ID: mdl-32766814

ABSTRACT

SUMMARY: Ribbon is an alignment visualization tool that shows how alignments are positioned within both the reference and read contexts, giving an intuitive view that enables a better understanding of structural variants and the read evidence supporting them. Ribbon was born out of a need to curate complex structural variant calls and determine whether each was well supported by long-read evidence, and it uses the same intuitive visualization method to shed light on contig alignments from genome-to-genome comparisons. AVAILABILITY AND IMPLEMENTATION: Ribbon is freely available online at http://genomeribbon.com/ and is open-source at https://github.com/marianattestad/ribbon. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genomics, Software, Genome

8.

Trajectories of glomerular filtration rate and progression to end stage kidney disease after kidney transplantation.

Raynaud, Marc; Aubert, Olivier; Reese, Peter P; Bouatou, Yassine; Naesens, Maarten; Kamar, Nassim; Bailly, Élodie; Giral, Magali; Ladrière, Marc; Le Quintrec, Moglie; Delahousse, Michel; Juric, Ivana; Basic-Jukic, Nikolina; Gupta, Gaurav; Akalin, Enver; Yoo, Daniel; Chin, Chen-Shan; Proust-Lima, Cécile; Böhmig, Georg; Oberbauer, Rainer; Stegall, Mark D; Bentall, Andrew J; Jordan, Stanley C; Huang, Edmund; Glotz, Denis; Legendre, Christophe; Montgomery, Robert A; Segev, Dorry L; Empana, Jean-Philippe; Grams, Morgan E; Coresh, Josef; Jouven, Xavier; Lefaucheur, Carmen; Loupy, Alexandre.

Kidney Int ; 99(1): 186-197, 2021 Jan.

Article in English | MEDLINE | ID: mdl-32781106

ABSTRACT

Although the gold standard of monitoring kidney transplant function relies on glomerular filtration rate (GFR), little is known about GFR trajectories after transplantation, their determinants, and their association with outcomes. To evaluate these parameters we examined kidney transplant recipients receiving care at 15 academic centers. Patients underwent prospective monitoring of estimated GFR (eGFR) measurements, with assessment of clinical, functional, histological and immunological parameters. Additional validation took place in seven randomized controlled trials that included a total of 14,132 patients with 403,497 eGFR measurements. After a median follow-up of 6.5 years, 1,688 patients developed end-stage kidney disease. Using unsupervised latent class mixed models, we identified eight distinct eGFR trajectories. Multinomial regression models identified seven significant determinants of eGFR trajectories including donor age, eGFR, proteinuria, and several significant histological features: graft scarring, graft interstitial inflammation and tubulitis, microcirculation inflammation, and circulating anti-HLA donor specific antibodies. The eGFR trajectories were associated with progression to end-stage kidney disease. These trajectories, their determinants and respective associations with end-stage kidney disease were similar across cohorts, as well as in diverse clinical scenarios, therapeutic eras and in the seven randomized controlled trials. Thus, our results provide the basis for a trajectory-based assessment of kidney transplant patients for risk stratification and monitoring.

Subjects

Chronic Kidney Failure, Kidney Transplantation, Glomerular Filtration Rate, Humans, Chronic Kidney Failure/diagnosis, Chronic Kidney Failure/surgery, Kidney Transplantation/adverse effects, Prospective Studies

9.

Complex rearrangements and oncogene amplifications revealed by long-read DNA and RNA sequencing of a breast cancer cell line.

Nattestad, Maria; Goodwin, Sara; Ng, Karen; Baslan, Timour; Sedlazeck, Fritz J; Rescheneder, Philipp; Garvin, Tyler; Fang, Han; Gurtowski, James; Hutton, Elizabeth; Tseng, Elizabeth; Chin, Chen-Shan; Beck, Timothy; Sundaravadanam, Yogi; Kramer, Melissa; Antoniou, Eric; McPherson, John D; Hicks, James; McCombie, W Richard; Schatz, Michael C.

Genome Res ; 28(8): 1126-1135, 2018 Aug.

Article in English | MEDLINE | ID: mdl-29954844

ABSTRACT

The SK-BR-3 cell line is one of the most important models for HER2+ breast cancers, which affect one in five breast cancer patients. SK-BR-3 is known to be highly rearranged, although much of the variation is in complex and repetitive regions that may be underreported. Addressing this, we sequenced SK-BR-3 using long-read single molecule sequencing from Pacific Biosciences and developed one of the most detailed maps of structural variations (SVs) in a cancer genome available, with nearly 20,000 variants present, most of which were missed by short-read sequencing. Surrounding the important ERBB2 oncogene (also known as HER2), we discover a complex sequence of nested duplications and translocations, suggesting a punctuated progression. Full-length transcriptome sequencing further revealed several novel gene fusions within the nested genomic variants. Combining long-read genome and transcriptome sequencing enables an in-depth analysis of how SVs disrupt the genome and sheds new light on the complex mechanisms involved in cancer genome evolution.

Subjects

Breast Neoplasms/genetics, Gene Amplification/genetics, Gene Rearrangement/genetics, Oncogenes/genetics, Breast Neoplasms/pathology, Female, Human Genome, Genomic Structural Variation, High-Throughput Nucleotide Sequencing, Humans, MCF-7 Cells, ErbB-2 Receptor/genetics, Nucleic Acid Repetitive Sequences/genetics, Transcriptome/genetics

10.

Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly.

Schneider, Valerie A; Graves-Lindsay, Tina; Howe, Kerstin; Bouk, Nathan; Chen, Hsiu-Chuan; Kitts, Paul A; Murphy, Terence D; Pruitt, Kim D; Thibaud-Nissen, Françoise; Albracht, Derek; Fulton, Robert S; Kremitzki, Milinn; Magrini, Vincent; Markovic, Chris; McGrath, Sean; Steinberg, Karyn Meltz; Auger, Kate; Chow, William; Collins, Joanna; Harden, Glenn; Hubbard, Timothy; Pelan, Sarah; Simpson, Jared T; Threadgold, Glen; Torrance, James; Wood, Jonathan M; Clarke, Laura; Koren, Sergey; Boitano, Matthew; Peluso, Paul; Li, Heng; Chin, Chen-Shan; Phillippy, Adam M; Durbin, Richard; Wilson, Richard K; Flicek, Paul; Eichler, Evan E; Church, Deanna M.

Genome Res ; 27(5): 849-864, 2017 May.

Article in English | MEDLINE | ID: mdl-28396521

ABSTRACT

The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.

Subjects

Contig Mapping/methods, Human Genome, Genomics/methods, DNA Sequence Analysis/methods, Software, Contig Mapping/standards, Genomics/standards, Haploidy, Haplotypes, Humans, Genetic Polymorphism, Reference Standards, DNA Sequence Analysis/standards

11.

Phased diploid genome assembly with single-molecule real-time sequencing.

Chin, Chen-Shan; Peluso, Paul; Sedlazeck, Fritz J; Nattestad, Maria; Concepcion, Gregory T; Clum, Alicia; Dunn, Christopher; O'Malley, Ronan; Figueroa-Balderas, Rosa; Morales-Cruz, Abraham; Cramer, Grant R; Delledonne, Massimo; Luo, Chongyuan; Ecker, Joseph R; Cantu, Dario; Rank, David R; Schatz, Michael C.

Nat Methods ; 13(12): 1050-1054, 2016 Dec.

Article in English | MEDLINE | ID: mdl-27749838

ABSTRACT

While genome assembly projects have been successful in many haploid and inbred species, the assembly of noninbred or rearranged heterozygous genomes remains a major challenge. To address this challenge, we introduce the open-source FALCON and FALCON-Unzip algorithms (https://github.com/PacificBiosciences/FALCON/) to assemble long-read sequencing data into highly accurate, contiguous, and correctly phased diploid genomes. We generate new reference sequences for heterozygous samples including an F1 hybrid of Arabidopsis thaliana, the widely cultivated Vitis vinifera cv. Cabernet Sauvignon, and the coral fungus Clavicorona pyxidata, samples that have challenged short-read assembly approaches. The FALCON-based assemblies are substantially more contiguous and complete than alternate short- or long-read approaches. The phased diploid assembly enabled the study of haplotype structure and heterozygosities between homologous chromosomes, including the identification of widespread heterozygous structural variation within coding sequences.

Subjects

Diploidy, Fungal Genome/genetics, Plant Genome/genetics, Genomics/methods, Single Nucleotide Polymorphism/genetics, Algorithms, Arabidopsis/genetics, Basidiomycota/genetics, Fungal DNA/genetics, Plant DNA/genetics, Haplotypes, Heterozygote, Humans, DNA Sequence Analysis, Vitis/genetics

12.

Heterogeneous resistance to quizartinib in acute myeloid leukemia revealed by single-cell analysis.

Smith, Catherine C; Paguirigan, Amy; Jeschke, Grace R; Lin, Kimberly C; Massi, Evan; Tarver, Theodore; Chin, Chen-Shan; Asthana, Saurabh; Olshen, Adam; Travers, Kevin J; Wang, Susana; Levis, Mark J; Perl, Alexander E; Radich, Jerald P; Shah, Neil P.

Blood ; 130(1): 48-58, 2017 Jul 06.

Article in English | MEDLINE | ID: mdl-28490572

ABSTRACT

Genomic studies have revealed significant branching heterogeneity in cancer. Studies of resistance to tyrosine kinase inhibitor therapy have not fully reflected this heterogeneity because resistance in individual patients has been ascribed to largely mutually exclusive on-target or off-target mechanisms in which tumors either retain dependency on the target oncogene or subvert it through a parallel pathway. Using targeted sequencing from single cells and colonies from patient samples, we demonstrate tremendous clonal diversity in the majority of acute myeloid leukemia (AML) patients with activating FLT3 internal tandem duplication mutations at the time of acquired resistance to the FLT3 inhibitor quizartinib. These findings establish that clinical resistance to quizartinib is highly complex and reflects the underlying clonal heterogeneity of AML.

Subjects

Benzothiazoles/administration & dosage, Neoplasm Drug Resistance, High-Throughput Nucleotide Sequencing, INDEL Mutation, Acute Myeloid Leukemia, Phenylurea Compounds/administration & dosage, fms-Like Tyrosine Kinase 3/genetics, Neoplasm Drug Resistance/drug effects, Neoplasm Drug Resistance/genetics, Female, Humans, Acute Myeloid Leukemia/drug therapy, Acute Myeloid Leukemia/genetics, Male

13.

Correction: Chromosomal-Level Assembly of the Asian Seabass Genome Using Long Sequence Reads and Multi-layered Scaffolding.

Vij, Shubha; Kuhl, Heiner; Kuznetsova, Inna S; Komissarov, Aleksey; Yurchenko, Andrey A; Van Heusden, Peter; Singh, Siddharth; Thevasagayam, Natascha M; Prakki, Sai Rama Sridatta; Purushothaman, Kathiresan; Saju, Jolly M; Jiang, Junhui; Mbandi, Stanley Kimbung; Jonas, Mario; Hin Yan Tong, Amy; Mwangi, Sarah; Lau, Doreen; Ngoh, Si Yan; Liew, Woei Chang; Shen, Xueyan; Hon, Lawrence S; Drake, James P; Boitano, Matthew; Hall, Richard; Chin, Chen-Shan; Lachumanan, Ramkumar; Korlach, Jonas; Trifonov, Vladimir; Kabilov, Marsel; Tupikin, Alexey; Green, Darrell; Moxon, Simon; Garvin, Tyler; Sedlazeck, Fritz J; Vurture, Gregory W; Gopalapillai, Gopikrishna; Katneni, Vinaya Kumar; Noble, Tansyn H; Scaria, Vinod; Sivasubbu, Sridhar; Jerry, Dean R; O'Brien, Stephen J; Schatz, Michael C; Dalmay, Tamás; Turner, Stephen W; Lok, Si; Christoffels, Alan; Orbán, László.

PLoS Genet ; 12(12): e1006500, 2016 Dec.

Article in English | MEDLINE | ID: mdl-27935956

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pgen.1005954.].

14.

Chromosomal-Level Assembly of the Asian Seabass Genome Using Long Sequence Reads and Multi-layered Scaffolding.

Vij, Shubha; Kuhl, Heiner; Kuznetsova, Inna S; Komissarov, Aleksey; Yurchenko, Andrey A; Van Heusden, Peter; Singh, Siddharth; Thevasagayam, Natascha M; Prakki, Sai Rama Sridatta; Purushothaman, Kathiresan; Saju, Jolly M; Jiang, Junhui; Mbandi, Stanley Kimbung; Jonas, Mario; Hin Yan Tong, Amy; Mwangi, Sarah; Lau, Doreen; Ngoh, Si Yan; Liew, Woei Chang; Shen, Xueyan; Hon, Lawrence S; Drake, James P; Boitano, Matthew; Hall, Richard; Chin, Chen-Shan; Lachumanan, Ramkumar; Korlach, Jonas; Trifonov, Vladimir; Kabilov, Marsel; Tupikin, Alexey; Green, Darrell; Moxon, Simon; Garvin, Tyler; Sedlazeck, Fritz J; Vurture, Gregory W; Gopalapillai, Gopikrishna; Kumar Katneni, Vinaya; Noble, Tansyn H; Scaria, Vinod; Sivasubbu, Sridhar; Jerry, Dean R; O'Brien, Stephen J; Schatz, Michael C; Dalmay, Tamás; Turner, Stephen W; Lok, Si; Christoffels, Alan; Orbán, László.

PLoS Genet ; 12(4): e1005954, 2016 Apr.

Article in English | MEDLINE | ID: mdl-27082250

ABSTRACT

We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics.

Subjects

Bass/genetics, Chromosome Mapping, Animals, Bass/classification, Genome, Fluorescence In Situ Hybridization, Phylogeny

15.

Assembly and diploid architecture of an individual human genome via single-molecule technologies.

Pendleton, Matthew; Sebra, Robert; Pang, Andy Wing Chun; Ummat, Ajay; Franzen, Oscar; Rausch, Tobias; Stütz, Adrian M; Stedman, William; Anantharaman, Thomas; Hastie, Alex; Dai, Heng; Fritz, Markus Hsi-Yang; Cao, Han; Cohain, Ariella; Deikus, Gintaras; Durrett, Russell E; Blanchard, Scott C; Altman, Roger; Chin, Chen-Shan; Guo, Yan; Paxinos, Ellen E; Korbel, Jan O; Darnell, Robert B; McCombie, W Richard; Kwok, Pui-Yan; Mason, Christopher E; Schadt, Eric E; Bashir, Ali.

Nat Methods ; 12(8): 780-786, 2015 Aug.

Article in English | MEDLINE | ID: mdl-26121404

ABSTRACT

We present the first comprehensive analysis of a diploid human genome that combines single-molecule sequencing with single-molecule genome maps. Our hybrid assembly markedly improves upon the contiguity observed from traditional shotgun sequencing approaches, with scaffold N50 values approaching 30 Mb, and we identified complex structural variants (SVs) missed by other high-throughput approaches. Furthermore, by combining Illumina short-read data with long reads, we phased both single-nucleotide variants and SVs, generating haplotypes with over 99% consistency with previous trio-based studies. Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that approach reference quality.

Subjects

Computational Biology/methods , Human Genome , High-Throughput Nucleotide Sequencing/methods , Single Nucleotide Polymorphism , Algorithms , Chromosome Mapping , Diploidy , Gene Library , Genetic Variation , Genome , Haplotypes , Humans , Nucleotides/genetics , Reproducibility of Results , DNA Sequence Analysis , Tandem Repeat Sequences

16.

Ten simple rules for large-scale data processing.

Fungtammasan, Arkarachai; Lee, Alexandra; Taroni, Jaclyn; Wheeler, Kurt; Chin, Chen-Shan; Davis, Sean; Greene, Casey.

PLoS Comput Biol ; 18(2): e1009757, 2022 Feb.

Article in English | MEDLINE | ID: mdl-35143491

Subjects

Guidelines as Topic

17.

Validation of ITD mutations in FLT3 as a therapeutic target in human acute myeloid leukaemia.

Smith, Catherine C; Wang, Qi; Chin, Chen-Shan; Salerno, Sara; Damon, Lauren E; Levis, Mark J; Perl, Alexander E; Travers, Kevin J; Wang, Susana; Hunt, Jeremy P; Zarrinkar, Patrick P; Schadt, Eric E; Kasarskis, Andrew; Kuriyan, John; Shah, Neil P.

Nature ; 485(7397): 260-3, 2012 Apr 15.

Article in English | MEDLINE | ID: mdl-22504184

ABSTRACT

Effective targeted cancer therapeutic development depends upon distinguishing disease-associated 'driver' mutations, which have causative roles in malignancy pathogenesis, from 'passenger' mutations, which are dispensable for cancer initiation and maintenance. Translational studies of clinically active targeted therapeutics can definitively discriminate driver from passenger lesions and provide valuable insights into human cancer biology. Activating internal tandem duplication (ITD) mutations in FLT3 (FLT3-ITD) are detected in approximately 20% of acute myeloid leukaemia (AML) patients and are associated with a poor prognosis. Abundant scientific and clinical evidence, including the lack of convincing clinical activity of early FLT3 inhibitors, suggests that FLT3-ITD probably represents a passenger lesion. Here we report point mutations at three residues within the kinase domain of FLT3-ITD that confer substantial in vitro resistance to AC220 (quizartinib), an active investigational inhibitor of FLT3, KIT, PDGFRA, PDGFRB and RET; evolution of AC220-resistant substitutions at two of these amino acid positions was observed in eight of eight FLT3-ITD-positive AML patients with acquired resistance to AC220. Our findings demonstrate that FLT3-ITD can represent a driver lesion and valid therapeutic target in human AML. AC220-resistant FLT3 kinase domain mutants represent high-value targets for future FLT3 inhibitor development efforts.

Subjects

Benzothiazoles/therapeutic use , Acute Myeloid Leukemia/drug therapy , Acute Myeloid Leukemia/genetics , Molecular Targeted Therapy , Mutation/genetics , Phenylurea Compounds/therapeutic use , fms-Like Tyrosine Kinase 3/antagonists & inhibitors , fms-Like Tyrosine Kinase 3/genetics , Benzothiazoles/pharmacology , Tumor Cell Line , DNA Mutational Analysis , Antineoplastic Drug Resistance/genetics , Humans , Acute Myeloid Leukemia/metabolism , Molecular Models , Molecular Structure , Phenylurea Compounds/pharmacology , Protein Binding , Tertiary Protein Structure/genetics , Recurrence , Reproducibility of Results , fms-Like Tyrosine Kinase 3/metabolism

18.

Scaffolding of long read assemblies using long range contact information.

Ghurye, Jay; Pop, Mihai; Koren, Sergey; Bickhart, Derek; Chin, Chen-Shan.

BMC Genomics ; 18(1): 527, 2017 Jul 12.

Article in English | MEDLINE | ID: mdl-28701198

ABSTRACT

BACKGROUND: Long read technologies have revolutionized de novo genome assembly by generating contigs orders of magnitude longer than those of short read assemblies. Although assembly contiguity has increased, it usually does not reconstruct a full chromosome or an arm of the chromosome, resulting in an unfinished chromosome level assembly. To increase the contiguity of the assembly to the chromosome level, different strategies are used which exploit long range contact information between chromosomes in the genome. METHODS: We develop a scalable and computationally efficient scaffolding method that can boost the assembly contiguity to a large extent using genome-wide chromatin interaction data such as Hi-C. RESULTS: We demonstrate an algorithm that uses Hi-C data for longer-range scaffolding of de novo long read genome assemblies. We tested our methods on the human and goat genome assemblies. We compare our scaffolds with the scaffolds generated by LACHESIS based on various metrics. CONCLUSION: Our new algorithm SALSA produces more accurate scaffolds compared to the existing state of the art method LACHESIS.

Subjects

Contig Mapping/methods , Algorithms , Animals , Genomics , Goats/genetics , Humans

19.

Alpha-CENTAURI: assessing novel centromeric repeat sequence variation with long read sequencing.

Sevim, Volkan; Bashir, Ali; Chin, Chen-Shan; Miga, Karen H.

Bioinformatics ; 32(13): 1921-1924, 2016 Jul 1.

Article in English | MEDLINE | ID: mdl-27153570

ABSTRACT

MOTIVATION: Long arrays of near-identical tandem repeats are a common feature of centromeric and subtelomeric regions in complex genomes. These sequences present a source of repeat structure diversity that is commonly ignored by standard genomic tools. Unlike reads shorter than the underlying repeat structure that rely on indirect inference methods, e.g. assembly, long reads allow direct inference of satellite higher order repeat structure. To automate characterization of local centromeric tandem repeat sequence variation we have designed Alpha-CENTAURI (ALPHA satellite CENTromeric AUtomated Repeat Identification), that takes advantage of Pacific Bioscience long-reads from whole-genome sequencing datasets. By operating on reads prior to assembly, our approach provides a more comprehensive set of repeat-structure variants and is not impacted by rearrangements or sequence underrepresentation due to misassembly. RESULTS: We demonstrate the utility of Alpha-CENTAURI in characterizing repeat structure for alpha satellite containing reads in the hydatidiform mole (CHM1, haploid-like) genome. The pipeline is designed to report local repeat organization summaries for each read, thereby monitoring rearrangements in repeat units, shifts in repeat orientation and sites of array transition into non-satellite DNA, typically defined by transposable element insertion. We validate the method by showing consistency with existing centromere high order repeat references. Alpha-CENTAURI can, in principle, run on any sequence data, offering a method to generate a sequence repeat resolution that could be readily performed using consensus sequences available for other satellite families in genomes without high-quality reference assemblies. 
AVAILABILITY AND IMPLEMENTATION: Documentation and source code for Alpha-CENTAURI are freely available at http://github.com/volkansevim/alpha-CENTAURI
CONTACT: ali.bashir@mssm.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Centromere/genetics , Computational Biology/methods , Genomics , DNA Sequence Analysis/methods , Tandem Repeat Sequences , Algorithms , Consensus Sequence , Female , Humans , Hydatidiform Mole/genetics , Pregnancy

20.

Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data.

Chin, Chen-Shan; Alexander, David H; Marks, Patrick; Klammer, Aaron A; Drake, James; Heiner, Cheryl; Clum, Alicia; Copeland, Alex; Huddleston, John; Eichler, Evan E; Turner, Stephen W; Korlach, Jonas.

Nat Methods ; 10(6): 563-9, 2013 Jun.

Article in English | MEDLINE | ID: mdl-23644548

ABSTRACT

We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.

Subjects

Bacterial Genome , DNA Sequence Analysis/methods , Bacterial Artificial Chromosomes , Escherichia coli/genetics , Gene Library , Humans , Repetitive Nucleic Acid Sequences
