Search | BVS - Ministry of Health

1.

Korea4K: whole genome sequences of 4,157 Koreans with 107 phenotypes derived from extensive health check-ups.

Jeon, Sungwon; Choi, Hansol; Jeon, Yeonsu; Choi, Whan-Hyuk; Choi, Hyunjoo; An, Kyungwhan; Ryu, Hyojung; Bhak, Jihun; Lee, Hyeonjae; Kwon, Yoonsung; Ha, Sukyeon; Kim, Yeo Jin; Blazyte, Asta; Kim, Changjae; Kim, Yeonkyung; Kang, Younghui; Woo, Yeong Ju; Lee, Chanyoung; Seo, Jeongwoo; Yoon, Changhan; Bolser, Dan; Biro, Orsolya; Shin, Eun-Seok; Kim, Byung Chul; Kim, Seon-Young; Park, Ji-Hwan; Jeon, Jongbum; Jung, Dooyoung; Lee, Semin; Bhak, Jong.

Gigascience ; 13, 2024 01 02.

Article in English | MEDLINE | ID: mdl-38626723

ABSTRACT

BACKGROUND: Phenome-wide association studies (PheWASs) have been conducted on Asian populations, including Koreans, but many were based on chip or exome genotyping data. Such studies have limitations regarding whole genome-wide association analysis, making it crucial to have genome-to-phenome association information with the largest possible whole genome and matched phenome data to conduct further population-genome studies and develop health care services based on population genomics. RESULTS: Here, we present 4,157 whole genome sequences (Korea4K) coupled with 107 health check-up parameters as the largest genomic resource of the Korean Genome Project. It encompasses most of the variants with allele frequency >0.001 in Koreans, indicating that it sufficiently covered most of the common and rare genetic variants with commonly measured phenotypes for Koreans. Korea4K provides 45,537,252 variants, and half of them were not present in Korea1K (1,094 samples). We also identified 1,356 new genotype-phenotype associations that were not found by the Korea1K dataset. Phenomics analyses further revealed 24 significant genetic correlations, 14 pleiotropic associations, and 127 causal relationships based on Mendelian randomization among 37 traits. In addition, the Korea4K imputation reference panel, the largest Korean variants reference to date, showed a superior imputation performance to Korea1K across all allele frequency categories. CONCLUSIONS: Collectively, Korea4K provides not only the largest Korean genome data but also corresponding health check-up parameters and novel genome-phenome associations. The large-scale pathological whole genome-wide omics data will become a powerful set for genome-phenome level association studies to discover causal markers for the prediction and diagnosis of health conditions in future studies.

Subjects

Genome-Wide Association Study , Single Nucleotide Polymorphism , Humans , Phenotype , Genetic Association Studies , Gene Frequency , Republic of Korea , Genotype

2.

Comparative analysis of repeat content in plant genomes, large and small.

Argentin, Joris; Bolser, Dan; Kersey, Paul J; Flicek, Paul.

Front Plant Sci ; 14: 1103035, 2023.

Article in English | MEDLINE | ID: mdl-37521909

ABSTRACT

The DNA Features pipeline is the analysis pipeline at EMBL-EBI that annotates repeat elements, including transposable elements. With Ensembl's goal to stay at the cutting edge of genome annotation, we proved that this pipeline needed an update. We then created a new analysis that allowed the Ensembl database to store the repeat classification from the PGSB repeat classification (Recat). This new dataset was then fetched using Perl scripts and used to prove that the pipeline modification induced a gain in sensitivity. Finally, we performed a comparative analysis of transposable element distribution in all plant species available, raising new questions about transposable elements in certain branches of the taxonomic tree.

3.

LT1, an ONT long-read-based assembly scaffolded with Hi-C data and polished with short reads.

Kim, Hui-Su; Blazyte, Asta; Jeon, Sungwon; Yoon, Changhan; Kim, Yeonkyung; Kim, Changjae; Bolser, Dan; Ahn, Ji-Hye; Edwards, Jeremy S; Bhak, Jong.

GigaByte ; 2022: gigabyte51, 2022.

Article in English | MEDLINE | ID: mdl-36824523

ABSTRACT

We present LT1, the first high-quality human reference genome from the Baltic States. LT1 is a female de novo human reference genome assembly, constructed using 57× nanopore long reads and polished using 47× short paired-end reads. We utilized 72 GB of Hi-C chromosomal mapping data for scaffolding, to maximize assembly contiguity and accuracy. The contig assembly of LT1 was 2.73 Gbp in length, comprising 4490 contigs with an NG50 value of 12.0 Mbp. After scaffolding with Hi-C data and manual curation, the final assembly has an NG50 value of 137 Mbp and 4699 scaffolds. Assessment of gene prediction quality using Benchmarking Universal Single-Copy Orthologs (BUSCO) identified 89.3% of the single-copy orthologous genes included in the benchmark. Detailed characterization of LT1 suggests it has 73,744 predicted transcripts, 4.2 million autosomal SNPs, 974,616 short indels, and 12,079 large structural variants. These data may be used as a benchmark for further in-depth genomic analyses of Baltic populations.
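The NG50 values quoted in this abstract are contiguity statistics. As a rough illustration of how such a figure is usually computed (not the authors' pipeline), here is a minimal Python sketch. The function names `ng50` and `n50` are hypothetical, and the estimated genome size is an input the caller must supply.

```python
def ng50(lengths, genome_size):
    """Return the NG50 of a set of contig or scaffold lengths.

    NG50 is the length L such that sequences of length >= L together
    cover at least half of the (estimated) genome size. Returns None
    when the assembly covers less than half of genome_size.
    """
    half = genome_size / 2
    running_total = 0
    # Walk from the longest sequence to the shortest.
    for length in sorted(lengths, reverse=True):
        running_total += length
        if running_total >= half:
            return length
    return None


def n50(lengths):
    """N50 is NG50 with the assembly's own total length as the genome size."""
    return ng50(lengths, sum(lengths))
```

Because NG50 is normalised by an estimated genome size rather than by the assembly length, it can be compared across assemblies of the same species more fairly than N50.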

4.

A chromosome-scale genome assembly and annotation of the spring orchid (Cymbidium goeringii).

Chung, Oksung; Kim, Jungeun; Bolser, Dan; Kim, Hak-Min; Jun, Je Hoon; Choi, Jae-Pil; Jang, Hyun-Do; Cho, Yun Sung; Bhak, Jong; Kwak, Myounghai.

Mol Ecol Resour ; 22(3): 1168-1177, 2022 Apr.

Article in English | MEDLINE | ID: mdl-34687590

ABSTRACT

Cymbidium goeringii, commonly known as the spring orchid, has long been favoured for horticultural purposes in Asian countries. It is a popular orchid with much demand for improvement and development for its valuable varieties. Until now, its reference genome has not been published despite its popularity and conservation efforts. Here, we report the de novo assembly of the C. goeringii genome, which is the largest among the orchids published to date, using a strategy that combines short- and long-read sequencing and chromosome conformation capture (Hi-C) information. The total length of all scaffolds is 3.99 Gb, with an N50 scaffold size of 178.2 Mb. A total of 29,556 protein-coding genes were annotated and 3.55 Gb (88.87% of genome) repetitive sequences were identified. We constructed pseudomolecular chromosomes using Hi-C, incorporating 89.4% of the scaffolds in 20 chromosomes. We identified 220 expanded and 106 contracted gene families in C. goeringii after divergence from its close relative. We also identified new gene families, resistance gene analogues and changes within the MADS-box genes, which control a diverse set of developmental processes during orchid evolution. Our high quality chromosomal-level assembly of C. goeringii can provide a platform for elucidating the genomic evolution of orchids, mining functional genes for agronomic traits and for developing molecular markers for accelerated breeding as well as accelerating conservation efforts.

Subjects

Orchidaceae , Plant Breeding , Chromosomes , Genome , Humans , Molecular Sequence Annotation , Orchidaceae/genetics

5.

Regional TMPRSS2 V197M Allele Frequencies Are Correlated with COVID-19 Case Fatality Rates.

Jeon, Sungwon; Blazyte, Asta; Yoon, Changhan; Ryu, Hyojung; Jeon, Yeonsu; Bhak, Youngjune; Bolser, Dan; Manica, Andrea; Shin, Eun-Seok; Cho, Yun Sung; Kim, Byung Chul; Ryoo, Namhee; Choi, Hansol; Bhak, Jong.

Mol Cells ; 44(9): 680-687, 2021 Sep 30.

Article in English | MEDLINE | ID: mdl-34588322

ABSTRACT

Coronavirus disease, COVID-19 (coronavirus disease 2019), caused by SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2), has a higher case fatality rate in European countries than in others, especially East Asian ones. One potential explanation for this regional difference is the diversity of the viral infection efficiency. Here, we analyzed the allele frequencies of a nonsynonymous variant rs12329760 (V197M) in the TMPRSS2 gene, a key enzyme essential for viral infection and found a significant association between the COVID-19 case fatality rate and the V197M allele frequencies, using over 200,000 present-day and ancient genomic samples. East Asian countries have higher V197M allele frequencies than other regions, including European countries which correlates to their lower case fatality rates. Structural and energy calculation analysis of the V197M amino acid change showed that it destabilizes the TMPRSS2 protein, possibly negatively affecting its ACE2 and viral spike protein processing.

Subjects

COVID-19/genetics , COVID-19/mortality , Serine Endopeptidases/genetics , Asian People , COVID-19/ethnology , Gene Frequency , Humans , Molecular Models , Mortality , Single Nucleotide Polymorphism , Republic of Korea , Serine Endopeptidases/chemistry , White People

6.

Comparative analysis of 7 short-read sequencing platforms using the Korean Reference Genome: MGI and Illumina sequencing benchmark for whole-genome sequencing.

Kim, Hak-Min; Jeon, Sungwon; Chung, Oksung; Jun, Je Hoon; Kim, Hui-Su; Blazyte, Asta; Lee, Hwang-Yeol; Yu, Youngseok; Cho, Yun Sung; Bolser, Dan M; Bhak, Jong.

Gigascience ; 10(3), 2021 03 12.

Article in English | MEDLINE | ID: mdl-33710328

ABSTRACT

BACKGROUND: DNBSEQ-T7 is a new whole-genome sequencer developed by Complete Genomics and MGI using DNA nanoball and combinatorial probe anchor synthesis technologies to generate short reads at a very large scale, up to 60 human genomes per day. However, it has not been objectively and systematically compared against Illumina short-read sequencers. FINDINGS: By using the same KOREF sample, the Korean Reference Genome, we have compared 7 sequencing platforms including BGISEQ-500, DNBSEQ-T7, HiSeq2000, HiSeq2500, HiSeq4000, HiSeqX10, and NovaSeq6000. We measured sequencing quality by comparing sequencing statistics (base quality, duplication rate, and random error rate), mapping statistics (mapping rate, depth distribution, and percent GC coverage), and variant statistics (transition/transversion ratio, dbSNP annotation rate, and concordance rate with single-nucleotide polymorphism [SNP] genotyping chip) across the 7 sequencing platforms. We found that MGI platforms showed a higher concordance rate for SNP genotyping than HiSeq2000 and HiSeq4000. The similarity matrix of variant calls confirmed that the 2 MGI platforms have the most similar characteristics to the HiSeq2500 platform. CONCLUSIONS: Overall, MGI and Illumina sequencing platforms showed comparable levels of sequencing quality, uniformity of coverage, percent GC coverage, and variant accuracy; thus we conclude that the MGI platforms can be used for a wide range of genomics research fields at a lower cost than the Illumina platforms.

Subjects

Benchmarking , High-Throughput Nucleotide Sequencing , Human Genome , Humans , Republic of Korea , DNA Sequence Analysis , Whole Genome Sequencing
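One of the variant statistics named in this abstract, the transition/transversion ratio, is simple enough to sketch. The Python below is an illustrative assumption about how such a ratio can be computed from (reference, alternate) base pairs. It is not the tooling used in the study, and `is_transition` and `ts_tv_ratio` are hypothetical names.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref, alt):
    """A transition swaps purine<->purine or pyrimidine<->pyrimidine."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snvs):
    """Compute the transition/transversion ratio for (ref, alt) pairs.

    Entries that are not single-base substitutions between A, C, G, T
    are skipped. Returns None if no transversions are observed.
    """
    transitions = 0
    transversions = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # not a simple SNV; ignore
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        return None
    return transitions / transversions
```

A ratio that differs markedly between platforms on the same sample can hint at platform-specific error, which is presumably why the statistic appears in comparisons like this one.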

7.

Welfare Genome Project: A Participatory Korean Personal Genome Project With Free Health Check-Up and Genetic Report Followed by Counseling.

Jeon, Yeonsu; Jeon, Sungwon; Blazyte, Asta; Kim, Yeo Jin; Lee, Jasmin Junseo; Bhak, Youngjune; Cho, Yun Sung; Park, Yeshin; Noh, Eui-Kyu; Manica, Andrea; Edwards, Jeremy S; Bolser, Dan; Kim, Sukyeon; Lee, Yuji; Yoon, Changhan; Lee, Semin; Kim, Byung Chul; Park, Neung Hwa; Bhak, Jong.

Front Genet ; 12: 633731, 2021.

Article in English | MEDLINE | ID: mdl-33633791

ABSTRACT

The Welfare Genome Project (WGP) provided 1,000 healthy Korean volunteers with detailed genetic and health reports to test the social perception of integrating personal genetic and healthcare data at a large scale. WGP was launched in 2016 in the Ulsan Metropolitan City as the first large-scale genome project with public participation in Korea. The project produced a set of genetic materials, genotype information, clinical data, and lifestyle survey answers from participants aged 20-96. As compensation, the participants received a free general health check-up on 110 clinical traits, accompanied by a genetic report of their genotypes followed by genetic counseling. In a follow-up survey, 91.0% of the participants indicated that their genetic reports motivated them to improve their health. Overall, WGP expanded not only the general awareness of genomics, DNA sequencing technologies, bioinformatics, and bioethics regulations among all the parties involved, but also the general public's understanding of how genome projects can indirectly benefit their health and lifestyle management. WGP established a data construction framework for not only scientific research but also the welfare of participants. In the future, the WGP framework can help lay the groundwork for a new personalized healthcare system that is seamlessly integrated with existing public medical infrastructure.

8.

Korean Genome Project: 1094 Korean personal genomes with clinical information.

Jeon, Sungwon; Bhak, Youngjune; Choi, Yeonsong; Jeon, Yeonsu; Kim, Seunghoon; Jang, Jaeyoung; Jang, Jinho; Blazyte, Asta; Kim, Changjae; Kim, Yeonkyung; Shim, Jungae; Kim, Nayeong; Kim, Yeo Jin; Park, Seung Gu; Kim, Jungeun; Cho, Yun Sung; Park, Yeshin; Kim, Hak-Min; Kim, Byoung-Chul; Park, Neung-Hwa; Shin, Eun-Seok; Kim, Byung Chul; Bolser, Dan; Manica, Andrea; Edwards, Jeremy S; Church, George; Lee, Semin; Bhak, Jong.

Sci Adv ; 6(22): eaaz7835, 2020 05.

Article in English | MEDLINE | ID: mdl-32766443

ABSTRACT

We present the initial phase of the Korean Genome Project (Korea1K), including 1094 whole genomes (sequenced at an average depth of 31×), along with data of 79 quantitative clinical traits. We identified 39 million single-nucleotide variants and indels of which half were singleton or doubleton and detected Korean-specific patterns based on several types of genomic variations. A genome-wide association study illustrated the power of whole-genome sequences for analyzing clinical traits, identifying nine more significant candidate alleles than previously reported from the same linkage disequilibrium blocks. Also, Korea1K, as a reference, showed better imputation accuracy for Koreans than the 1KGP panel. As proof of utility, germline variants in cancer samples could be filtered out more effectively when the Korea1K variome was used as a panel of normals compared to non-Korean variome sets. Overall, this study shows that Korea1K can be a useful genotypic and phenotypic resource for clinical and ethnogenetic studies.

Subjects

Human Genome , Genome-Wide Association Study , Asian People , Genotype , Humans , Single Nucleotide Polymorphism , Republic of Korea

9.

Efficient mutation screening for cervical cancers from circulating tumor DNA in blood.

Lee, Sun-Young; Chae, Dong-Kyu; Lee, Sung-Hun; Lim, Yohan; An, Jahyun; Chae, Chang Hoon; Kim, Byung Chul; Bhak, Jong; Bolser, Dan; Cho, Dong-Hyu.

BMC Cancer ; 20(1): 694, 2020 Jul 27.

Article in English | MEDLINE | ID: mdl-32718341

ABSTRACT

BACKGROUND: Early diagnosis and continuous monitoring are necessary for an efficient management of cervical cancers (CC). Liquid biopsy, such as detecting circulating tumor DNA (ctDNA) from blood, is a simple, non-invasive method for testing and monitoring cancer markers. However, tumor-specific alterations in ctDNA have not been extensively investigated or compared to other circulating biomarkers in the diagnosis and monitoring of the CC. Therefore, next-generation sequencing (NGS) analysis with blood samples can be a new approach for highly accurate diagnosis and monitoring of the CC. METHOD: Using a bioinformatics approach, we designed a panel of 24 genes associated with CC to detect and characterize patterns of somatic single-nucleotide variations, indels, and copy number variations. Our NGS CC panel covers most of the genes in The Cancer Genome Atlas (TCGA) as well as additional cancer driver and tumor suppressor genes. We profiled the variants in ctDNA from 24 CC patients who were being treated with systemic chemotherapy and local radiotherapy at the Jeonbuk National University Hospital, Korea. RESULT: Eighteen out of 24 genes in our NGS CC panel had mutations across the 24 CC patients, including somatic alterations of mutated genes (ZFHX3-83%, KMT2C-79%, KMT2D-79%, NSD1-67%, ATM-38% and RNF213-27%). We demonstrated that the RNF213 mutation could potentially be used as a monitoring marker for response to chemo- and radiotherapy. CONCLUSION: We developed our NGS CC panel and demonstrated that our NGS panel can be useful for the diagnosis and monitoring of the CC, since the panel detected the common somatic variations in CC patients and we observed how these genetic variations change according to the treatment pattern of the patient.

Subjects

Circulating Tumor DNA/genetics , High-Throughput Nucleotide Sequencing/methods , Mutation , Uterine Cervical Neoplasms/genetics , Adenocarcinoma/blood , Adenocarcinoma/genetics , Adenocarcinoma/pathology , Adenocarcinoma/therapy , Adenosine Triphosphatases/genetics , Aged , Squamous Cell Carcinoma/blood , Squamous Cell Carcinoma/genetics , Squamous Cell Carcinoma/pathology , Squamous Cell Carcinoma/therapy , Circulating Tumor DNA/blood , Class I Phosphatidylinositol 3-Kinases/genetics , DNA-Binding Proteins/genetics , Female , Genetic Markers , Homeodomain Proteins/genetics , Humans , Middle Aged , Neoplasm Proteins/genetics , Prospective Studies , Proto-Oncogene Proteins p21(ras)/genetics , Sensitivity and Specificity , Ubiquitin-Protein Ligases/genetics , Uterine Cervical Neoplasms/blood , Uterine Cervical Neoplasms/pathology , Uterine Cervical Neoplasms/therapy

10.

Decoding a highly mixed Kazakh genome.

Seidualy, Madina; Blazyte, Asta; Jeon, Sungwon; Bhak, Youngjune; Jeon, Yeonsu; Kim, Jungeun; Eriksson, Anders; Bolser, Dan; Yoon, Changhan; Manica, Andrea; Lee, Semin; Bhak, Jong.

Hum Genet ; 139(5): 557-568, 2020 May.

Article in English | MEDLINE | ID: mdl-32076829

ABSTRACT

We provide a Kazakh whole genome sequence (MJS) and analyses with the largest comparative Kazakh genomic data available to date. We found 102,240 novel SNVs and a high level of heterozygosity. ADMIXTURE analysis confirmed a significant proportion of variations in this individual coming from all continents except Africa and Oceania. A principal component analysis showed neighboring Kalmyk, Uzbek, and Kyrgyz populations to have the strongest resemblance to the MJS genome which reflects fairly recent Kazakh history. MJS's mitochondrial haplogroup, J1c2, probably represents an early European and Near Eastern influence to Central Asia. This was also supported by the heterozygous SNPs associated with European phenotypic features and strikingly similar Kazakh ancestral composition inferred by ADMIXTURE. Admixture (f3) analysis showed that MJS's genomic signature is best described as a cross between the Neolithic East Asian (Devil's Gate1) and the Bronze Age European (Halberstadt_LBA1) components rather than a contemporary admixture.

Subjects

Ethnicity/genetics , Population Genetics , Human Genome , Single Nucleotide Polymorphism , White People/genetics , China , Mitochondrial DNA , Female , Humans , Kazakhstan , Principal Component Analysis

11.

Ensembl Genomes 2020-enabling non-vertebrate genomic research.

Howe, Kevin L; Contreras-Moreira, Bruno; De Silva, Nishadi; Maslen, Gareth; Akanni, Wasiu; Allen, James; Alvarez-Jarreta, Jorge; Barba, Matthieu; Bolser, Dan M; Cambell, Lahcen; Carbajo, Manuel; Chakiachvili, Marc; Christensen, Mikkel; Cummins, Carla; Cuzick, Alayne; Davis, Paul; Fexova, Silvie; Gall, Astrid; George, Nancy; Gil, Laurent; Gupta, Parul; Hammond-Kosack, Kim E; Haskell, Erin; Hunt, Sarah E; Jaiswal, Pankaj; Janacek, Sophie H; Kersey, Paul J; Langridge, Nick; Maheswari, Uma; Maurel, Thomas; McDowall, Mark D; Moore, Ben; Muffato, Matthieu; Naamati, Guy; Naithani, Sushma; Olson, Andrew; Papatheodorou, Irene; Patricio, Mateus; Paulini, Michael; Pedro, Helder; Perry, Emily; Preece, Justin; Rosello, Marc; Russell, Matthew; Sitnik, Vasily; Staines, Daniel M; Stein, Joshua; Tello-Ruiz, Marcela K; Trevanion, Stephen J; Urban, Martin.

Nucleic Acids Res ; 48(D1): D689-D695, 2020 01 08.

Article in English | MEDLINE | ID: mdl-31598706

ABSTRACT

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of interfaces to genomic data across the tree of life, including reference genome sequence, gene models, transcriptional data, genetic variation and comparative analysis. Data may be accessed via our website, online tools platform and programmatic interfaces, with updates made four times per year (in synchrony with Ensembl). Here, we provide an overview of Ensembl Genomes, with a focus on recent developments. These include the continued growth, more robust and reproducible sets of orthologues and paralogues, and enriched views of gene expression and gene function in plants. Finally, we report on our continued deeper integration with the Ensembl project, which forms a key part of our future strategy for dealing with the increasing quantity of available genome-scale data across the tree of life.

Subjects

Computational Biology/methods , Genetic Databases , Genetic Variation , Bacterial Genome , Fungal Genome , Plant Genome , Algorithms , Animals , Caenorhabditis elegans/genetics , Genomics , Internet , Molecular Sequence Annotation , Phenotype , Plants/genetics , Reference Values , Software , User-Computer Interface

12.

Ensembl Genomes 2018: an integrated omics infrastructure for non-vertebrate species.

Kersey, Paul Julian; Allen, James E; Allot, Alexis; Barba, Matthieu; Boddu, Sanjay; Bolt, Bruce J; Carvalho-Silva, Denise; Christensen, Mikkel; Davis, Paul; Grabmueller, Christoph; Kumar, Navin; Liu, Zicheng; Maurel, Thomas; Moore, Ben; McDowall, Mark D; Maheswari, Uma; Naamati, Guy; Newman, Victoria; Ong, Chuang Kee; Paulini, Michael; Pedro, Helder; Perry, Emily; Russell, Matthew; Sparrow, Helen; Tapanari, Electra; Taylor, Kieron; Vullo, Alessandro; Williams, Gareth; Zadissia, Amonida; Olson, Andrew; Stein, Joshua; Wei, Sharon; Tello-Ruiz, Marcela; Ware, Doreen; Luciani, Aurelien; Potter, Simon; Finn, Robert D; Urban, Martin; Hammond-Kosack, Kim E; Bolser, Dan M; De Silva, Nishadi; Howe, Kevin L; Langridge, Nicholas; Maslen, Gareth; Staines, Daniel Michael; Yates, Andrew.

Nucleic Acids Res ; 46(D1): D802-D808, 2018 01 04.

Article in English | MEDLINE | ID: mdl-29092050

ABSTRACT

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including genome sequence, gene models, transcript sequence, genetic variation, and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments and expansions. These include the incorporation of almost 20 000 additional genome sequences and over 35 000 tracks of RNA-Seq data, which have been aligned to genomic sequence and made available for visualization. Other advances since 2015 include the release of the database in Resource Description Framework (RDF) format, a large increase in community-derived curation, a new high-performance protein sequence search, additional cross-references, improved annotation of non-protein-coding genes, and the launch of pre-release and archival sites. Collectively, these changes are part of a continuing response to the increasing quantity of publicly-available genome-scale data, and the consequent need to archive, integrate, annotate and disseminate these using automated, scalable methods.

Subjects

Archaea/genetics , Bacteria/genetics , Genetic Databases , Protein Databases , Eukaryota/genetics , Genomics , Amino Acid Sequence , Animals , Base Sequence , Data Mining , Forecasting , Genome , Molecular Sequence Annotation , RNA/genetics , User-Computer Interface

13.

An improved assembly and annotation of the allohexaploid wheat genome identifies complete families of agronomic genes and provides genomic evidence for chromosomal translocations.

Clavijo, Bernardo J; Venturini, Luca; Schudoma, Christian; Accinelli, Gonzalo Garcia; Kaithakottil, Gemy; Wright, Jonathan; Borrill, Philippa; Kettleborough, George; Heavens, Darren; Chapman, Helen; Lipscombe, James; Barker, Tom; Lu, Fu-Hao; McKenzie, Neil; Raats, Dina; Ramirez-Gonzalez, Ricardo H; Coince, Aurore; Peel, Ned; Percival-Alwyn, Lawrence; Duncan, Owen; Trösch, Josua; Yu, Guotai; Bolser, Dan M; Namaati, Guy; Kerhornou, Arnaud; Spannagl, Manuel; Gundlach, Heidrun; Haberer, Georg; Davey, Robert P; Fosker, Christine; Palma, Federica Di; Phillips, Andrew L; Millar, A Harvey; Kersey, Paul J; Uauy, Cristobal; Krasileva, Ksenia V; Swarbreck, David; Bevan, Michael W; Clark, Matthew D.

Genome Res ; 27(5): 885-896, 2017 05.

Article in English | MEDLINE | ID: mdl-28420692

ABSTRACT

Advances in genome sequencing and assembly technologies are generating many high-quality genome sequences, but assemblies of large, repeat-rich polyploid genomes, such as that of bread wheat, remain fragmented and incomplete. We have generated a new wheat whole-genome shotgun sequence assembly using a combination of optimized data types and an assembly algorithm designed to deal with large and complex genomes. The new assembly represents >78% of the genome with a scaffold N50 of 88.8 kb that has a high fidelity to the input data. Our new annotation combines strand-specific Illumina RNA-seq and Pacific Biosciences (PacBio) full-length cDNAs to identify 104,091 high-confidence protein-coding genes and 10,156 noncoding RNA genes. We confirmed three known and identified one novel genome rearrangements. Our approach enables the rapid and scalable assembly of wheat genomes, the identification of structural variants, and the definition of complete gene models, all powerful resources for trait analysis and breeding of this key global crop.

Subjects

Contig Mapping/methods , Plant Genome , Molecular Sequence Annotation/methods , Plant Proteins/genetics , Genetic Translocation , Triticum/genetics , Algorithms , Contig Mapping/standards , Molecular Sequence Annotation/standards , Genetic Polymorphism , Polyploidy

14.

Ensembl Plants: Integrating Tools for Visualizing, Mining, and Analyzing Plant Genomic Data.

Bolser, Dan M; Staines, Daniel M; Perry, Emily; Kersey, Paul J.

Methods Mol Biol ; 1533: 1-31, 2017.

Article in English | MEDLINE | ID: mdl-27987162

ABSTRACT

Ensembl Plants ( http://plants.ensembl.org ) is an integrative resource presenting genome-scale information for 39 sequenced plant species. Available data includes genome sequence, gene models, functional annotation, and polymorphic loci; for the latter, additional information including population structure, individual genotypes, linkage, and phenotype data is available for some species. Comparative data is also available, including genomic alignments and "gene trees," which show the inferred evolutionary history of each gene family represented in the resource. Access to the data is provided through a genome browser, which incorporates many specialist interfaces for different data types, through a variety of programmatic interfaces, and via a specialist data mining tool supporting rapid filtering and retrieval of bulk data. Genomic data from many non-plant species, including those of plant pathogens, pests, and pollinators, is also available via the same interfaces through other divisions of Ensembl. Ensembl Plants is updated 4-6 times a year and is developed in collaboration with our international partners in the Gramene ( http://www.gramene.org ) and transPLANT projects ( http://www.transplantdb.eu ).

Subjects

Computational Biology/methods , Plant Genome , Genomics , Plants/genetics , Software , Chromosome Mapping , Agricultural Crops/genetics , Data Mining/methods , Genetic Databases , Genetic Variation , Genomics/methods , Molecular Sequence Annotation , Phenotype , User-Computer Interface , Web Browser

15.

Tools and data services registry: a community effort to document bioinformatics resources.

Ison, Jon; Rapacki, Kristoffer; Ménager, Hervé; Kalas, Matús; Rydza, Emil; Chmura, Piotr; Anthon, Christian; Beard, Niall; Berka, Karel; Bolser, Dan; Booth, Tim; Bretaudeau, Anthony; Brezovsky, Jan; Casadio, Rita; Cesareni, Gianni; Coppens, Frederik; Cornell, Michael; Cuccuru, Gianmauro; Davidsen, Kristian; Vedova, Gianluca Della; Dogan, Tunca; Doppelt-Azeroual, Olivia; Emery, Laura; Gasteiger, Elisabeth; Gatter, Thomas; Goldberg, Tatyana; Grosjean, Marie; Grüning, Björn; Helmer-Citterich, Manuela; Ienasescu, Hans; Ioannidis, Vassilios; Jespersen, Martin Closter; Jimenez, Rafael; Juty, Nick; Juvan, Peter; Koch, Maximilian; Laibe, Camille; Li, Jing-Woei; Licata, Luana; Mareuil, Fabien; Micetic, Ivan; Friborg, Rune Møllegaard; Moretti, Sebastien; Morris, Chris; Möller, Steffen; Nenadic, Aleksandra; Peterson, Hedi; Profiti, Giuseppe; Rice, Peter; Romano, Paolo.

Nucleic Acids Res ; 44(D1): D38-47, 2016 Jan 04.

Article in English | MEDLINE | ID: mdl-26538599

ABSTRACT

Life sciences are yielding huge data sets that underpin scientific discoveries fundamental to improvement in human health, agriculture and the environment. In support of these discoveries, a plethora of databases and tools are deployed, in technically complex and diverse implementations, across a spectrum of scientific disciplines. The corpus of documentation of these resources is fragmented across the Web, with much redundancy, and has lacked a common standard of information. The outcome is that scientists must often struggle to find, understand, compare and use the best resources for the task at hand. Here we present a community-driven curation effort, supported by ELIXIR, the European infrastructure for biological information, that aspires to a comprehensive and consistent registry of information about bioinformatics resources. The sustainable upkeep of this Tools and Data Services Registry is assured by a curation effort driven by and tailored to local needs, and shared amongst a network of engaged partners. As of November 2015, the registry includes 1785 resources, with depositions from 126 individual registrations including 52 institutional providers and 74 individuals. With community support, the registry can become a standard for dissemination of information about bioinformatics resources: we welcome everyone to join us in this common endeavour. The registry is freely available at https://bio.tools.

Subjects

Computational Biology , Registries , Data Curation , Software

16.

Ensembl Plants: Integrating Tools for Visualizing, Mining, and Analyzing Plant Genomics Data.

Bolser, Dan; Staines, Daniel M; Pritchard, Emily; Kersey, Paul.

Methods Mol Biol ; 1374: 115-40, 2016.

Artigo em Inglês | MEDLINE | ID: mdl-26519403

RESUMO

Ensembl Plants ( http://plants.ensembl.org ) is an integrative resource presenting genome-scale information for a growing number of sequenced plant species (currently 33). Data provided includes genome sequence, gene models, functional annotation, and polymorphic loci. Various additional information are provided for variation data, including population structure, individual genotypes, linkage, and phenotype data. In each release, comparative analyses are performed on whole genome and protein sequences, and genome alignments and gene trees are made available that show the implied evolutionary history of each gene family. Access to the data is provided through a genome browser incorporating many specialist interfaces for different data types, and through a variety of additional methods for programmatic access and data mining. These access routes are consistent with those offered through the Ensembl interface for the genomes of non-plant species, including those of plant pathogens, pests, and pollinators.Ensembl Plants is updated 4-5 times a year and is developed in collaboration with our international partners in the Gramene ( http://www.gramene.org ) and transPLANT projects ( http://www.transplantdb.org ).

Subjects

Computational Biology/methods, Genomics/methods, Plants/genetics, Data Mining/methods, Genetic Databases, Plant Genome, Web Browser

17.

Triticeae resources in Ensembl Plants.

Bolser, Dan M; Kerhornou, Arnaud; Walts, Brandon; Kersey, Paul.

Plant Cell Physiol ; 56(1): e3, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25432969

ABSTRACT

Recent developments in DNA sequencing have enabled the large and complex genomes of many crop species to be determined for the first time, even those previously intractable due to their polyploid nature. Indeed, over the course of the last 2 years, the genome sequences of several commercially important cereals, notably barley and bread wheat, have become available, as well as those of related wild species. While still incomplete, comparison with other, more completely assembled species suggests that coverage of genic regions is likely to be high. Ensembl Plants (http://plants.ensembl.org) is an integrative resource organizing, analyzing and visualizing genome-scale information for important crop and model plants. Available data include reference genome sequence, variant loci, gene models and functional annotation. For variant loci, individual and population genotypes, linkage information and, where available, phenotypic information are shown. Comparative analyses are performed on DNA and protein sequence alignments. The resulting genome alignments and gene trees, representing the implied evolutionary history of the gene family, are made available for visualization and analysis. Driven by the case of bread wheat, specific extensions to the analysis pipelines and web interface have recently been developed to support polyploid genomes. Data in Ensembl Plants is accessible through a genome browser incorporating various specialist interfaces for different data types, and through a variety of additional methods for programmatic access and data mining. These interfaces are consistent with those offered through the Ensembl interface for the genomes of non-plant species, including those of plant pathogens, pests and pollinators, facilitating the study of the plant in its environment.

Subjects

Plant Genome/genetics, Genomics, Hordeum/genetics, Transcriptome, Triticum/genetics, Edible Grain/genetics, Genetic Variation, Genotype, Information Storage and Retrieval, Internet, User-Computer Interface

18.

De novo transcriptome assembly and analyses of gene expression during photomorphogenesis in diploid wheat Triticum monococcum.

Fox, Samuel E; Geniza, Matthew; Hanumappa, Mamatha; Naithani, Sushma; Sullivan, Chris; Preece, Justin; Tiwari, Vijay K; Elser, Justin; Leonard, Jeffrey M; Sage, Abigail; Gresham, Cathy; Kerhornou, Arnaud; Bolser, Dan; McCarthy, Fiona; Kersey, Paul; Lazo, Gerard R; Jaiswal, Pankaj.

PLoS One ; 9(5): e96855, 2014.

Article in English | MEDLINE | ID: mdl-24821410

ABSTRACT

BACKGROUND: Triticum monococcum (2n) is a close ancestor of T. urartu, the A-genome progenitor of cultivated hexaploid wheat, and is therefore a useful model for the study of components regulating photomorphogenesis in diploid wheat. In order to develop genetic and genomic resources for such a study, we constructed genome-wide transcriptomes of two Triticum monococcum subspecies, the wild winter wheat T. monococcum ssp. aegilopoides (accession G3116) and the domesticated spring wheat T. monococcum ssp. monococcum (accession DV92) by generating de novo assemblies of RNA-Seq data derived from both etiolated and green seedlings. PRINCIPAL FINDINGS: The de novo transcriptome assemblies of DV92 and G3116 represent 120,911 and 117,969 transcripts, respectively. We successfully mapped ~90% of these transcripts from each accession to barley and ~95% of the transcripts to T. urartu genomes. However, only ~77% of the transcripts mapped to the annotated barley genes and ~85% of the transcripts mapped to the annotated T. urartu genes. Differential gene expression analyses revealed 22% more light up-regulated and 35% more light down-regulated transcripts in the G3116 transcriptome compared to DV92. The DV92 and G3116 mRNA sequence reads aligned against the reference barley genome led to the identification of ~500,000 single nucleotide polymorphism (SNP) and ~22,000 simple sequence repeat (SSR) sites. CONCLUSIONS: De novo transcriptome assemblies of two accessions of the diploid wheat T. monococcum provide new empirical transcriptome references for improving Triticeae genome annotations, and insights into transcriptional programming during photomorphogenesis. The SNP and SSR sites identified in our analysis provide additional resources for the development of molecular markers.

Subjects

Diploidy, Transcriptome/genetics, Triticum/genetics, Plant Genome/genetics, Seedlings/genetics

19.

Gramene 2013: comparative plant genomics resources.

Monaco, Marcela K; Stein, Joshua; Naithani, Sushma; Wei, Sharon; Dharmawardhana, Palitha; Kumari, Sunita; Amarasinghe, Vindhya; Youens-Clark, Ken; Thomason, James; Preece, Justin; Pasternak, Shiran; Olson, Andrew; Jiao, Yinping; Lu, Zhenyuan; Bolser, Dan; Kerhornou, Arnaud; Staines, Dan; Walts, Brandon; Wu, Guanming; D'Eustachio, Peter; Haw, Robin; Croft, David; Kersey, Paul J; Stein, Lincoln; Jaiswal, Pankaj; Ware, Doreen.

Nucleic Acids Res ; 42(Database issue): D1193-9, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24217918

ABSTRACT

Gramene (http://www.gramene.org) is a curated online resource for comparative functional genomics in crops and model plant species, currently hosting 27 fully and 10 partially sequenced reference genomes in its build number 38. Its strength derives from the application of a phylogenetic framework for genome comparison and the use of ontologies to integrate structural and functional annotation data. Whole-genome alignments complemented by phylogenetic gene family trees help infer syntenic and orthologous relationships. Genetic variation data, sequences and genome mappings available for 10 species, including Arabidopsis, rice and maize, help infer putative variant effects on genes and transcripts. The pathways section also hosts 10 species-specific metabolic pathways databases developed in-house or by our collaborators using Pathway Tools software, which facilitates searches for pathway, reaction and metabolite annotations, and allows analyses of user-defined expression datasets. Recently, we released a Plant Reactome portal featuring 133 curated rice pathways. This portal will be expanded for Arabidopsis, maize and other plant species. We continue to provide genetic and QTL maps and marker datasets developed by crop researchers. The project provides a unique community platform to support scientific research in plant genomics including studies in evolution, genetics, plant breeding, molecular biology, biochemistry and systems biology.

Subjects

Genetic Databases, Plant Genome, Genomics, Agricultural Crops/genetics, Genetic Variation, Internet, Metabolic Networks and Pathways/genetics, Molecular Sequence Annotation, Plants/genetics, Plants/metabolism

20.

The NGS WikiBook: a dynamic collaborative online training effort with long-term sustainability.

Li, Jing-Woei; Bolser, Dan; Manske, Magnus; Giorgi, Federico Manuel; Vyahhi, Nikolay; Usadel, Björn; Clavijo, Bernardo J; Chan, Ting-Fung; Wong, Nathalie; Zerbino, Daniel; Schneider, Maria Victoria.

Brief Bioinform ; 14(5): 548-55, 2013 Sep.

Article in English | MEDLINE | ID: mdl-23793381

ABSTRACT

Next-generation sequencing (NGS) is increasingly being adopted as the backbone of biomedical research. With the commercialization of various affordable desktop sequencers, NGS will be reached by increasing numbers of cellular and molecular biologists, necessitating community consensus on bioinformatics protocols to tackle the exponential increase in quantity of sequence data. The current resources for NGS informatics are extremely fragmented. Finding a centralized synthesis is difficult. A multitude of tools exist for NGS data analysis; however, none of these satisfies all possible uses and needs. This gap in functionality could be filled by integrating different methods in customized pipelines, an approach helped by the open-source nature of many NGS programmes. Drawing from community spirit and with the use of the Wikipedia framework, we have initiated a collaborative NGS resource: The NGS WikiBook. We have collected a sufficient amount of text to incentivize a broader community to contribute to it. Users can search, browse, edit and create new content, so as to facilitate self-learning and feedback to the community. The overall structure and style for this dynamic material is designed for the bench biologists and non-bioinformaticians. The flexibility of online material allows the readers to ignore details in a first read, yet have immediate access to the information they need. Each chapter comes with practical exercises so readers may familiarize themselves with each step. The NGS WikiBook aims to create a collective laboratory book and protocol that explains the key concepts and describes best practices in this fast-evolving field.

Subjects

Computational Biology/education, Computer-Assisted Instruction/methods, High-Throughput Nucleotide Sequencing/statistics & numerical data, Cooperative Behavior, Internet, Teaching
