Results 1 - 20 of 32
1.
Plant J ; 2024 Jun 14.
Article in English | MEDLINE | ID: mdl-38872506

ABSTRACT

Tea, one of the most widely consumed beverages globally, exhibits remarkable genomic diversity in its underlying flavour and health-related compounds. In this study, we present the construction and analysis of a tea pangenome comprising a total of 11 genomes, with a focus on three newly sequenced genomes: the purple-leaved assamica cultivar "Zijuan", the temperature-sensitive sinensis cultivar "Anjibaicha" and the wild accession "L618", whose assemblies exhibited excellent quality scores as they profited from the latest sequencing technologies. Our analysis incorporates a detailed investigation of the transposon complement across the tea pangenome, revealing shared patterns of transposon distribution among the studied genomes and improved transposon resolution with long-read technologies, as shown by long terminal repeat (LTR) Assembly Index analysis. Furthermore, our study encompasses a gene-centric exploration of the pangenome, examining the genomic landscape of the catechin pathway and providing insights into copy number alterations and gene-centric variants, especially for anthocyanidin synthases. We constructed a gene-centric pangenome by structurally and functionally annotating all available genomes using an identical pipeline, which both increased gene completeness and allowed for a high functional annotation rate. This improved and consistently annotated gene set will allow for a better comparison between tea genomes. We used this improved pangenome to capture the core and dispensable gene repertoire, elucidating the functional diversity present within the tea species. This pangenome might serve as a valuable resource for understanding the fundamental genetic basis of traits such as flavour, stress tolerance, and disease resistance, with implications for tea breeding programmes.
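
The split into a core and a dispensable gene repertoire mentioned in this abstract can be illustrated with a minimal sketch. It assumes each genome has already been reduced to a set of gene-family identifiers; the function name, the family IDs, and the strict "present in every genome" definition of core are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter

def partition_pangenome(presence):
    """Split gene families into core and dispensable sets.

    presence: dict mapping a genome name to the set of gene-family IDs
    found in that genome (hypothetical input format).
    """
    n_genomes = len(presence)
    # Count in how many genomes each family occurs.
    counts = Counter(fam for fams in presence.values() for fam in fams)
    # Core: families present in every genome (assumed definition).
    core = {fam for fam, c in counts.items() if c == n_genomes}
    # Dispensable: families missing from at least one genome.
    dispensable = set(counts) - core
    return core, dispensable
```

Real pangenome analyses often relax the core definition (for example, presence in most rather than all genomes) to tolerate annotation gaps; the threshold used in the study is not stated in the abstract.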

2.
Plant J ; 112(4): 897-918, 2022 11.
Article in English | MEDLINE | ID: mdl-36073999

ABSTRACT

Breeding has increasingly altered the genetics of crop plants since the domestication of their wild progenitors. It is postulated that the genetic diversity of elite wheat breeding pools is too narrow to cope with future challenges. In contrast, plant genetic resources (PGRs) of wheat stored in genebanks are valuable sources of unexploited genetic diversity. Therefore, to ensure breeding progress in the future, it is of prime importance to identify the useful allelic diversity available in PGRs and to transfer it into elite breeding pools. Here, a diverse collection consisting of modern winter wheat cultivars and genebank accessions was investigated based on reduced-representation genomic sequencing and an iSelect single nucleotide polymorphism (SNP) chip array. Analyses of these datasets provided detailed insights into population structure, levels of genetic diversity, sources of new allelic diversity, and genomic regions affected by breeding activities. We identified 57 regions representing genomic signatures of selection and 827 regions representing private alleles associated exclusively with genebank accessions. The presence of known functional wheat genes, quantitative trait loci, and large chromosomal modifications, i.e., introgressions from wheat wild relatives, provided initial evidence for putative traits associated within these identified regions. These findings were supported by the results of ontology enrichment analyses. The results reported here will stimulate further research and promote breeding in the future by allowing for the targeted introduction of novel allelic diversity into elite wheat breeding pools.


Subjects
Bread; Triticum; Triticum/genetics; Alleles; Plant Breeding; Genome, Plant/genetics; Polymorphism, Single Nucleotide/genetics
3.
Brief Bioinform ; 22(5), 2021 09 02.
Article in English | MEDLINE | ID: mdl-33589928

ABSTRACT

This article describes some use case studies and self-assessments of the FAIR status of de.NBI services to illustrate the challenges and requirements of adhering to the FAIR (findable, accessible, interoperable and reusable) data principles in a large distributed bioinformatics infrastructure. We address the challenge of heterogeneity of wet lab technologies, data, metadata, software, computational workflows and the levels of implementation and monitoring of FAIR principles within the different bioinformatics sub-disciplines joined in de.NBI. On the one hand, this broad service landscape and the excellent network of experts are a strong basis for the development of useful research data management plans. On the other hand, the large number of tools and techniques maintained by distributed teams renders FAIR compliance challenging.


Subjects
Data Management/methods; Metadata; Neural Networks, Computer; Proteomics/methods; Software; Genome, Human; High-Throughput Nucleotide Sequencing; Humans; International Cooperation; Phenotype; Plants/genetics; Proteome; Self-Assessment; Workflow
4.
Nature ; 544(7651): 427-433, 2017 04 26.
Article in English | MEDLINE | ID: mdl-28447635

ABSTRACT

Cereal grasses of the Triticeae tribe have been the major food source in temperate regions since the dawn of agriculture. Their large genomes are characterized by a high content of repetitive elements and large pericentromeric regions that are virtually devoid of meiotic recombination. Here we present a high-quality reference genome assembly for barley (Hordeum vulgare L.). We use chromosome conformation capture mapping to derive the linear order of sequences across the pericentromeric space and to investigate the spatial organization of chromatin in the nucleus at megabase resolution. The composition of genes and repetitive elements differs between distal and proximal regions. Gene family analyses reveal lineage-specific duplications of genes involved in the transport of nutrients to developing seeds and the mobilization of carbohydrates in grains. We demonstrate the importance of the barley reference sequence for breeding by inspecting the genomic partitioning of sequence variation in modern elite germplasm, highlighting regions vulnerable to genetic erosion.


Subjects
Chromosomes, Plant/genetics; Genome, Plant/genetics; Hordeum/genetics; Cell Nucleus/genetics; Centromere/genetics; Chromatin/genetics; Chromatin/metabolism; Chromosome Mapping; Chromosomes, Artificial, Bacterial/genetics; Genetic Variation; Genomics; Haplotypes/genetics; Meiosis/genetics; Repetitive Sequences, Nucleic Acid/genetics; Seeds/genetics
5.
Int J Mol Sci ; 24(5), 2023 Feb 24.
Article in English | MEDLINE | ID: mdl-36901897

ABSTRACT

This study aimed to isolate cells from grade 4 glioblastoma multiforme tumors for infection experiments with Zika virus (ZIKV) prME or ME enveloped HIV-1 pseudotypes. The cells obtained from tumor tissue were successfully cultured in human cerebrospinal fluid (hCSF) or a mixture of hCSF/DMEM in cell culture flasks with polar and hydrophilic surfaces. The isolated tumor cells as well as the U87, U138, and U343 cells tested positive for ZIKV receptors Axl and Integrin αvβ5. Pseudotype entry was detected by the expression of firefly luciferase or green fluorescent protein (gfp). In prME and ME pseudotype infections, luciferase expression in U-cell lines was 2.5 to 3.5 logarithms above the background, but still two logarithms lower than in the VSV-G pseudotype control. Infection of single cells was successfully detected in U-cell lines and isolated tumor cells by gfp detection. Even though prME and ME pseudotypes had low infection rates, pseudotypes with ZIKV envelopes are promising candidates for the treatment of glioblastoma.


Subjects
Glioblastoma; HIV-1; Zika Virus Infection; Zika Virus; Humans; Glioblastoma/therapy; Cell Line; Green Fluorescent Proteins
6.
Plant Cell ; 31(7): 1430-1445, 2019 07.
Article in English | MEDLINE | ID: mdl-31023840

ABSTRACT

Chloroplasts fuel plant development and growth by converting solar energy into chemical energy. They mature from proplastids through the concerted action of genes in both the organellar and the nuclear genome. Defects in such genes impair chloroplast development and may lead to pigment-deficient seedlings or seedlings with variegated leaves. Such mutants are instrumental as tools for dissecting genetic factors underlying the mechanisms involved in chloroplast biogenesis. Characterization of the green-white variegated albostrians mutant of barley (Hordeum vulgare) has greatly broadened the field of chloroplast biology, including the discovery of retrograde signaling. Here, we report identification of the ALBOSTRIANS gene HvAST (also known as Hordeum vulgare CCT Motif Family gene 7, HvCMF7) by positional cloning as well as its functional validation based on independently induced mutants by Targeting Induced Local Lesions in Genomes (TILLING) and RNA-guided clustered regularly interspaced short palindromic repeats-associated protein 9 endonuclease-mediated gene editing. The phenotypes of the independent HvAST mutants imply residual activity of HvCMF7 in the original albostrians allele conferring an imperfect penetrance of the variegated phenotype even at homozygous state of the mutation. HvCMF7 is a homolog of the Arabidopsis (Arabidopsis thaliana) CONSTANS, CO-like, and TOC1 (CCT) Motif transcription factor gene CHLOROPLAST IMPORT APPARATUS2, which was reported to be involved in the expression of nuclear genes essential for chloroplast biogenesis. Notably, in barley we localized HvCMF7 to the chloroplast, without any clear evidence for nuclear localization.


Subjects
Chloroplasts/metabolism; Genes, Plant; Hordeum/genetics; Plant Leaves/physiology; Plant Proteins/genetics; Alleles; Amino Acid Sequence; Base Sequence; CRISPR-Associated Protein 9/metabolism; Chloroplasts/ultrastructure; Chromosome Mapping; Green Fluorescent Proteins/metabolism; Hordeum/ultrastructure; Mutagenesis, Site-Directed; Mutation/genetics; Plant Leaves/ultrastructure; Plant Proteins/chemistry; Plant Proteins/metabolism; RNA, Plant/metabolism
7.
Plant J ; 102(3): 631-642, 2020 05.
Article in English | MEDLINE | ID: mdl-31823436

ABSTRACT

Many plant genomes display high levels of repetitive sequences. The assembly of these complex genomes using short high-throughput sequence reads is still a challenging task. Underestimation or disregard of repeat complexity in these datasets can easily misguide downstream analysis. Detection of repetitive regions by k-mer counting methods has proved to be reliable. Easy-to-use applications utilizing k-mer counting are in high demand, especially in the domain of plants. We present Kmasker plants, a tool that uses k-mer count information as an assistant throughout the analytical workflow of genome data and that is provided as a command-line and web-based solution. Besides its core competence to screen and mask repetitive sequences, we have integrated features that enable comparative studies between different cultivars or closely related species and methods that estimate target specificity of guide RNAs for application of site-directed mutagenesis using Cas9 endonuclease. In addition, we have set up a web service for Kmasker plants that maintains pre-computed indices for 10 of the economically most important cultivated plants. Source code for Kmasker plants has been made publicly available at https://github.com/tschmutzer/kmasker. The web service is accessible at https://kmasker.ipk-gatersleben.de.


Subjects
Genome, Plant/genetics; Algorithms; Gene Editing; Genomics; RNA, Guide, Kinetoplastida/genetics; Sequence Analysis, DNA; Software
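
The general idea of masking repeats by k-mer counting, as described in the Kmasker plants abstract above, can be sketched as follows. This is not Kmasker's implementation or interface: the function names, the in-memory counting, and the fixed threshold are assumptions chosen to keep the example small.

```python
from collections import Counter

def count_kmers(sequences, k):
    """Count every k-mer occurring in a collection of sequences (e.g. reads)."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def mask_repeats(seq, counts, k, threshold):
    """Replace with 'N' every base covered by a k-mer seen >= threshold times."""
    seq = seq.upper()
    masked = list(seq)
    for i in range(len(seq) - k + 1):
        if counts[seq[i:i + k]] >= threshold:
            # All k positions of a frequent k-mer are treated as repetitive.
            for j in range(i, i + k):
                masked[j] = "N"
    return "".join(masked)
```

Production tools rely on dedicated disk-backed k-mer indices and normalise counts by sequencing depth; a plain `Counter` only works for toy inputs.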
8.
New Phytol ; 230(6): 2179-2185, 2021 06.
Article in English | MEDLINE | ID: mdl-33503271

ABSTRACT

B chromosomes (Bs) are supernumerary dispensable components of the standard genome (A chromosomes, As) that have been found in many eukaryotes. So far, it is unknown whether the B-derived transcripts translate to proteins or if the host proteome is changed due to the presence of Bs. Comparative mass spectrometry was performed using the protein samples isolated from shoots of rye plants with and without Bs. We aimed to identify B-associated peptides and analyzed the effects of Bs on the total proteome. Our comparative proteome analysis demonstrates that the presence of rye Bs affects the total proteome including different biological function processes. We found 319 of 16 776 quantified features in at least three out of five +B plants but not in 0B plants; 31 of 319 features were identified as B-associated peptide features. According to our data mining, one B-specific protein fragment showed similarity to a glycine-rich RNA binding protein which differed from its A-paralogue by two amino acid insertions. Our result represents a milestone in B chromosome research, because this is the first report to demonstrate the existence of Bs changing the proteome of the host.


Subjects
Chromosomes, Plant; Secale; Chromosomes, Plant/genetics; Mass Spectrometry; Peptides; Secale/genetics
9.
Plant Biotechnol J ; 18(6): 1396-1408, 2020 06.
Article in English | MEDLINE | ID: mdl-31782598

ABSTRACT

Resistance breeding is crucial for a sustainable control of leaf rust (Puccinia triticina) in wheat (Triticum aestivum L.) while directly targeting functional variants is the Holy Grail for efficient marker-assisted selection and map-based cloning. We assessed the limits and prospects of exome association analysis for severity of leaf rust in a large hybrid wheat population of 1574 single-crosses plus their 133 parents. After imputation and quality control, exome sequencing revealed 202 875 single-nucleotide polymorphisms (SNPs) covering 19.7% of the high-confidence annotated gene space. We performed intensive data mining and found significant associations for 2171 SNPs corresponding to 50 different loci. Some of these associations mapped in the proximity of the already known resistance genes Lr21, Lr34-B, Lr1 and Lr10, while other associated genomic regions, such as those on chromosomes 1A and 3D, harboured several annotated genes putatively involved in resistance. Validation with an independent population helped to narrow down the list of putative resistance genes that should be targeted by fine-mapping. We expect that the proposed strategy of intensive data mining coupled with validation will significantly influence research in plant genetics and breeding.


Subjects
Basidiomycota; Triticum; Breeding; Disease Resistance/genetics; Exome/genetics; Genes, Plant/genetics; Humans; Plant Diseases/genetics; Triticum/genetics
10.
Bioinformatics ; 33(16): 2583-2585, 2017 Aug 15.
Article in English | MEDLINE | ID: mdl-28398459

ABSTRACT

MOTIVATION: Microsatellites are a widely used marker system in plant genetics and forensics. The development of reliable microsatellite markers from resequencing data is challenging. RESULTS: We extended MISA, a computational tool assisting the development of microsatellite markers, and reimplemented it as a web-based application. We improved compound microsatellite detection and added the possibility to display and export MISA results in GFF3 format for downstream analysis. AVAILABILITY AND IMPLEMENTATION: MISA-web can be accessed at http://misaweb.ipk-gatersleben.de/. The website provides tutorials, usage notes and download links to the source code. CONTACT: scholz@ipk-gatersleben.de.


Subjects
Genome, Plant; Microsatellite Repeats; Plants/genetics; Sequence Analysis, DNA/methods; Software; Genomics/methods; Internet
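
To make the notion of microsatellite detection in the MISA-web abstract above concrete, here is a minimal regex-based sketch for perfect simple sequence repeats. It does not reproduce MISA's algorithm, parameters, or output format; the function name, the `min_repeats` mapping, and the 1-based coordinates are assumptions for illustration, and compound microsatellites are not handled.

```python
import re

def find_ssrs(seq, min_repeats):
    """Find perfect tandem repeats.

    min_repeats: dict mapping motif length -> minimum number of repeat units,
    e.g. {2: 6, 3: 5} (hypothetical thresholds).
    Returns tuples (start_1based, end, motif, n_units).
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            # Skip motifs that are themselves repeats of a shorter unit ("AA").
            if any(motif == motif[:d] * (unit_len // d)
                   for d in range(1, unit_len) if unit_len % d == 0):
                continue
            hits.append((m.start() + 1, m.end(), motif,
                         len(m.group(1)) // unit_len))
    return hits
```

Each hit could then be written out as one feature line per repeat if a tabular export is needed; the column layout would have to follow the target format's specification.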
11.
Hereditas ; 155: 10, 2018.
Article in English | MEDLINE | ID: mdl-28878591

ABSTRACT

BACKGROUND: Short-culm mutants have been widely used in breeding programs to increase lodging resistance. In barley (Hordeum vulgare L.), several hundreds of short-culm mutants have been isolated over the years. The objective of the present study was to identify the Brachytic1 (Brh1) semi-dwarfing gene and to test its effect on yield and malting quality. RESULTS: Double-haploid lines generated through a cross between a brh1.a mutant and the European elite malting cultivar Quench showed good malting quality but a decrease in yield. Especially the activities of the starch degrading enzymes β-amylase and free limit dextrinase were high. A syntenic approach comparing markers in barley to those in rice (Oryza sativa L.), sorghum (Sorghum bicolor Moench) and brachypodium (Brachypodium distachyon P. Beauv) helped us to identify Brh1 as an orthologue of rice D1 encoding the Gα subunit of a heterotrimeric G protein. We demonstrated that Brh1 is allelic to Ari-m. Sixteen different mutant alleles were described at the DNA level. CONCLUSIONS: Mutants in the Brh1 locus are deficient in the Gα subunit of a heterotrimeric G protein, which shows that heterotrimeric G proteins are important regulators of culm length in barley. Mutant alleles do not have any major negative effects on malting quality.


Subjects
Heterotrimeric GTP-Binding Proteins/genetics; Hordeum/genetics; Plant Proteins/genetics; Alleles; Hordeum/growth & development; Mutation; Phenotype; Plant Breeding
12.
Cytogenet Genome Res ; 152(2): 90-96, 2017.
Article in English | MEDLINE | ID: mdl-28719910

ABSTRACT

Genetic maps are based on the recombination frequency of molecular markers which often show different positions in comparison to the corresponding physical maps. To decipher the position and order of DNA sequences genetically mapped to terminal and interstitial regions of barley (Hordeum vulgare) chromosome 3H, fluorescence in situ hybridization (FISH) on mitotic metaphase chromosomes was performed with 16 genomic single-copy probes derived from fingerprinted BAC contigs. Long genetic distances at subterminal regions translated into short physical distances, confirming that recombination events occur more often at distal regions of chromosome 3H. Nonoverlapping FISH signals were frequently obtained for probes with a physical distance of at least 30-60 kb. Only 8% of the analyzed chromosomes showed a symmetric order of FISH signals on both sister chromatids. Due to the dynamic packing of metaphase chromatin, the order of 2 adjacent single-copy signals along the chromosome arms outside the (peri)centromeric region can only reliably be determined if the cytological distance is approximately 3%, corresponding to 21.6 Mb.


Subjects
Chromosome Mapping/methods; Chromosomes, Plant/genetics; Gene Dosage; Hordeum/genetics; In Situ Hybridization, Fluorescence/methods; Metaphase/genetics; Physical Chromosome Mapping/methods; Base Pairing/genetics; Chromatids/genetics
13.
Plant Physiol ; 171(2): 1113-27, 2016 06.
Article in English | MEDLINE | ID: mdl-27208226

ABSTRACT

Inflorescence architecture in small-grain cereals has a direct effect on yield and is an important selection target in breeding for yield improvement. We analyzed the recessive mutation laxatum-a (lax-a) in barley (Hordeum vulgare), which causes pleiotropic changes in spike development, resulting in (1) extended rachis internodes conferring a more relaxed inflorescence, (2) broadened base of the lemma awns, (3) thinner grains that are largely exposed due to reduced marginal growth of the palea and lemma, and (4) homeotic conversion of lodicules into two stamenoid structures. Map-based cloning enforced by mapping-by-sequencing of the mutant lax-a locus enabled the identification of a homolog of BLADE-ON-PETIOLE1 (BOP1) and BOP2 as the causal gene. Interestingly, the recently identified barley uniculme4 gene also is a BOP1/2 homolog and has been shown to regulate tillering and leaf sheath development. While the Arabidopsis (Arabidopsis thaliana) BOP1 and BOP2 genes act redundantly, the barley genes contribute independent effects in specifying the developmental growth of vegetative and reproductive organs, respectively. Analysis of natural genetic diversity revealed strikingly different haplotype diversity for the two paralogous barley genes, likely affected by the respective genomic environments, since no indication for an active selection process was detected.


Subjects
Arabidopsis Proteins/chemistry; Genes, Homeobox; Genes, Plant; Hordeum/anatomy & histology; Hordeum/genetics; Inflorescence/anatomy & histology; Sequence Homology, Amino Acid; Arabidopsis Proteins/metabolism; Base Pairing/genetics; Chromosome Mapping; Cloning, Molecular; Ecotype; Genetic Variation; Molecular Sequence Annotation; Mutation; Phenotype; Phylogeny; Plants, Genetically Modified; Recombination, Genetic/genetics; Sequence Analysis, DNA; Sequence Deletion
14.
Plant J ; 84(2): 385-94, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26332657

ABSTRACT

Genetic maps are based on the frequency of recombination and often show different positions of molecular markers in comparison to physical maps, particularly in the centromere, which is generally poor in meiotic recombination. To decipher the position and order of DNA sequences genetically mapped to the centromere of barley (Hordeum vulgare) chromosome 3H, fluorescence in situ hybridization with mitotic metaphase and meiotic pachytene chromosomes was performed with 70 genomic single-copy probes derived from 65 fingerprinted bacterial artificial chromosome (BAC) contigs genetically assigned to this recombination cold spot. The total physical distribution of the centromeric 5.5 cM bin of 3H comprises 58% of the mitotic metaphase chromosome length. Mitotic and meiotic chromatin of this recombination-poor region is preferentially marked by a heterochromatin-typical histone mark (H3K9me2), while recombination-enriched subterminal chromosome regions are enriched in euchromatin-typical histone marks (H3K4me2, H3K4me3, H3K27me3), suggesting that the meiotic recombination rate could be influenced by the chromatin landscape.


Subjects
Chromosomes, Plant/genetics; Hordeum/genetics; Chromosome Mapping
15.
Plant Biotechnol J ; 14(7): 1511-22, 2016 07.
Article in English | MEDLINE | ID: mdl-26801048

ABSTRACT

Hierarchical shotgun sequencing remains the method of choice for assembling high-quality reference sequences of complex plant genomes. The efficient exploitation of current high-throughput technologies and powerful computational facilities for large-insert clone sequencing necessitates the sequencing and assembly of a large number of clones in parallel. We developed a multiplexed pipeline for shotgun sequencing and assembling individual bacterial artificial chromosomes (BACs) using the Illumina sequencing platform. We illustrate our approach by sequencing 668 barley BACs (Hordeum vulgare L.) in a single Illumina HiSeq 2000 lane. Using a newly designed parallelized computational pipeline, we obtained sequence assemblies of individual BACs that consist, on average, of eight sequence scaffolds and represent >98% of the genomic inserts. Our BAC assemblies are clearly superior to a whole-genome shotgun assembly regarding contiguity, completeness and the representation of the gene space. Our methods may be employed to rapidly obtain high-quality assemblies of a large number of clones to assemble map-based reference sequences of plant and animal species with complex genomes by sequencing along a minimum tiling path.


Subjects
Chromosomes, Artificial, Bacterial/genetics; Genome, Plant; Hordeum/genetics; Sequence Analysis, DNA/methods; Biotechnology/methods
16.
Theor Appl Genet ; 128(7): 1343-57, 2015 Jul.
Article in English | MEDLINE | ID: mdl-25877520

ABSTRACT

KEY MESSAGE: The candidate gene for the barley Un8 true loose smut resistance gene encodes a deduced protein containing two tandem protein kinase domains. In North America, durable resistance against all known isolates of barley true loose smut, caused by the basidiomycete pathogen Ustilago nuda (Jens.) Rostr. (U. nuda), is under the control of the Un8 resistance gene. Previous genetic studies mapped Un8 to the long arm of chromosome 5 (1HL). Here, a population of 4625 lines segregating for Un8 was used to delimit the Un8 gene to a 0.108 cM interval on chromosome arm 1HL, and assign it to fingerprinted contig 546 of the barley physical map. The minimal tiling path was identified for the Un8 locus using two flanking markers and consisted of two overlapping bacterial artificial chromosomes. One gene located close to a marker co-segregating with Un8 showed high sequence identity to a disease resistance gene containing two kinase domains. Sequence of the candidate gene from the parents of the segregating population, and in an additional 19 barley lines representing a broader spectrum of diversity, showed there was no intron in alleles present in either resistant or susceptible lines, and fifteen amino acid variations unique to the deduced protein sequence in resistant lines differentiated it from the deduced protein sequences in susceptible lines. Some of these variations were present within putative functional domains which may cause a loss of function in the deduced protein sequences within susceptible lines.


Subjects
Disease Resistance/genetics; Hordeum/genetics; Physical Chromosome Mapping; Plant Diseases/genetics; Alleles; Amino Acid Sequence; Basidiomycota/pathogenicity; Chromosomes, Plant; DNA, Plant/genetics; Genes, Plant; Genetic Linkage; Genetic Markers; Genotype; Hordeum/microbiology; Introns; Molecular Sequence Data; Phenotype; Plant Diseases/microbiology; Plant Proteins/genetics; Polymorphism, Single Nucleotide; Protein Interaction Domains and Motifs; Synteny
17.
Gigascience ; 12, 2022 12 28.
Article in English | MEDLINE | ID: mdl-37083938

ABSTRACT

BACKGROUND: The sequencing of whole genomes is becoming increasingly affordable. In this context, large-scale sequencing projects are generating ever larger datasets of species-specific genomic diversity. As a consequence, more and more genomic data need to be made easily accessible and analyzable to the scientific community. FINDINGS: We present DivBrowse, a web application for interactive visualization and exploratory analysis of genomic diversity data stored in Variant Call Format (VCF) files of any size. By seamlessly combining BLAST as an entry point together with interactive data analysis features such as principal component analysis in one graphical user interface, DivBrowse provides a novel and unique set of exploratory data analysis capabilities for genomic biodiversity datasets. The capability to integrate DivBrowse into existing web applications supports interoperability between different web applications. Built-in interactive computation of principal component analysis allows users to perform ad hoc analysis of the population structure based on specific genetic elements such as genes and exons. Data interoperability is supported by the ability to export genomic diversity data in VCF and General Feature Format 3 files. CONCLUSION: DivBrowse offers a novel approach for interactive visualization and analysis of genomic diversity data and optionally also gene annotation data by including features like interactive calculation of variant frequencies and principal component analysis. The use of established standard file formats for data input supports interoperability and seamless deployment of application instances based on the data output of established bioinformatics pipelines.


Subjects
Genomics; Software; Computational Biology; Genome; Molecular Sequence Annotation
18.
F1000Res ; 11, 2022.
Article in English | MEDLINE | ID: mdl-35811804

ABSTRACT

In this opinion article, we discuss the formatting of files from (plant) genotyping studies, in particular the formatting of (meta-) data in Variant Call Format (VCF) files. The flexibility of the VCF format specification facilitates its use as a generic interchange format across domains but can lead to inconsistency between files in the presentation of metadata. To enable fully autonomous machine actionable data flow, generic elements need to be further specified. We strongly support the merits of the FAIR principles and see the need to facilitate them also through technical implementation specifications. VCF files are an established standard for the exchange and publication of genotyping data. Other data formats are also used to capture variant call data (for example, the HapMap format and the gVCF format), but none currently have the reach of VCF. In VCF, only the sites of variation are described, whereas in gVCF, all positions are listed, and confidence values are also provided. For the sake of simplicity, we will only discuss VCF and our recommendations for its use. However, the part of the VCF standard relating to metadata (as opposed to the actual variant calls) defines a syntactic format but no vocabulary, unique identifier or recommended content. In practice, often only sparse (if any) descriptive metadata is included. When descriptive metadata is provided, proprietary metadata fields are frequently added that have not been agreed upon within the community which may limit long-term and comprehensive interoperability. To address this, we propose recommendations for supplying and encoding metadata, focusing on use cases from the plant sciences. We expect there to be overlap, but also divergence, with the needs of other domains.


Subjects
Metadata; Software; Genotype
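
The point made in the VCF abstract above, that the metadata section has a defined syntax but no agreed vocabulary, can be illustrated by reading the `##` meta-information lines of a VCF header into a dictionary. The sketch below is a simplified reader, not a full VCF parser and not the authors' recommendation; the function name and the returned structure are assumptions.

```python
import re

# key=value pairs inside <...>; values may be double-quoted and contain commas.
_FIELD = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]*)')

def parse_vcf_meta(lines):
    """Collect '##key=value' meta-information lines from a VCF header.

    Structured values such as ##INFO=<ID=...,Description="..."> become dicts;
    plain values stay strings. Reading stops at the first non-'##' line.
    """
    meta = {}
    for line in lines:
        if not line.startswith("##"):
            break  # '#CHROM' column header or first data line
        key, _, value = line[2:].rstrip("\n").partition("=")
        if value.startswith("<") and value.endswith(">"):
            value = {k: v.strip('"') for k, v in _FIELD.findall(value[1:-1])}
        meta.setdefault(key, []).append(value)
    return meta
```

Because any key is syntactically valid, a reader like this accepts community-agreed and ad hoc fields alike, which is the interoperability gap the article describes.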
19.
Sci Adv ; 7(24), 2021 Jun.
Article in English | MEDLINE | ID: mdl-34117061

ABSTRACT

The potential of big data to support businesses has been demonstrated in financial services, manufacturing, and telecommunications. Here, we report on efforts to enter a new data era in plant breeding by collecting genomic and phenotypic information from 12,858 wheat genotypes representing 6575 single-cross hybrids and 6283 inbred lines that were evaluated in six experimental series for yield in field trials encompassing ~125,000 plots. Integrating data resulted in twofold higher prediction ability compared with cases in which hybrid performance was predicted across individual experimental series. Our results suggest that combining data across breeding programs is a particularly appropriate strategy to exploit the potential of big data for predictive plant breeding. This paradigm shift can contribute to increasing yield and resilience, which is needed to feed the growing world population.

20.
Front Plant Sci ; 11: 1040, 2020.
Article in English | MEDLINE | ID: mdl-32754184

ABSTRACT

Collections of plant genetic resources stored in genebanks are an important source of genetic diversity for improvement in plant breeding programs and for conservation of natural variation. The establishment of reduced representative collections from a large set of genotypes is a valuable tool that provides cost-effective access to the diversity present in the whole set. Software like Core Hunter 3 is available to generate high quality core sets. In addition, general clustering approaches, e.g., k-medoids, are available to subdivide a large data set into small groups with maximum genetic diversity between groups. Illumina genotyping platforms are a very efficient tool for the assessment of genetic diversity of plant genetic resources. The accumulation of genotyping data over time using commercial genotyping platforms raises the question of how such a huge amount of information can be efficiently used for creating core collections. In the present study, after developing a 15K wheat Infinium array with 12,908 SNPs and genotyping a set of 479 hexaploid winter wheat lines (Triticum aestivum), a larger data set was created by merging 411 lines previously genotyped with the 90K iSelect array. Overlaying the markers from the 15K and 90K arrays enabled the identification of a common set of 12,806 markers, suggesting that the 15K array is a valuable and cost-effective resource for plant breeding programs. Finally, we selected genetically diverse core sets out of these 890 wheat genotypes derived from five collections based on the common markers from the 15K and 90K SNP arrays. Two different approaches, k-medoids and Core Hunter 3, were compared, and k-medoids was identified as an efficient method for selecting small core sets out of a large collection of genotypes while retaining the genetic diversity of the original population.
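
How k-medoids can pick a small core set from a pairwise distance matrix, as discussed in the abstract above, is sketched below. This is a plain alternating (assign, then update) variant written for illustration, not the procedure or software used in the study; the function name, the deterministic initialisation with the first k items, and the assumption that all items are distinct (no zero distances between different items) are simplifications.

```python
def k_medoids(dist, k, max_iter=100):
    """Return the indices of k medoids for a symmetric distance matrix.

    dist: list of lists, dist[i][j] = genetic distance between genotypes i and j.
    The medoids can be read as a core set of k representative genotypes.
    """
    n = len(dist)
    medoids = list(range(k))  # assumed initialisation: first k items
    for _ in range(max_iter):
        # Assignment step: attach every item to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: within each cluster, pick the member with the
        # smallest total distance to all other members.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break  # converged
        medoids = new_medoids
    return sorted(medoids)
```

With SNP data, `dist` could for instance hold the proportion of differing marker calls between two lines; the distance measure used in the study is not given in the abstract.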
