ABSTRACT
BACKGROUND: A major obstacle faced by families with rare diseases is obtaining a genetic diagnosis. The average "diagnostic odyssey" lasts over five years and causal variants are identified in under 50% of cases, even when capturing variants genome-wide. To aid in the interpretation and prioritization of the vast number of variants detected, computational methods are proliferating. Which tools are most effective remains unclear. To evaluate the performance of computational methods, and to encourage innovation in method development, we designed a Critical Assessment of Genome Interpretation (CAGI) community challenge to place variant prioritization models head-to-head in a real-life clinical diagnostic setting. METHODS: We utilized genome sequencing (GS) data from families sequenced in the Rare Genomes Project (RGP), a direct-to-participant research study on the utility of GS for rare disease diagnosis and gene discovery. Challenge predictors were provided with a dataset of variant calls and phenotype terms from 175 RGP individuals (65 families), including 35 solved training set families with causal variants specified, and 30 unlabeled test set families (14 solved, 16 unsolved). We tasked teams with identifying causal variants in as many families as possible. Predictors submitted variant predictions with estimated probability of causal relationship (EPCR) values. Model performance was determined by two metrics: a weighted score based on the rank position of causal variants, and the maximum F-measure, based on precision and recall of causal variants across all EPCR values. RESULTS: Sixteen teams submitted predictions from 52 models, some with manual review incorporated. Top performers recalled causal variants in up to 13 of 14 solved families within the top 5 ranked variants. 
Newly discovered diagnostic variants were returned to two previously unsolved families following confirmatory RNA sequencing, and two novel disease gene candidates were entered into Matchmaker Exchange. In one example, RNA sequencing demonstrated aberrant splicing due to a deep intronic indel in ASNS, identified in trans with a frameshift variant in an unsolved proband with phenotypes consistent with asparagine synthetase deficiency. CONCLUSIONS: Model methodology and performance were highly variable. Models weighing call quality, allele frequency, predicted deleteriousness, segregation, and phenotype were effective in identifying causal variants, and models open to phenotype expansion and non-coding variants were able to capture more difficult diagnoses and discover new diagnoses. Overall, computational models can significantly aid variant prioritization. For use in diagnostics, detailed review and conservative assessment of prioritized variants against established criteria are needed.
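The maximum F-measure described in the METHODS can be illustrated with a minimal Python sketch. It assumes that each submitted variant carries an EPCR value and a true/false causal label, and that the F-measure is the usual harmonic mean of precision and recall; the function name and the input layout are hypothetical, and the challenge's exact scoring rules may differ.

```python
from typing import Iterable, Tuple


def max_f_measure(predictions: Iterable[Tuple[float, bool]], n_causal: int) -> float:
    """Best F-measure over all EPCR cut-offs (illustrative, not the official scorer).

    predictions: (epcr, is_causal) pairs, one per submitted variant.
    n_causal: number of true causal variants, whether submitted or not.
    """
    if n_causal <= 0:
        raise ValueError("n_causal must be positive")
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    best = 0.0
    true_pos = 0
    for i, (epcr, is_causal) in enumerate(ranked, start=1):
        true_pos += int(is_causal)
        # Variants with tied EPCR values share one cut-off, so score only
        # after the last member of a tie group.
        if i < len(ranked) and ranked[i][0] == epcr:
            continue
        precision = true_pos / i
        recall = true_pos / n_causal
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best
```

Lowering the cut-off admits more variants, which can only raise recall but usually lowers precision; taking the maximum over cut-offs rewards models whose causal variants are concentrated at high EPCR values.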
Subjects
Rare Diseases, Humans, Rare Diseases/genetics, Rare Diseases/diagnosis, Genome, Human/genetics, Genetic Variation/genetics, Computational Biology/methods, Phenotype
ABSTRACT
BACKGROUND: The integration of mitochondrial DNA (mtDNA) into the nuclear genome of mammals is an ongoing, yet rare, evolutionary process that produces nuclear sequences of mitochondrial origin (NUMT). In this study, we identified and analysed NUMT inserted into the pig (Sus scrofa) genome and in the genomes of a few other Suinae species. First, we constructed a comparative distribution map of NUMT in the Sscrofa11.1 reference genome and in 22 other assembled S. scrofa genomes (from Asian and European pig breeds and populations), as well as the assembled genomes of the Visayan warty pig (Sus cebifrons) and warthog (Phacochoerus africanus). We then analysed a total of 485 whole genome sequencing datasets, from different breeds, populations, or Sus species, to discover polymorphic NUMT (inserted/deleted in the pig genome). The insertion age was inferred based on the presence or absence of orthologous NUMT in the genomes of different species, taking into account their evolutionary divergence. Additionally, the age of the NUMT was calculated based on sequence degradation compared to the authentic mtDNA sequence. We also validated a selected set of representative NUMT via PCR amplification. RESULTS: We have constructed an atlas of 418 NUMT regions, 70 of which were not present in any assembled genomes. We identified ancient NUMT regions (inserted more than 55 million years ago, Mya) and NUMT that appeared at different time points along the Suinae evolutionary lineage. We identified very recent polymorphic NUMT (private to S. scrofa, inserted < 1 Mya), and more ancient polymorphic NUMT (3.5-10 Mya) present in various Sus species. These latter polymorphic NUMT regions, which segregate in European and Asian pig breeds and populations, are likely the results of interspecies admixture within the Sus genus. 
CONCLUSIONS: This study provided a first comprehensive analysis of NUMT present in the Sus scrofa genome, comparing them to NUMT found in other species within the order Cetartiodactyla. The NUMT-based evolutionary window that we reconstructed from NUMT integration ages could be useful to better understand the micro-evolutionary events that shaped the modern pig genome and enriched the genetic diversity of this species.
Subjects
DNA, Mitochondrial, Animals, DNA, Mitochondrial/genetics, Sus scrofa/genetics, Genome, Cell Nucleus/genetics, Evolution, Molecular, Phylogeny, Swine/genetics
ABSTRACT
Large genotyping datasets, obtained from high-density single nucleotide polymorphism (SNP) arrays, developed for different livestock species, can be used to describe and differentiate breeds or populations. To identify the most discriminating genetic markers among thousands of genotyped SNPs, a few statistical approaches have been proposed. In this study, we applied the Boruta algorithm, a wrapper of the machine learning random forest algorithm, on a database of 23 European pig breeds (20 autochthonous and three cosmopolitan breeds) genotyped with a 70k SNP chip, to pre-select informative SNPs. To identify different sets of SNPs, these pre-selected markers were then ranked with random forest based on their mean decrease accuracy and mean decrease Gini indexes. We evaluated the efficiency of these subsets for breed classification and the usefulness of this approach to detect candidate genes affecting breed-specific phenotypes and relevant production traits that might differ among breeds. The lowest overall classification error (2.3%) was reached with a subpanel including only 398 SNPs (ranked based on their mean decrease accuracy), with no classification error in seven breeds using up to 49 SNPs. Several SNPs of these selected subpanels were in genomic regions in which previous studies had identified signatures of selection or genes associated with morphological or production traits that distinguish the analysed breeds. Therefore, even though these approaches were not originally designed to identify signatures of selection, the obtained results showed that they could potentially be useful for this purpose.
Subjects
Algorithms, Genome, Swine/genetics, Animals, Genotype, Phenotype, Polymorphism, Single Nucleotide, Machine Learning
ABSTRACT
Selection and breeding strategies to improve resistance to enteropathies are essential for the sustainability of rabbit production systems. However, disease heterogeneity (with diarrhoea as the only major visible symptom) and low disease heritability are two barriers to the implementation of these strategies. Diarrhoea can affect rabbits at different life stages, starting from the suckling period, with large negative economic impacts. In this study, from a commercial population of suckling rabbits (derived from 133 litters) that experienced an outbreak of enteropathy, we first selected a few animals that died with severe symptoms of diarrhoea and characterized their microbiota, using 16S rRNA gene sequencing data. Clostridium genus was consistently present in all affected specimens. In addition, with the aim of identifying genetic markers in the rabbit genome that could be used as selection tools, we performed genome-wide association studies for symptoms of diarrhoea in the same commercial rabbit population. These studies were also complemented with FST analyses between the same groups of rabbits. A total of 332 suckling rabbits (151 with severe symptoms of diarrhoea, 42 with mild symptoms and 129 without any symptoms until the weaning period), derived from 45 different litters (a subset of the 133 litters) were genotyped with the Affymetrix Axiom OrcunSNP Array. In both genomic approaches, rabbits within litters were paired to constitute two groups (susceptible and resistant, with the mildly affected animals assigned to one or the other group), which were then compared in case-control genome-wide association analyses. Genomic heritability estimated in the designed experimental structure integrated in a commercial breeding scheme was 0.19-0.21 (s.e. 0.09-0.10). 
A total of eight genomic regions on rabbit chromosome 2 (OCU2), OCU3, OCU7, OCU12, OCU13, OCU16 and in an unassembled scaffold had significant single nucleotide polymorphisms (SNPs) and/or markers that exceeded the FST percentile threshold. Among these regions, three main peaks of SNPs were identified on OCU12, OCU13 and OCU16. The QTL region on OCU13 encompasses several genes that encode members of a family of immunoglobulin Fc receptors (FCER1G, FCRLA, FCRLB and FCGR2A) involved in the innate immune system, which might be important candidate genes for this pathogenic condition. The results obtained in this study demonstrated that resistance to an enteropathy occurring in suckling rabbits is in part genetically determined and can be dissected at the genomic level, providing DNA markers that could be used in breeding programmes to increase resistance to enteropathies in meat rabbits.
Subjects
Genome-Wide Association Study, Genome, Rabbits, Animals, Genome-Wide Association Study/veterinary, RNA, Ribosomal, 16S, Genomics, Genetic Markers, Polymorphism, Single Nucleotide, Diarrhea/genetics, Diarrhea/veterinary
ABSTRACT
BACKGROUND: Intense selection of modern pig breeds has resulted in genetic improvement of production traits while the performance of local pig breeds has remained lower. As local pig breeds have been bred in extensive systems, they have adapted to specific environmental conditions, resulting in a rich genotypic and phenotypic diversity. This study is based on European local pig breeds that have been genetically characterized using DNA-pool sequencing data and phenotypically characterized using breed level phenotypes related to stature, fatness, growth, and reproductive performance traits. These data were analyzed using a dedicated approach to detect signatures of selection linked to phenotypic traits in order to uncover potential candidate genes that may underlie adaptation to specific environments. RESULTS: Analysis of the genetic data of European pig breeds revealed four main axes of genetic variation represented by the Iberian and three modern breeds (i.e. Large White, Landrace, and Duroc). In addition, breeds clustered according to their geographical origin, for example French Gascon and Basque breeds, Italian Apulo Calabrese and Casertana breeds, Spanish Iberian, and Portuguese Alentejano breeds. Principal component analysis of the phenotypic data distinguished the larger and leaner breeds with better growth potential and reproductive performance from the smaller and fatter breeds with low growth and reproductive efficiency. Linking the signatures of selection with phenotype identified 16 significant genomic regions associated with stature, 24 with fatness, 2 with growth, and 192 with reproduction. Among them, several regions contained candidate genes with possible biological effects on stature, fatness, growth, and reproductive performance traits. 
For example, strong associations were found for stature in two regions containing, respectively, the ANXA4 and ANTXR1 genes, for fatness in a region containing the DNMT3A and POMC genes and for reproductive performance in a region containing the HSD17B7 gene. CONCLUSIONS: In this study on European local pig breeds, we used a dedicated approach for detecting signatures of selection that were supported by phenotypic data at the breed level to identify potential candidate genes that may have adapted to different living environments and production systems.
Subjects
Genome, Genomics, Swine/genetics, Animals, Phenotype, Genotype, Genomics/methods, Sequence Analysis, DNA
ABSTRACT
The domestic canary (Serinus canaria) is one of the most common pet birds and has been extensively selected and bred over the last few centuries to constitute many different varieties. Plumage pigmentation is one of the main phenotypic traits that distinguish canary breeds and lines. Feather colours in these birds, as in other avian species, mainly depend on the presence of two major types of pigments: carotenoids and melanins. In this study, we exploited whole genome sequencing (WGS) datasets produced from five canary lines or populations (Black Frosted Yellow, Opal, Onyx, Opal × Onyx and Mogno, some of which carry different putative dilute alleles), complemented with other WGS datasets retrieved from previous studies, to identify candidate genes that might explain pigmentation variability across canary breeds and varieties. Sequencing data were obtained using a DNA pool-seq approach and genomic data were compared using window-based FST analyses. We identified signatures of selection in genomic regions harbouring genes involved in carotenoid-derived pigmentation variants (CYP2J19, EDC, BCO2 and SCARB1), confirming the results reported by previous works, and identified several other signatures of selection in correspondence with melanogenesis-related genes (AGRP, ASIP, DCT, EDNRB, KITLG, MITF, MLPH, SLC45A2, TYRP1 and ZEB2). Two putative causative mutations were identified in the MLPH gene that may explain the Opal and Onyx dilute mutant alleles. Other signatures of selection were also identified that might explain additional phenotypic differences between the investigated canary populations.
Subjects
Canaries, Pigmentation, Animals, Canaries/genetics, Color, Mutation, Pigmentation/genetics, Carotenoids, Alleles, Whole Genome Sequencing/veterinary
ABSTRACT
Whole genome sequencing (WGS) datasets, usually generated for the investigation of the individual animal genome, can be used for additional mining of the fraction of sequencing reads that remains unmapped to the respective reference genome. A significant proportion of these reads contains viral DNA derived from viruses that infected the sequenced animals. In this study, we mined more than 480 billion sequencing reads derived from 1471 WGS datasets produced from cattle, pigs, chickens and rabbits. We identified 367 different viruses, among which 14, 11, 12 and 1 might specifically infect cattle, pigs, chickens and rabbits, respectively. Some of them are ubiquitous or avirulent, whereas others are highly or potentially damaging for both livestock and humans. The retrieved viral DNA information provided a first unconventional and opportunistic landscape of the livestock viromes that could be useful to understand the distribution of some viruses with potential deleterious impacts on the animal food production systems.
Subjects
Virome, Viruses, Animals, Cattle, Chickens/genetics, DNA, Viral, Genome, High-Throughput Nucleotide Sequencing, Livestock/genetics, Rabbits, Swine, Viruses/genetics
ABSTRACT
Following the recent domestication process of the European rabbit (Oryctolagus cuniculus), many different breeds and lines, distinguished primarily by exterior traits such as coat colour, fur structure and body size and shape, have been constituted. In this study, we genotyped, with a high-density single-nucleotide polymorphism panel, a total of 645 rabbits from 10 fancy breeds (Belgian Hare, Champagne d'Argent, Checkered Giant, Coloured Dwarf, Dwarf Lop, Ermine, Giant Grey, Giant White, Rex and Rhinelander) and three meat breeds (Italian White, Italian Spotted and Italian Silver). ADMIXTURE analysis indicated that breeds with similar phenotypic traits (e.g. coat colour and body size) shared common ancestries. By combining two haplotype-based approaches (iHS and XP-EHH) with the results of other methods that we had previously applied to the same breeds, we identified a total of 5079 independent genomic regions with some signatures of selection, covering about 1777 Mb of the rabbit genome. These regions consistently encompassed many genes involved in pigmentation processes (ASIP, EDNRA, EDNRB, KIT, KITLG, MITF, OCA2, TYR and TYRP1), coat structure (LIPH) and body size, including two major genes (LCORL and HMGA2) among many others. This study revealed novel genomic regions under signatures of selection and further demonstrated that population structures and signatures of selection, left in the genome of these rabbit breeds, may contribute to understanding the genetic events that led to their constitution and the complex genetic mechanisms determining the broad phenotypic variability present in these untapped rabbit genetic resources.
ABSTRACT
BACKGROUND: Domestication of the rabbit (Oryctolagus cuniculus) has led to a multi-purpose species that includes many breeds and lines with a broad phenotypic diversity, mainly for external traits (e.g. coat colours and patterns, fur structure, and morphometric traits) that are valued by fancy rabbit breeders. As a consequence of this human-driven selection, distinct signatures are expected to be present in the rabbit genome, defined as signatures of selection or selective sweeps. Here, we investigated the genome of three Italian commercial meat rabbit breeds (Italian Silver, Italian Spotted and Italian White) and 12 fancy rabbit breeds (Belgian Hare, Burgundy Fawn, Champagne d'Argent, Checkered Giant, Coloured Dwarf, Dwarf Lop, Ermine, Giant Grey, Giant White, Rex, Rhinelander and Thuringian) by using high-density single nucleotide polymorphism data. Signatures of selection were identified based on the fixation index (FST) statistic with different approaches, including single-breed and group-based methods, the latter comparing breeds that are grouped based on external traits (different coat colours and body sizes) and types (i.e. meat vs. fancy breeds). RESULTS: We identified 309 genomic regions that contained signatures of selection and that included genes that are known to affect coat colour (ASIP, MC1R and TYR), coat structure (LIPH), and body size (LCORL/NCAPG, COL11A1 and HOXD) in rabbits and that characterize the investigated breeds. Their identification proves the suitability of the applied methodologies for capturing recent selection events. Other regions included novel candidate genes that might contribute to the phenotypic variation among the analyzed breeds, including genes for pigmentation-related traits (EDNRA, EDNRB, MITF and OCA2) and body size, with a strong candidate for dwarfism in rabbit (COL2A1). 
CONCLUSIONS: We report a genome-wide view of genetic loci that underlie the main phenotypic differences in the analyzed rabbit breeds, which can be useful to understand the shift from the domestication process to the development of breeds in O. cuniculus. These results enhance our knowledge about the major genetic loci involved in rabbit external traits and add novel information to understand the complexity of the genetic architecture underlying body size in mammals.
Subjects
Genome, Genomics, Animals, Meat, Phenotype, Polymorphism, Single Nucleotide, Rabbits, Selection, Genetic
ABSTRACT
BACKGROUND: The importance of local breeds as genetic reservoirs of valuable genetic variation is well established. Pig breeding in Central and South-Eastern Europe has a long tradition that led to the formation of several local pig breeds. In the present study, genetic diversity parameters were analysed in six autochthonous pig breeds from Slovenia, Croatia and Serbia (Banija spotted, Black Slavonian, Turopolje pig, Swallow-bellied Mangalitsa, Moravka and Krskopolje pig). Animals from each of these breeds were genotyped using microsatellites and single nucleotide polymorphisms (SNPs). The results obtained with these two marker systems and those based on pedigree data were compared. In addition, we estimated inbreeding levels based on the distribution of runs of homozygosity (ROH) and identified genomic regions under selection pressure using ROH islands and the integrated haplotype score (iHS). RESULTS: The lowest heterozygosity values calculated from microsatellite and SNP data were observed in the Turopolje pig. The observed heterozygosity was higher than the expected heterozygosity in the Black Slavonian, Moravka and Turopolje pig. Both types of markers allowed us to distinguish clusters of individuals belonging to each breed. The analysis of admixture between breeds revealed potential gene flow between the Mangalitsa and Moravka, and between the Mangalitsa and Black Slavonian, but no introgression events were detected in the Banija spotted and Turopolje pig. The distribution of ROH across the genome was not uniform. Analysis of the ROH islands identified genomic regions with an extremely high frequency of shared ROH within the Swallow-bellied Mangalitsa, which harboured genes associated with cholesterol biosynthesis, fatty acid metabolism and daily weight gain. The iHS approach to detect signatures of selection revealed candidate regions containing genes with potential roles in reproduction traits and disease resistance. 
CONCLUSIONS: Based on the estimation of population parameters obtained from three data sets, we showed the existence of relationships among the six pig breeds analysed here. Analysis of the distribution of ROH allowed us to estimate the level of inbreeding and the extent of homozygous regions in these breeds. The iHS analysis revealed genomic regions potentially associated with phenotypic traits and allowed the detection of genomic regions under selection pressure.
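The comparison of observed and expected heterozygosity reported above can be made concrete for a single biallelic SNP. The sketch below assumes genotypes coded as counts of the alternative allele (0, 1 or 2) and the Hardy-Weinberg expectation 2pq; the function name and the coding are illustrative assumptions rather than the study's own pipeline.

```python
from typing import Optional, Sequence, Tuple


def heterozygosity(genotypes: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Return (observed, expected) heterozygosity at one biallelic SNP.

    genotypes: alternative-allele counts (0, 1 or 2); None marks a missing call.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        raise ValueError("no called genotypes at this SNP")
    observed = sum(1 for g in called if g == 1) / n
    p = sum(called) / (2 * n)        # alternative-allele frequency
    expected = 2 * p * (1 - p)       # Hardy-Weinberg expectation, 2pq
    return observed, expected
```

Averaging both values over many SNPs gives breed-level estimates; an observed value above the expected one means that the sample carries more heterozygotes than random mating at those allele frequencies would predict.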
Subjects
Inbreeding, Polymorphism, Single Nucleotide, Animals, Croatia, Serbia, Slovenia, Swine/genetics
ABSTRACT
Runs of homozygosity (ROH) are defined as long stretches of DNA homozygous at each polymorphic position. The proportion of genome covered by ROH and their length are indicators of the level and origin of inbreeding. In this study, we analysed SNP chip datasets (obtained using the Axiom OrcunSNP Array) of a total of 702 rabbits from 12 fancy breeds and four meat breeds to identify ROH with different approaches and calculate several genomic inbreeding parameters. The highest average number of ROH per animal was detected in Belgian Hare (~150) and the lowest in Italian Silver (~106). The average length of ROH ranged from 4.001 ± 0.556 Mb in Italian White to 6.268 ± 1.355 Mb in Ermine. The same two breeds had the lowest (427.9 ± 86.4 Mb, Italian White) and the highest (921.3 ± 179.8 Mb, Ermine) average values of the sum of all ROH segments. More fancy breeds had a higher level of genomic inbreeding (as defined by ROH) than meat breeds. Several ROH islands contain genes involved in body size, body length, pigmentation processes, carcass traits, growth, and reproduction traits (e.g.: AOX1, GPX5, IFRD1, ITGB8, NELL1, NR3C1, OCA2, TRIB1, TRIB2). Genomic inbreeding parameters can be useful to overcome the lack of information in the management of rabbit genetic resources. ROH provided information to understand, to some extent, the genetic history of rabbit breeds and to identify signatures of selection in the rabbit genome.
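As an illustration of how ROH and an ROH-based inbreeding coefficient can be derived from SNP chip data, the following Python sketch scans one chromosome for uninterrupted runs of homozygous genotypes. It is deliberately simplified: the minimum SNP count and minimum length are arbitrary placeholder values, and it does not tolerate heterozygous or missing calls inside a run, as dedicated tools (for example the sliding-window scan in PLINK) can. It is not the procedure used in the study.

```python
from typing import List, Sequence, Tuple


def find_roh(snps: Sequence[Tuple[int, bool]],
             min_snps: int = 3,
             min_length_bp: int = 1_000_000) -> List[Tuple[int, int]]:
    """Find runs of consecutive homozygous SNPs on one chromosome.

    snps: (position_bp, is_homozygous) pairs sorted by position.
    Returns (start_bp, end_bp) for every run that meets both thresholds.
    """
    runs: List[Tuple[int, int]] = []
    start = end = None
    count = 0

    def close_run() -> None:
        # Keep the current run only if it is long enough on both criteria.
        if start is not None and count >= min_snps and end - start >= min_length_bp:
            runs.append((start, end))

    for pos, is_hom in snps:
        if is_hom:
            if start is None:
                start = pos
            end = pos
            count += 1
        else:
            close_run()
            start = end = None
            count = 0
    close_run()
    return runs


def f_roh(runs: Sequence[Tuple[int, int]], genome_length_bp: int) -> float:
    """Proportion of the genome covered by ROH."""
    return sum(end - start for start, end in runs) / genome_length_bp
```

Summing the run lengths per animal gives the "sum of all ROH segments" reported above, and dividing by the length of the genome covered by the array gives the genomic inbreeding coefficient.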
Subjects
Inbreeding, Polymorphism, Single Nucleotide, Rabbits, Animals, Islands, Homozygote, Genomics, Meat, Genotype
ABSTRACT
Reggiana and Modenese are autochthonous cattle breeds, reared in the North of Italy, that can be mainly distinguished by their standard coat color (Reggiana is red, whereas Modenese is white with some pale gray shades). Almost all milk produced by these breeds is transformed into 2 mono-breed branded Parmigiano-Reggiano cheeses, from which farmers receive the economic incomes needed for the sustainable conservation of these animal genetic resources. After the setting up of their herd books in the 1960s, these breeds experienced a strong reduction in population size that was subsequently reversed starting in the 1990s (Reggiana) or more recently (Modenese), reaching at present a total of about 2,800 and 500 registered cows, respectively. Due to the small population size of these breeds, inbreeding is a very important cause of concern for their conservation programs. Inbreeding is traditionally estimated using pedigree data, which are summarized in an inbreeding coefficient calculated at the individual level (FPED). However, incompleteness of pedigree information and registration errors can affect the effectiveness of conservation strategies. High-throughput SNP genotyping platforms allow investigation of inbreeding using genome information that can overcome the limits of pedigree data. Several approaches have been proposed to estimate genomic inbreeding, with the use of runs of homozygosity (ROH) considered to be the most appropriate. In this study, several pedigree and genomic inbreeding parameters, calculated using the whole herd book populations or considering genotyping information (GeneSeek GGP Bovine 150K) from 1,684 Reggiana cattle and 323 Modenese cattle, were compared. Average inbreeding values per year were used to calculate effective population size. Reggiana breed had generally lower genomic inbreeding values than Modenese breed. 
The low correlation between pedigree-based and genomic-based parameters (ranging from 0.187 to 0.195 and 0.319 to 0.323 in the Reggiana and Modenese breeds, respectively) reflected the common problems of local populations in which pedigree records are not complete. The high proportion of short ROH over the total number of ROH indicates no major recent inbreeding events in either breed. ROH islands spread over the genome of the 2 breeds (15 in Reggiana and 14 in Modenese) identified several signatures of selection. Some of these included genes affecting milk production traits, stature, body conformation traits (with a main ROH island in both breeds on BTA6 containing the ABCG2, NCAPG, and LCORL genes) and coat color (on BTA13 in Modenese containing the ASIP gene). In conclusion, this work provides an extensive comparative analysis of pedigree and genomic inbreeding parameters and relevant genomic information that will be useful in the conservation strategies of these 2 iconic local cattle breeds.
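One way in which yearly average inbreeding values can be turned into an effective population size is sketched below. It is an assumption, not the procedure documented in the abstract: ln(1 - F) is regressed on birth year, the yearly slope is converted into a per-generation rate of inbreeding ΔF using a generation interval supplied by the user, and Ne = 1 / (2ΔF) is returned. The function name and the generation interval are hypothetical inputs.

```python
import math
from typing import Sequence


def effective_population_size(years: Sequence[float],
                              mean_f: Sequence[float],
                              generation_interval_years: float) -> float:
    """Estimate Ne from yearly mean inbreeding coefficients (illustrative).

    Since 1 - F_t = (1 - dF)^t, ln(1 - F) is linear in time; its
    least-squares slope per year, scaled by the generation interval,
    gives ln(1 - dF) per generation.
    """
    n = len(years)
    if n < 2 or n != len(mean_f):
        raise ValueError("need at least two (year, F) pairs of equal length")
    y = [math.log(1.0 - f) for f in mean_f]
    mean_x = sum(years) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(years, y))
    slope_per_year = sxy / sxx
    delta_f = 1.0 - math.exp(slope_per_year * generation_interval_years)
    if delta_f <= 0:
        raise ValueError("inbreeding is not increasing; Ne is undefined here")
    return 1.0 / (2.0 * delta_f)
```

The same function can be applied to pedigree-based or ROH-based yearly means, which makes the two kinds of estimate directly comparable.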
Subjects
Inbreeding, Polymorphism, Single Nucleotide, Animals, Cattle/genetics, Female, Genotype, Homozygote, Islands, Italy
ABSTRACT
Autochthonous cattle breeds are genetic resources that, in many cases, have been fixed for inheritable exterior phenotypes useful to understand the genetic mechanisms affecting these breed-specific traits. Reggiana and Modenese are two closely related autochthonous cattle breeds mainly raised in the production area of the well-known Protected Designation of Origin Parmigiano-Reggiano cheese, in the North of Italy. These breeds can be mainly distinguished by their standard coat colour: solid red in Reggiana and solid white with pale shades of grey in Modenese. In this study we genotyped with the GeneSeek GGP Bovine 150k single nucleotide polymorphism (SNP) chip almost half of the extant cattle populations of Reggiana (n = 1109) and Modenese (n = 326) and used genome-wide information in comparative FST analyses to detect signatures of selection that diverge between these two autochthonous breeds. The two breeds could be clearly distinguished using multidimensional scaling plots and admixture analysis. Considering the top 0.0005% FST values, a total of 64 markers were detected in the single-marker analysis. The top FST value was detected for the melanocortin 1 receptor (MC1R) gene mutation, which determines the red coat colour of the Reggiana breed. Another coat colour gene, agouti signalling protein (ASIP), emerged amongst this list of top SNPs. These results were also confirmed with the window-based analyses, which included 0.5-Mb or 1-Mb genome regions. As variability affecting ASIP has been associated with white coat colour in sheep and goats, these results highlighted this gene as a strong candidate affecting coat colour in Modenese breed. This study demonstrates how population genomic approaches designed to take advantage of the diversity between local genetic resources could provide interesting hints to explain exterior traits not yet completely investigated in cattle.
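The single-marker and window-based FST analyses can be sketched as follows. The example assumes a simple two-population estimator, FST = (HT - HS) / HT, computed from allele frequencies with equal weight for both breeds, and averages per-SNP values in non-overlapping windows. The estimator actually used in the study, as well as its window design, may differ; function names and the input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def fst_two_pops(p1: float, p2: float) -> float:
    """FST at one biallelic SNP from the allele frequencies in two populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # monomorphic in both populations: no differentiation
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def windowed_fst(snps: Iterable[Tuple[str, int, float, float]],
                 window_bp: int = 1_000_000) -> Dict[Tuple[str, int], float]:
    """Mean per-SNP FST in non-overlapping windows.

    snps: (chromosome, position_bp, freq_pop1, freq_pop2) tuples.
    Returns {(chromosome, window_index): mean FST}.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for chrom, pos, p1, p2 in snps:
        key = (chrom, pos // window_bp)
        sums[key][0] += fst_two_pops(p1, p2)
        sums[key][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}
```

A marker fixed for alternative alleles in the two breeds, as expected near a gene that determines a breed-defining coat colour, reaches the maximum value of 1, and windows with many such markers stand out against the genome-wide distribution.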
Subjects
Genome, Polymorphism, Single Nucleotide, Animals, Cattle/genetics, Color, Genotype, Italy, Phenotype, Sheep/genetics
ABSTRACT
Omics techniques provide a spectrum of information at the genomic level, whose analysis can characterize complex traits at a molecular level. The relationship between genotype and phenotype implies that the molecular pathways and biological processes underlying a given phenotype can be discovered from genome information. In dealing with this problem, gene enrichment analysis has become the most widely adopted strategy. Here we present NETGE-PLUS, a Web server for standard and network-based functional interpretation of gene sets of human and of model organisms, including Sus scrofa, Saccharomyces cerevisiae, Escherichia coli, and Arabidopsis thaliana. NETGE-PLUS enables the functional enrichment of both simple and ranked lists of genes, also introducing the possibility of exploring relationships among KEGG pathways. A Web interface makes data retrieval complete and user-friendly. NETGE-PLUS is publicly available at http://net-ge2.biocomp.unibo.it.
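For readers unfamiliar with gene enrichment analysis, the standard over-representation test can be written in a few lines. The sketch below uses the hypergeometric distribution to ask how likely it is to observe at least the given overlap between a gene list and an annotated term by chance; whether NETGE-PLUS uses exactly this test is not stated in the abstract, so the function is a generic illustration with hypothetical parameter names.

```python
from math import comb


def enrichment_p_value(n_universe: int, n_annotated: int,
                       n_selected: int, n_overlap: int) -> float:
    """One-sided hypergeometric P(X >= n_overlap).

    n_universe:  genes in the background set
    n_annotated: background genes annotated with the term (e.g. one pathway)
    n_selected:  genes in the input list
    n_overlap:   input genes annotated with the term
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

In practice one p-value is computed per term, and the resulting set is corrected for multiple testing before any term is reported as enriched.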
Subjects
Arabidopsis, Software, Arabidopsis/genetics, Databases, Genetic, Genomics, Humans, Information Storage and Retrieval, Internet, Probability
ABSTRACT
BACKGROUND: Natural and artificial directional selection in cosmopolitan and autochthonous pig breeds and wild boars has shaped their genomes and resulted in a reservoir of animal genetic diversity. Signatures of selection are the result of these selection events that have contributed to the adaptation of breeds to different environments and production systems. In this study, we analysed the genome variability of 19 European autochthonous pig breeds (Alentejana, Bísara, Majorcan Black, Basque, Gascon, Apulo-Calabrese, Casertana, Cinta Senese, Mora Romagnola, Nero Siciliano, Sarda, Krskopolje pig, Black Slavonian, Turopolje, Moravka, Swallow-Bellied Mangalitsa, Schwäbisch-Hällisches Schwein, Lithuanian indigenous wattle and Lithuanian White old type) from nine countries, three European commercial breeds (Italian Large White, Italian Landrace and Italian Duroc), and European wild boars, by mining whole-genome sequencing data obtained by using a DNA-pool sequencing approach. Signatures of selection were identified by using a single-breed approach with two statistics [within-breed pooled heterozygosity (HP) and fixation index (FST)] and group-based FST approaches, which compare groups of breeds defined according to external traits and use/specialization/type. RESULTS: We detected more than 22 million single nucleotide polymorphisms (SNPs) across the 23 compared populations and identified 359 chromosome regions showing signatures of selection. These regions harbour genes that are already known or new genes that are under selection and relevant for the domestication process in this species, and that affect several morphological and physiological traits (e.g. coat colours and patterns, body size, number of vertebrae and teats, ear size and conformation, reproductive traits, growth and fat deposition traits). 
Wild boar-related signatures of selection were detected across the whole genome of several autochthonous breeds, which suggests that crossbreeding (accidental or deliberate) occurred with wild boars. CONCLUSIONS: Our findings provide a catalogue of genetic variants of many European pig populations and identify genome regions that can explain, at least in part, the phenotypic diversity of these genetic resources.
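The within-breed pooled heterozygosity (HP) statistic can be illustrated with a short sketch. It assumes the definition commonly used for DNA-pool sequencing data, HP = 2 ΣnMAJ ΣnMIN / (ΣnMAJ + ΣnMIN)², where nMAJ and nMIN are the read counts of the more and less frequent allele at each SNP in a window; the abstract does not spell out the formula, so this definition and the function name are assumptions.

```python
from typing import Sequence, Tuple


def pooled_heterozygosity(allele_counts: Sequence[Tuple[int, int]]) -> float:
    """HP for one genomic window of a sequenced DNA pool.

    allele_counts: (reads_allele_a, reads_allele_b) for each SNP in the window.
    """
    n_major = sum(max(a, b) for a, b in allele_counts)
    n_minor = sum(min(a, b) for a, b in allele_counts)
    total = n_major + n_minor
    if total == 0:
        raise ValueError("window contains no reads")
    return 2 * n_major * n_minor / total ** 2
```

Values approach 0.5 when both alleles are equally represented across the window and fall towards 0 when one allele is close to fixation, so windows with unusually low HP within a breed are candidate selective sweeps.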
Subjects
Genotyping Techniques/methods, Selection, Genetic/genetics, Swine/genetics, Acclimatization/genetics, Adaptation, Physiological/genetics, Algorithms, Animals, Breeding, Domestication, Europe, Female, Genome/genetics, Genomics/methods, Genotype, Male, Models, Genetic, Phenotype, Polymorphism, Single Nucleotide/genetics, Whole Genome Sequencing/methods
ABSTRACT
In silico approaches are routinely adopted to predict the effects of genetic variants and their relation to diseases. The Critical Assessment of Genome Interpretation (CAGI) has established a common framework for assessing available predictors of variant effects on specific problems, and our group has been an active participant in CAGI since its first edition. In this paper, we summarize our experience and lessons learned from the last edition of the experiment (CAGI-5). In particular, we analyze the prediction performance of our tools on five selected CAGI-5 challenges grouped into three categories: prediction of variant effects on protein stability, prediction of variant pathogenicity, and prediction of complex functional effects. For each challenge, we analyze in detail the performance of our tools, highlighting their strengths and drawbacks. The aim is to better define the application boundaries of each tool.
Subjects
Computational Biology/methods, Genetic Variation, Proteins/chemistry, Proteins/genetics, Algorithms, Computer Simulation, Databases, Genetic, Genetic Predisposition to Disease, Humans, Machine Learning, Phenotype, Protein Stability
ABSTRACT
Frataxin (FXN) is a highly conserved protein found in prokaryotes and eukaryotes that is required for efficient regulation of cellular iron homeostasis. Experimental evidence associates amino acid substitutions in FXN with Friedreich Ataxia, a neurodegenerative disorder. Recently, new thermodynamic experiments have been performed to study the impact of somatic variations identified in cancer tissues on protein stability. The Critical Assessment of Genome Interpretation (CAGI) data provider at the University of Rome measured the unfolding free energy of a set of variants (FXN challenge data set) with far-UV circular dichroism and intrinsic fluorescence spectra. These values have been used to calculate the change in unfolding free energy between the variant and wild-type proteins at zero concentration of denaturant (ΔΔGH2O). The FXN challenge data set, composed of eight amino acid substitutions, was used to evaluate the performance of current computational methods in predicting the ΔΔGH2O value associated with the variants and in classifying them as destabilizing or not destabilizing. For the fifth edition of CAGI, six independent research groups from Asia, Australia, Europe, and North America submitted 12 sets of predictions from different approaches. In this paper, we report the results of our assessment and discuss the limitations of the tested algorithms.
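The quantity written as ΔΔGH2O in this abstract can be spelled out as a difference of unfolding free energies. The form below is a sketch under two assumptions that the abstract does not state: the H2O label is typeset as a superscript (the extracted text flattens it), and the difference is taken as variant minus wild type.

```latex
% Assumed sign convention: variant minus wild type.
% Both terms are unfolding free energies extrapolated to zero denaturant.
\Delta\Delta G^{\mathrm{H_2O}}
  = \Delta G^{\mathrm{H_2O}}_{\text{variant}}
  - \Delta G^{\mathrm{H_2O}}_{\text{wild type}}
```

Under this convention a negative value means the variant unfolds more easily than the wild type, that is, it is destabilizing. With the opposite convention the sign flips, so the convention has to be checked before any threshold is used for the destabilizing versus not destabilizing classification.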
Subjects
Amino Acid Substitution, Iron-Binding Proteins/chemistry, Iron-Binding Proteins/genetics, Algorithms, Circular Dichroism, Humans, Models, Molecular, Protein Conformation, Protein Folding, Protein Stability, Frataxin
ABSTRACT
Genetics plays a key role in venous thromboembolism (VTE) risk; however, established risk factors in European populations do not translate to individuals of African descent because of differences in allele frequencies between populations. As part of the fifth iteration of the Critical Assessment of Genome Interpretation, participants were asked to predict VTE status in exome data from African American subjects. Participants were provided with 103 unlabeled exomes from patients treated with warfarin for non-VTE causes or VTE and asked to predict which disease each subject had been treated for. Given the lack of training data, many participants opted to use unsupervised machine learning methods, clustering the exomes by variation in genes known to be associated with VTE. The best-performing method using only VTE-related genes achieved an area under the ROC curve of 0.65. Here, we discuss the range of methods used in the prediction of VTE from sequence data and explore some of the difficulties of conducting a challenge with known confounders. In addition, we show that an existing genetic risk score for VTE that was developed in European subjects works well in African Americans.
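To make the reported area under the ROC curve concrete, the following minimal sketch computes it from the rank-based (Mann-Whitney) formulation: the fraction of case/control pairs in which the case receives the higher score, with ties counted as half. The function name, labels, and scores are hypothetical and are not taken from the challenge data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a case (here, VTE) and 0 for a control.
    scores: predicted risk, higher meaning more likely a case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above control
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for four subjects.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

Read this way, an AUC of 0.65 means that a randomly chosen case outranks a randomly chosen control about 65% of the time, against 50% for a random predictor.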
Subjects
Exome Sequencing/methods, Venous Thromboembolism/genetics, Warfarin/administration & dosage, Cluster Analysis, Computational Biology/methods, Congresses as Topic, Female, Genetic Predisposition to Disease, Humans, Male, ROC Curve, Unsupervised Machine Learning, Venous Thromboembolism/drug therapy, Warfarin/therapeutic use
ABSTRACT
Honey contains DNA from many different organisms that are part of hive micro-environmental niches and honey bee pathospheres. In this study, we recovered and sequenced mite mitochondrial DNA (mtDNA) from honey from different locations around the world (Europe, Asia, Africa, North and South America). DNA extracted from 17 honey samples was amplified with eight primer pairs targeting three mite mtDNA genes, yielding 88 amplicons that were sequenced on an Ion Torrent sequencing platform. A bioinformatic pipeline compared the resulting reads with Varroa spp. mtDNA sequence entries available in GenBank and assigned them to different mitotypes. In all honey samples, the highest percentage of reads was attributed to the K1 lineage, including a few variants derived from it, in addition to J1 reads observed in the two South American samples and C1-1 reads obtained from the Chinese honey. This study opens new possibilities to analyse mite lineages and variants and to monitor their geographical and temporal distribution, simplifying surveillance of this damaging honey bee parasite.
Subjects
Bees/parasitology, DNA, Environmental/analysis, High-Throughput Nucleotide Sequencing, Honey/analysis, Varroidae, Animals, DNA, Mitochondrial, Genetic Variation, High-Throughput Nucleotide Sequencing/methods, Honey/parasitology, Varroidae/genetics
ABSTRACT
The Critical Assessment of Genome Interpretation (CAGI) is a global community experiment to objectively assess computational methods for predicting phenotypic impacts of genomic variation. One of the 2015-2016 competitions focused on predicting the influence of mutations on the allosteric regulation of human liver pyruvate kinase. More than 30 different researchers accessed the challenge data. However, only four groups accepted the challenge. Features used for predictions included evolutionary constraints, mutant site locations relative to active and effector binding sites, and computational docking outputs. Despite the range of expertise and strategies used by predictors, the best predictions of modified allostery resulting from mutations were only marginally better than random. In contrast, several groups successfully predicted which mutations severely reduced enzymatic activity. Nonetheless, the poor predictions of allostery stand in stark contrast to the impression left by more than 700 PubMed entries identified using the identifiers "computational + allosteric." This contrast highlights a specialized need for new computational tools and for benchmarks that focus on allosteric regulation.