1.
PLoS One ; 9(7): e103046, 2014.
Article in English | MEDLINE | ID: mdl-25050984

ABSTRACT

Genomic structural variations represent an important source of genetic variation in mammalian genomes and are therefore commonly related to phenotypic expression. In this work, ~770,000 single nucleotide polymorphism (SNP) genotypes from 506 animals of 19 cattle breeds were analyzed. A simple LD-based structural variation was defined, and a genome-wide analysis was performed. After applying quality control filters, we calculated the short-range (≤ 100 Kb) linkage disequilibrium (r2) for each breed and each chromosome. We sorted SNP pairs by distance and obtained a set of LD means (called the expected means) using bins of 5 Kb. Across the 19 breeds, we identified 15,246 segments of at least 1 Kb, each consisting of at least 3 adjacent SNPs such that, for each SNP, the r2 values with its neighbors within 100 Kb to its right were all larger than, or all smaller than, the corresponding expected mean, and the P-values were significant after Benjamini-Hochberg multiple testing correction. In addition, to restrict the analysis to homogeneously covered regions, we considered only SNPs having at least 15 SNP neighbors within 100 Kb. We defined such segments as structural variations. By grouping all variations across all animals in the sample we defined 9,146 regions, involving a total of 53,137 SNPs and representing 6.40% (160.98 Mb) of the bovine genome. The identified structural variations covered 3,109 genes. Clustering analysis showed that the relatedness of breeds reflects the geographic region in which they are evolving. In summary, we present an analysis of structural variations based on the deviation from the expected short-range LD between SNPs in the bovine genome. With an intuitive and simple definition based only on SNP data, it was possible to discern the closeness of breeds grouped by the geographic region in which they are evolving.


Subject(s)
Breeding , Cattle/genetics , Linkage Disequilibrium , Polymorphism, Single Nucleotide , Animals , Female , Gene Frequency , Genome , Genome-Wide Association Study , Genotype , Haplotypes , Male
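
The two statistical steps named in the abstract above (expected LD means per 5 Kb distance bin, and Benjamini-Hochberg correction) can be illustrated with a minimal sketch. The function names, the input layout (a list of `(distance_bp, r2)` pairs) and the `alpha` level are assumptions made for illustration; the abstract does not describe the authors' implementation.

```python
from collections import defaultdict


def expected_ld_by_bin(pairs, bin_size=5_000, max_dist=100_000):
    """Mean r2 per distance bin.

    `pairs` is an iterable of (distance_bp, r2) for SNP pairs (hypothetical layout).
    Bin 0 covers 1..bin_size bp, bin 1 covers bin_size+1..2*bin_size bp, and so on.
    Pairs farther apart than max_dist are ignored.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for dist, r2 in pairs:
        if 0 < dist <= max_dist:
            b = (dist - 1) // bin_size
            sums[b] += r2
            counts[b] += 1
    return {b: sums[b] / counts[b] for b in sums}


def benjamini_hochberg(pvalues, alpha=0.05):
    """Standard Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[i] = True
    return rejected
```

A segment test in the spirit of the abstract would compare each observed r2 with `expected_ld_by_bin(...)[bin]` and keep segments whose P-values survive `benjamini_hochberg`; how those P-values were computed is not stated in the abstract.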
2.
Genome Res ; 22(4): 778-90, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22300768

ABSTRACT

Copy number variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often intractable. Using a read depth approach based on next-generation sequencing, we examined genome-wide copy number differences among five taurine (three Angus, one Holstein, and one Hereford) and one indicine (Nelore) cattle. Within mapped chromosomal sequence, we identified 1265 CNV regions comprising ~55.6 Mbp of sequence, 476 of which (~38%) have not previously been reported. We validated this sequence-based CNV call set with array comparative genomic hybridization (aCGH), quantitative PCR (qPCR), and fluorescent in situ hybridization (FISH), achieving a validation rate of 82% and a false positive rate of 8%. We further estimated absolute copy numbers for genomic segments and annotated genes in each individual. Surveys of the top 25 most variable genes revealed that the Nelore individual had the lowest copy numbers in 13 cases (~52%, χ² test; P-value <0.05). In contrast, genes related to pathogen- and parasite-resistance, such as CATHL4 and ULBP17, were highly duplicated in the Nelore individual relative to the taurine cattle, while genes involved in lipid transport and metabolism, including APOL3 and FABP2, were highly duplicated in the beef breeds. These CNV regions also harbor genes like BPIFA2A (BSP30A) and WC1, suggesting that some CNVs may be associated with breed-specific differences in adaptation, health, and production traits. By providing the first individualized cattle CNV and segmental duplication maps and genome-wide gene copy number estimates, we enable future CNV studies into highly duplicated regions in the cattle genome.


Subject(s)
Cattle/genetics , DNA Copy Number Variations , Genome/genetics , Sequence Analysis, DNA/methods , Animals , Cattle/classification , Chromosome Mapping , Chromosomes, Mammalian/genetics , Comparative Genomic Hybridization , Fatty Acid-Binding Proteins/genetics , Fatty Acid-Binding Proteins/metabolism , Female , Gene Dosage , Gene Duplication , Genomics/methods , In Situ Hybridization, Fluorescence , Male , Polymerase Chain Reaction , Species Specificity
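
As a rough illustration of the read depth idea mentioned in the abstract above, copy number in a window can be estimated by scaling its depth against the depth of regions assumed to be diploid. This is a generic normalization, not the pipeline used in the study; the function name and the choice of the median as baseline are assumptions.

```python
from statistics import median


def estimate_copy_number(window_depths, control_depths, ploidy=2):
    """Estimate copy number per window from read depth.

    window_depths: mean read depth of each window of interest.
    control_depths: depths of windows assumed to be at normal (diploid) copy number.
    The baseline is the median control depth (an illustrative choice).
    """
    baseline = median(control_depths)
    if baseline <= 0:
        raise ValueError("control depth must be positive")
    return [ploidy * depth / baseline for depth in window_depths]
```

For example, a window sequenced at twice the baseline depth would be estimated at four copies under this simple model; real pipelines additionally correct for GC content and mappability.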
3.
Funct Integr Genomics ; 12(1): 81-92, 2012 Mar.
Article in English | MEDLINE | ID: mdl-21928070

ABSTRACT

Genomic structural variation is an important and abundant source of genetic and phenotypic variation. We previously reported an initial analysis of copy number variations (CNVs) in Angus cattle selected for resistance or susceptibility to gastrointestinal nematodes. In this study, we performed a large-scale analysis of CNVs using SNP genotyping data from 472 animals of the same population. We detected 811 candidate CNV regions, which represent 141.8 Mb (~4.7%) of the genome. To investigate the functional impacts of CNVs, we created 2 groups of 100 individual animals with extremely low or high estimated breeding values of eggs per gram of feces and referred to these groups as parasite resistant (PR) or parasite susceptible (PS), respectively. We identified 297 (~51 Mb) and 282 (~48 Mb) CNV regions from the PR and PS groups, respectively. Approximately 60% of the CNV regions were specific to either the PS group or the PR group of animals. Selected PR- or PS-specific CNVs were further experimentally validated by quantitative PCR. The 297 PR CNV regions overlapped with 437 Ensembl genes enriched in immunity and defense, such as the WC1 gene, which is uniquely expressed on gamma/delta T cells in cattle. Network analyses indicated that the PR-specific genes were predominantly involved in gastrointestinal disease, immunological disease, inflammatory response, cell-to-cell signaling and interaction, lymphoid tissue development, and cell death. By contrast, the 282 PS CNV regions contained 473 Ensembl genes, which are overrepresented in environmental interactions. Network analyses indicated that the PS-specific genes were particularly enriched for inflammatory response, immune cell trafficking, metabolic disease, cell cycle, and cellular organization and movement.


Subject(s)
Cattle Diseases/genetics , DNA Copy Number Variations , Disease Resistance/genetics , Gastrointestinal Diseases/veterinary , Gastrointestinal Tract/parasitology , Nematode Infections/veterinary , Parasitic Diseases, Animal/genetics , Animals , Cattle , Feces/parasitology , Female , Gastrointestinal Diseases/genetics , Genetic Association Studies , Genetic Predisposition to Disease , Genome , Host-Parasite Interactions , Male , Nematoda/physiology , Nematode Infections/genetics
4.
BMC Genomics ; 12: 408, 2011 Aug 11.
Article in English | MEDLINE | ID: mdl-21831322

ABSTRACT

BACKGROUND: Genome-wide association analysis is a powerful tool for annotating phenotypic effects on the genome and knowledge of genes and chromosomal regions associated with dairy phenotypes is useful for genome and gene-based selection. Here, we report results of a genome-wide analysis of predicted transmitting ability (PTA) of 31 production, health, reproduction and body conformation traits in contemporary Holstein cows. RESULTS: Genome-wide association analysis identified a number of candidate genes and chromosome regions associated with 31 dairy traits in contemporary U.S. Holstein cows. Highly significant genes and chromosome regions include: BTA13's GNAS region for milk, fat and protein yields; BTA7's INSR region and BTAX's LOC520057 and GRIA3 for daughter pregnancy rate, somatic cell score and productive life; BTA2's LRP1B for somatic cell score; BTA14's DGAT1-NIBP region for fat percentage; BTA1's FKBP2 for protein yields and percentage, BTA26's MGMT and BTA6's PDGFRA for protein percentage; BTA18's 53.9-58.7 Mb region for service-sire and daughter calving ease and service-sire stillbirth; BTA18's PGLYRP1-IGFL1 region for a large number of traits; BTA18's LOC787057 for service-sire stillbirth and daughter calving ease; BTA15's CD82, BTA23's DST and the MOCS1-LRFN2 region for daughter stillbirth; and BTAX's LOC520057 and GRIA3 for daughter pregnancy rate. For body conformation traits, BTA11, BTAX, BTA10, BTA5, and BTA26 had the largest concentrations of SNP effects, and PHKA2 of BTAX and REN of BTA16 had the most significant effects for body size traits. For body shape traits, BTAX, BTA19 and BTA3 were most significant. Udder traits were affected by BTA16, BTA22, BTAX, BTA2, BTA10, BTA11, BTA20, BTA22 and BTA25, teat traits were affected by BTA6, BTA7, BTA9, BTA16, BTA11, BTA26 and BTA17, and feet/legs traits were affected by BTA11, BTA13, BTA18, BTA20, and BTA26. 
CONCLUSIONS: Genome-wide association analysis identified a number of genes and chromosome regions associated with 31 production, health, reproduction and body conformation traits in contemporary Holstein cows. The results provide useful information for annotating phenotypic effects on the dairy genome and for building consensus of dairy QTL effects.


Subject(s)
Body Constitution , Cattle/genetics , Genetic Association Studies , Quantitative Trait, Heritable , Animals , Dairying , Female , Genotype , Milk , Phenotype , Polymorphism, Single Nucleotide , Pregnancy , Quantitative Trait Loci , Reproduction/genetics
5.
BMC Genomics ; 12: 127, 2011 Feb 23.
Article in English | MEDLINE | ID: mdl-21345189

ABSTRACT

BACKGROUND: Copy number variation (CNV) represents another important source of genetic variation complementary to single nucleotide polymorphism (SNP). High-density SNP array data have been routinely used to detect human CNVs, many of which have significant functional effects on gene expression and human diseases. In the dairy industry, a large quantity of SNP genotyping results are becoming available and can be used for CNV discovery to understand and accelerate genetic improvement for complex traits. RESULTS: We performed a systematic analysis of CNV using the Bovine HapMap SNP genotyping data, including 539 animals of 21 modern cattle breeds and 6 outgroups. After correcting genomic waves and considering the pedigree information, we identified 682 candidate CNV regions, which represent 139.8 megabases (~4.60%) of the genome. Selected CNVs were further experimentally validated and we found that copy number "gain" CNVs were predominantly clustered in tandem rather than existing as interspersed duplications. Many CNV regions (~56%) overlap with cattle genes (1,263), which are significantly enriched for immunity, lactation, reproduction and rumination. The overlap of this new dataset and other published CNV studies was less than 40%; however, our discovery of large, high frequency (> 5% of animals surveyed) CNV regions showed 90% agreement with other studies. These results highlight the differences and commonalities between technical platforms. CONCLUSIONS: We present a comprehensive genomic analysis of cattle CNVs derived from SNP data which will be a valuable genomic variation resource. Combined with SNP detection assays, gene-containing CNV regions may help identify genes undergoing artificial selection in domesticated animals.


Subject(s)
Cattle/genetics , Gene Dosage , Polymorphism, Single Nucleotide , Animals , Breeding , Comparative Genomic Hybridization , Genetic Markers , Genome , Genomics/methods , Genotype , Pedigree , Sequence Analysis, DNA
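
CNV regions such as those reported in the abstract above are commonly obtained by merging overlapping per-animal CNV calls and summing the merged lengths. The sketch below shows that interval-merging step only; the tuple layout `(chrom, start, end)` with half-open coordinates and the function names are assumptions, and the study's own merging rules are not given in the abstract.

```python
def merge_cnv_calls(calls):
    """Merge overlapping or touching calls into CNV regions.

    calls: iterable of (chrom, start, end) tuples, half-open coordinates.
    Returns a sorted list of merged (chrom, start, end) regions.
    """
    merged = []
    for chrom, start, end in sorted(calls):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous region on the same chromosome.
            last_chrom, last_start, last_end = merged[-1]
            merged[-1] = (last_chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def total_length(regions):
    """Total number of bases covered by non-overlapping regions."""
    return sum(end - start for _, start, end in regions)
```

Dividing `total_length(...)` by the assembly size gives the genome fraction covered, which is how a figure such as "139.8 megabases (~4.60%)" would be derived.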
6.
Genome Biol ; 11(10): R102, 2010.
Article in English | MEDLINE | ID: mdl-20961407

ABSTRACT

BACKGROUND: A comprehensive transcriptome survey, or gene atlas, provides information essential for a complete understanding of the genomic biology of an organism. We present an atlas of RNA abundance for 92 adult, juvenile and fetal cattle tissues and three cattle cell lines. RESULTS: The Bovine Gene Atlas was generated from 7.2 million unique digital gene expression tag sequences (300.2 million total raw tag sequences), from which 1.59 million unique tag sequences were identified that mapped to the draft bovine genome accounting for 85% of the total raw tag abundance. Filtering these tags yielded 87,764 unique tag sequences that unambiguously mapped to 16,517 annotated protein-coding loci in the draft genome accounting for 45% of the total raw tag abundance. Clustering of tissues based on tag abundance profiles generally confirmed ontology classification based on anatomy. There were 5,429 constitutively expressed loci and 3,445 constitutively expressed unique tag sequences mapping outside annotated gene boundaries that represent a resource for enhancing current gene models. Physical measures such as inferred transcript length or antisense tag abundance identified tissues with atypical transcriptional tag profiles. We report for the first time the tissue-specific variation in the proportion of mitochondrial transcriptional tag abundance. CONCLUSIONS: The Bovine Gene Atlas is the deepest and broadest transcriptome survey of any livestock genome to date. Commonalities and variation in sense and antisense transcript tag profiles identified in different tissues facilitate the examination of the relationship between gene expression, tissue, and gene function.


Subject(s)
Cattle/genetics , Expressed Sequence Tags , Genome , Molecular Sequence Annotation , Animals , Cattle/classification , Cell Line , Chromosome Mapping , Female , Gene Expression , Gene Expression Profiling , Genes, Mitochondrial , Male , Molecular Sequence Annotation/methods , Proteomics
7.
Genome Res ; 20(5): 693-703, 2010 May.
Article in English | MEDLINE | ID: mdl-20212021

ABSTRACT

Genomic structural variation is an important and abundant source of genetic and phenotypic variation. Here, we describe the first systematic and genome-wide analysis of copy number variations (CNVs) in modern domesticated cattle using array comparative genomic hybridization (array CGH), quantitative PCR (qPCR), and fluorescent in situ hybridization (FISH). The array CGH panel included 90 animals from 11 Bos taurus, three Bos indicus, and three composite breeds for beef, dairy, or dual purpose. We identified over 200 candidate CNV regions (CNVRs) in total and 177 within known chromosomes, which harbor or are adjacent to gains or losses. These 177 high-confidence CNVRs cover 28.1 megabases or approximately 1.07% of the genome. Over 50% of the CNVRs (89/177) were found in multiple animals or breeds and analysis revealed breed-specific frequency differences and reflected aspects of the known ancestry of these cattle breeds. Selected CNVs were further validated by independent methods using qPCR and FISH. Approximately 67% of the CNVRs (119/177) completely or partially span cattle genes and 61% of the CNVRs (108/177) directly overlap with segmental duplications. The CNVRs span about 400 annotated cattle genes that are significantly enriched for specific biological functions, such as immunity, lactation, reproduction, and rumination. Multiple gene families, including ULBP, have gone through ruminant lineage-specific gene amplification. We detected and confirmed marked differences in their CNV frequencies across diverse breeds, indicating that some cattle CNVs are likely to arise independently in breeds and contribute to breed differences. Our results provide a valuable resource beyond microsatellites and single nucleotide polymorphisms to explore the full dimension of genetic variability for future cattle genomic research.


Subject(s)
Cattle/classification , Cattle/genetics , DNA Copy Number Variations , Gene Dosage , Animals , Breeding , Comparative Genomic Hybridization , Genetics, Population , Genome , Genomic Structural Variation , Genomics , In Situ Hybridization, Fluorescence , Oligonucleotide Array Sequence Analysis , Polymerase Chain Reaction/methods , Segmental Duplications, Genomic , Species Specificity
8.
Science ; 324(5926): 528-32, 2009 Apr 24.
Article in English | MEDLINE | ID: mdl-19390050

ABSTRACT

The imprints of domestication and breed development on the genomes of livestock likely differ from those of companion animals. A deep draft sequence assembly of shotgun reads from a single Hereford female and comparative sequences sampled from six additional breeds were used to develop probes to interrogate 37,470 single-nucleotide polymorphisms (SNPs) in 497 cattle from 19 geographically and biologically diverse breeds. These data show that cattle have undergone a rapid recent decrease in effective population size from a very large ancestral population, possibly due to bottlenecks associated with domestication, selection, and breed formation. Domestication and artificial selection appear to have left detectable signatures of selection within the cattle genome, yet the current levels of diversity within breeds are at least as great as exists within humans.


Subject(s)
Cattle/genetics , Genetic Variation , Genome , Polymorphism, Single Nucleotide , Animals , Breeding , Female , Gene Frequency , Male , Molecular Sequence Data , Mutation , Population Density
9.
BMC Genet ; 10: 19, 2009 Apr 24.
Article in English | MEDLINE | ID: mdl-19393054

ABSTRACT

BACKGROUND: The Bovine HapMap Consortium has generated assay panels to genotype ~30,000 single nucleotide polymorphisms (SNPs) from 501 animals sampled from 19 worldwide taurine and indicine breeds, plus two outgroup species (Anoa and Water Buffalo). Within the larger set of SNPs we targeted 101 high density regions spanning up to 7.6 Mb with an average density of approximately one SNP per 4 kb, and characterized the linkage disequilibrium (LD) and haplotype block structure within individual breeds and groups of breeds in relation to their geographic origin and use. RESULTS: From the 101 targeted high-density regions on bovine chromosomes 6, 14, and 25, between 57 and 95% of the SNPs were informative in the individual breeds. The regions of high LD extend up to ~100 kb and the size of haplotype blocks ranges between 30 bases and 75 kb (10.3 kb average). On the scale from 1-100 kb the extent of LD and haplotype block structure in cattle is highly similar to that in humans. The estimation of effective population sizes over the previous 10,000 generations conforms to two main events in cattle history: the initiation of cattle domestication (~12,000 years ago), and the intensification of population isolation and current population bottleneck that breeds have experienced worldwide within the last ~700 years. Haplotype block density correlation, block boundary discordances, and haplotype sharing analyses were consistent in revealing unexpected similarities between some beef and dairy breeds, making them non-differentiable. Clustering techniques permitted grouping of breeds into different clades given their similarities and dissimilarities in genetic structure. CONCLUSION: This work presents the first high-resolution analysis of haplotype block structure in worldwide cattle samples. Several novel results were obtained. First, cattle and humans share a high similarity in LD and haplotype block structure on the scale of 1-100 kb. Second, unexpected similarities in haplotype block structure between dairy and beef breeds make them non-differentiable. Finally, our findings suggest that ~30,000 uniformly distributed SNPs would be necessary to construct a complete genome LD map in Bos taurus breeds, and ~580,000 SNPs would be necessary to characterize the haplotype block structure across the complete cattle genome.


Subject(s)
Algorithms , Cattle/genetics , Genome/genetics , Haplotypes , Animals , Breeding , Cattle/classification , Cluster Analysis , Female , Gene Frequency , Genotype , Linkage Disequilibrium , Male , Phylogeny , Polymorphism, Single Nucleotide
10.
PLoS One ; 4(4): e5350, 2009.
Article in English | MEDLINE | ID: mdl-19390634

ABSTRACT

The success of genome-wide association (GWA) studies for the detection of sequence variation affecting complex traits in human has spurred interest in the use of large-scale high-density single nucleotide polymorphism (SNP) genotyping for the identification of quantitative trait loci (QTL) and for marker-assisted selection in model and agricultural species. A cost-effective and efficient approach for the development of a custom genotyping assay interrogating 54,001 SNP loci to support GWA applications in cattle is described. A novel algorithm for achieving a compressed inter-marker interval distribution proved remarkably successful, with median interval of 37 kb and maximum predicted gap of <350 kb. The assay was tested on a panel of 576 animals from 21 cattle breeds and six outgroup species and revealed that from 39,765 to 46,492 SNP are polymorphic within individual breeds (average minor allele frequency (MAF) ranging from 0.24 to 0.27). The assay also identified 79 putative copy number variants in cattle. Utility for GWA was demonstrated by localizing known variation for coat color and the presence/absence of horns to their correct genomic locations. The combination of SNP selection and the novel spacing algorithm allows an efficient approach for the development of high-density genotyping platforms in species having full or even moderate quality draft sequence. Aspects of the approach can be exploited in species which lack an available genome sequence. The BovineSNP50 assay described here is commercially available from Illumina and provides a robust platform for mapping disease genes and QTL in cattle.


Subject(s)
Cattle/genetics , Computational Biology/methods , Genotype , Polymorphism, Single Nucleotide/genetics , Animals , Chromosomes, Artificial, Bacterial/genetics , Gene Frequency , Genome , Genome-Wide Association Study , Quantitative Trait Loci
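
The spacing figures quoted in the abstract above (median inter-marker interval and maximum gap) are simple summaries of sorted marker positions on a chromosome. A minimal sketch follows, with a hypothetical function name and positions given in base pairs; it does not reproduce the spacing algorithm the abstract describes as novel.

```python
from statistics import median


def interval_stats(positions):
    """Return (median interval, maximum gap) between adjacent markers.

    positions: marker coordinates in bp on a single chromosome, in any order.
    """
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        raise ValueError("need at least two marker positions")
    return median(gaps), max(gaps)
```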
11.
BMC Genomics ; 10: 77, 2009 Feb 10.
Article in English | MEDLINE | ID: mdl-19208255

ABSTRACT

BACKGROUND: MicroRNA (miR) are a class of small RNAs that regulate gene expression by inhibiting translation of protein encoding transcripts. To evaluate the role of miR in skeletal muscle of swine, global microRNA abundance was measured at specific developmental stages including proliferating satellite cells, three stages of fetal growth, day-old neonate, and the adult. RESULTS: Twelve potential novel miR were detected that did not match previously reported sequences. In addition, a number of miR previously reported to be expressed in mammalian muscle were detected, having a variety of abundance patterns through muscle development. Muscle-specific miR-206 was nearly absent in proliferating satellite cells in culture, but was the highest abundant miR at other time points evaluated. In addition, miR-1 was moderately abundant throughout developmental stages with highest abundance in the adult. In contrast, miR-133 was moderately abundant in adult muscle and either not detectable or lowly abundant throughout fetal and neonate development. Changes in abundance of ubiquitously expressed miR were also observed. MiR-432 abundance was highest at the earliest stage of fetal development tested (60 day-old fetus) and decreased throughout development to the adult. Conversely, miR-24 and miR-27 exhibited greatest abundance in proliferating satellite cells and the adult, while abundance of miR-368, miR-376, and miR-423-5p was greatest in the neonate. CONCLUSION: These data present a complete set of transcriptome profiles to evaluate miR abundance at specific stages of skeletal muscle growth in swine. Identification of these miR provides an initial group of miR that may play a vital role in muscle development and growth.


Subject(s)
Gene Expression Profiling , MicroRNAs/genetics , Muscle Development , Muscle, Skeletal/metabolism , Swine/genetics , Animals , Female , Gene Expression Regulation, Developmental , Gene Library , Male , Oligonucleotide Array Sequence Analysis , Swine/growth & development
12.
BMC Genet ; 9: 37, 2008 May 20.
Article in English | MEDLINE | ID: mdl-18492244

ABSTRACT

BACKGROUND: Analyses of population structure and breed diversity have provided insight into the origin and evolution of cattle. Previously, these studies have used a low density of microsatellite markers; however, with the large number of single nucleotide polymorphism markers that are now available, it is possible to perform genome-wide population genetic analyses in cattle. In this study, we used a high-density panel of SNP markers to examine population structure and diversity among eight cattle breeds sampled from Bos indicus and Bos taurus. RESULTS: Two thousand six hundred and forty-one single nucleotide polymorphisms (SNPs) spanning all of the bovine autosomal genome were genotyped in Angus, Brahman, Charolais, Dutch Black and White Dairy, Holstein, Japanese Black, Limousin and Nelore cattle. Population structure was examined using the linkage model in the program STRUCTURE and Fst estimates were used to construct a neighbor-joining tree to represent the phylogenetic relationship among these breeds. CONCLUSION: The whole-genome SNP panel identified several levels of population substructure in the set of examined cattle breeds. The greatest level of genetic differentiation was detected between the Bos taurus and Bos indicus breeds. When the Bos indicus breeds were excluded from the analysis, genetic differences among beef versus dairy and European versus Asian breeds were detected among the Bos taurus breeds. Exploration of the number of SNP loci required to differentiate between breeds showed that for 100 SNP loci, individuals could only be correctly clustered into breeds 50% of the time; thus, a large number of SNP markers are required to replace the 30 microsatellite markers that are currently commonly used in genetic diversity studies.


Subject(s)
Cattle/genetics , Genome/genetics , Polymorphism, Single Nucleotide , Analysis of Variance , Animals , Crosses, Genetic , Genetic Markers , Genetics, Population , Genotype , Phylogeny
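
The abstract above uses Fst estimates without saying which estimator was applied. As a hedged illustration, the textbook form Fst = (H_T - H_S) / H_T for a single biallelic SNP can be computed from per-breed allele frequencies. Equal weighting of breeds and the function name are assumptions of this sketch.

```python
def fst_biallelic(subpop_freqs):
    """Fst for one biallelic SNP from subpopulation allele frequencies.

    Uses Fst = (H_T - H_S) / H_T with equal weight per subpopulation,
    where H_S is the mean expected heterozygosity within subpopulations
    and H_T is the expected heterozygosity at the pooled mean frequency.
    """
    if not subpop_freqs:
        raise ValueError("need at least one subpopulation")
    n = len(subpop_freqs)
    p_bar = sum(subpop_freqs) / n
    h_s = sum(2 * p * (1 - p) for p in subpop_freqs) / n
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # monomorphic overall: no differentiation measurable
    return (h_t - h_s) / h_t
```

Pairwise values of this kind, averaged over loci, could populate the distance matrix that a neighbor-joining tree is built from.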
13.
Nat Methods ; 5(3): 247-52, 2008 Mar.
Article in English | MEDLINE | ID: mdl-18297082

ABSTRACT

High-density single-nucleotide polymorphism (SNP) arrays have revolutionized the ability of genome-wide association studies to detect genomic regions harboring sequence variants that affect complex traits. Extensive numbers of validated SNPs with known allele frequencies are essential to construct genotyping assays with broad utility. We describe an economical, efficient, single-step method for SNP discovery, validation and characterization that uses deep sequencing of reduced representation libraries (RRLs) from specified target populations. Using nearly 50 million sequences generated on an Illumina Genome Analyzer from DNA of 66 cattle representing three populations, we identified 62,042 putative SNPs and predicted their allele frequencies. Genotype data for these 66 individuals validated 92% of 23,357 selected genome-wide SNPs, with a genotypic and sequence allele frequency correlation of r = 0.67. This approach for simultaneous de novo discovery of high-quality SNPs and population characterization of allele frequencies may be applied to any species with at least a partially sequenced genome.


Subject(s)
Computational Biology/methods , Gene Frequency , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Animals , Cattle , Genomic Library , Genotype
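
The comparison reported in the abstract above, between allele frequencies predicted from sequence reads and those observed by genotyping, amounts to a Pearson correlation over SNPs. The sketch below assumes a simple read-count estimate of allele frequency; the function names and inputs are illustrative and are not taken from the paper.

```python
import math


def allele_frequency(alt_reads, total_reads):
    """Naive allele frequency estimate from pooled read counts."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```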
14.
Theor Appl Genet ; 116(7): 945-52, 2008 May.
Article in English | MEDLINE | ID: mdl-18278477

ABSTRACT

Large numbers of single nucleotide polymorphism (SNP) markers are now available for a number of crop species. However, the high-throughput methods for multiplexing SNP assays are untested in complex genomes, such as soybean, that have a high proportion of paralogous genes. The Illumina GoldenGate assay is capable of multiplexing from 96 to 1,536 SNPs in a single reaction over a 3-day period. We tested the GoldenGate assay in soybean to determine the success rate of converting verified SNPs into working assays. A custom 384-SNP GoldenGate assay was designed using SNPs that had been discovered through the resequencing of five diverse accessions that are the parents of three recombinant inbred line (RIL) mapping populations. The 384 SNPs that were selected for this custom assay were predicted to segregate in one or more of the RIL mapping populations. Allelic data were successfully generated for 89% of the SNP loci (342 of the 384) when it was used in the three RIL mapping populations, indicating that the complex nature of the soybean genome had little impact on conversion of the discovered SNPs into usable assays. In addition, 80% of the 342 mapped SNPs had a minor allele frequency >10% when this assay was used on a diverse sample of Asian landrace germplasm accessions. The high success rate of the GoldenGate assay makes this a useful technique for quickly creating high density genetic maps in species where SNP markers are rapidly becoming available.


Subject(s)
Genome, Plant , Glycine max/genetics , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide , Chromosome Mapping , Chromosomes, Plant , DNA, Plant , Genetic Markers , Genotype , Microsatellite Repeats
15.
Genomics Proteomics Bioinformatics ; 6(3-4): 129-43, 2008 Dec.
Article in English | MEDLINE | ID: mdl-19329064

ABSTRACT

A systematic phylogenetic footprinting approach was performed to identify conserved transcription factor binding sites (TFBSs) in mammalian promoter regions using human, mouse and rat sequence alignments. We found that the score distributions of most binding site models did not follow the Gaussian distribution required by many statistical methods. Therefore, we performed an empirical test to establish the optimal threshold for each model. We gauged our computational predictions by comparing with previously known TFBSs in the PCK1 gene promoter of the cytosolic isoform of phosphoenolpyruvate carboxykinase, and achieved a sensitivity of 75% and a specificity of approximately 32%. Almost all known sites overlapped with predicted sites, and several new putative TFBSs were also identified. We validated a predicted SP1 binding site in the control of PCK1 transcription using gel shift and reporter assays. Finally, we applied our computational approach to the prediction of putative TFBSs within the promoter regions of all available RefSeq genes. Our full set of TFBS predictions is freely available at http://bfgl.anri.barc.usda.gov/tfbsConsSites.


Subject(s)
Intracellular Signaling Peptides and Proteins/genetics , Phosphoenolpyruvate Carboxykinase (GTP)/genetics , Promoter Regions, Genetic/genetics , Regulatory Sequences, Nucleic Acid/genetics , Algorithms , Amino Acid Sequence , Animals , Base Sequence , Binding Sites/genetics , Cell Line, Tumor , Computational Biology/methods , Conserved Sequence , Electrophoretic Mobility Shift Assay , Humans , Luciferases/genetics , Luciferases/metabolism , Mice , Normal Distribution , Oligonucleotides/genetics , Oligonucleotides/metabolism , Protein Binding , Rats , Recombinant Fusion Proteins/genetics , Recombinant Fusion Proteins/metabolism , Reproducibility of Results , Sp1 Transcription Factor/genetics , Sp1 Transcription Factor/metabolism , Transcription Factors/metabolism , Transfection
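
Because the abstract above notes that binding site score distributions were not Gaussian, one common alternative is to take a threshold directly from the empirical distribution of background scores. The sketch below shows such an empirical quantile cutoff; it is an assumed illustration, since the abstract does not describe how the "empirical test" was carried out, and the function name and `alpha` parameter are hypothetical.

```python
import math


def empirical_threshold(background_scores, alpha=0.01):
    """Score cutoff at the (1 - alpha) empirical quantile of background scores.

    Makes no distributional assumption: at least a fraction (1 - alpha)
    of the background scores are less than or equal to the returned value.
    """
    if not background_scores:
        raise ValueError("need at least one background score")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    ranked = sorted(background_scores)
    k = math.ceil((1 - alpha) * len(ranked)) - 1
    k = min(max(k, 0), len(ranked) - 1)
    return ranked[k]
```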
16.
BMC Genet ; 8: 74, 2007 Oct 25.
Article in English | MEDLINE | ID: mdl-17961247

ABSTRACT

BACKGROUND: Bovine whole genome linkage disequilibrium maps were constructed for eight breeds of cattle. These data provide fundamental information concerning bovine genome organization, which will allow the design of studies to associate genetic variation with economically important traits, and also provide background information concerning the extent of long range linkage disequilibrium in cattle. RESULTS: Linkage disequilibrium was assessed using r2 among all pairs of syntenic markers within eight breeds of cattle from the Bos taurus and Bos indicus subspecies. Bos taurus breeds included Angus, Charolais, Dutch Black and White Dairy, Holstein, Japanese Black and Limousin, while Bos indicus breeds included Brahman and Nelore. Approximately 2670 markers spanning the entire bovine autosomal genome were used to estimate pairwise r2 values. We found that the extent of linkage disequilibrium is no more than 0.5 Mb in these eight breeds of cattle. CONCLUSION: Linkage disequilibrium in cattle has previously been reported to extend several tens of centimorgans. Our results, based on a much larger sample of marker loci and across eight breeds of cattle, indicate that in cattle linkage disequilibrium persists over much more limited distances. Our findings suggest that 30,000-50,000 loci will be needed to conduct whole genome association studies in cattle.


Subject(s)
Cattle/genetics , Chromosome Mapping/methods , Genome , Linkage Disequilibrium , Animals , Gene Frequency , Genetic Markers , Haplotypes , Polymorphism, Single Nucleotide , Quantitative Trait Loci
17.
Genome Biol ; 8(8): R165, 2007.
Article in English | MEDLINE | ID: mdl-17697342

ABSTRACT

BACKGROUND: Cattle are important agriculturally and relevant as a model organism. Previously described genetic and radiation hybrid (RH) maps of the bovine genome have been used to identify genomic regions and genes affecting specific traits. Application of these maps to identify influential genetic polymorphisms will be enhanced by integration with each other and with bacterial artificial chromosome (BAC) libraries. The BAC libraries and clone maps are essential for the hybrid clone-by-clone/whole-genome shotgun sequencing approach taken by the bovine genome sequencing project. RESULTS: A bovine BAC map was constructed with HindIII restriction digest fragments of 290,797 BAC clones from animals of three different breeds. Comparative mapping of 422,522 BAC end sequences assisted with BAC map ordering and assembly. Genotypes and pedigree from two genetic maps and marker scores from three whole-genome RH panels were consolidated on a 17,254-marker composite map. Sequence similarity allowed integrating the BAC and composite maps with the bovine draft assembly (Btau3.1), establishing a comprehensive resource describing the bovine genome. Agreement between the marker and BAC maps and the draft assembly is high, although discrepancies exist. The composite and BAC maps are more similar than either is to the draft assembly. CONCLUSION: Further refinement of the maps and greater integration into the genome assembly process may contribute to a high quality assembly. The maps provide resources to associate phenotypic variation with underlying genomic variation, and are crucial resources for understanding the biology underpinning this important ruminant species so closely associated with humans.


Subject(s)
Chromosomes, Mammalian/genetics , Gene Order , Genome , Radiation Hybrid Mapping , Animals , Base Sequence , Cattle , Chromosomes, Artificial, Bacterial/chemistry , Chromosomes, Artificial, Bacterial/genetics , Deoxyribonuclease HindIII/chemistry , Genetic Markers/genetics , Genome, Human , Genotype , Humans , Molecular Sequence Data , Pedigree , Sequence Alignment
18.
Genetics ; 176(1): 685-96, 2007 May.
Article in English | MEDLINE | ID: mdl-17339218

ABSTRACT

The first genetic transcript map of the soybean genome was created by mapping one SNP in each of 1141 genes in one or more of three recombinant inbred line mapping populations, thus providing a picture of the distribution of genic sequences across the mapped portion of the genome. Single-nucleotide polymorphisms (SNPs) were discovered via the resequencing of sequence-tagged sites (STSs) developed from expressed sequence tag (EST) sequence. From an initial set of 9459 polymerase chain reaction primer sets designed to a diverse set of genes, 4240 STSs were amplified and sequenced in each of six diverse soybean genotypes. In the resulting 2.44 Mbp of aligned sequence, a total of 5551 SNPs were discovered, including 4712 single-base changes and 839 indels, for an average nucleotide diversity of θ = 0.000997. The analysis of the observed genetic distances between adjacent genes vs. the theoretical distribution based upon the assumption of a random distribution of genes across the 20 soybean linkage groups clearly indicated that genes were clustered. Of the 1141 genes, 291 mapped to 72 of the 112 gaps of 5-10 cM in the preexisting simple sequence repeat (SSR)-based map, while 111 genes mapped in 19 of the 26 gaps >10 cM. The addition of 1141 sequence-based genic markers to the soybean genome map will provide an important resource to soybean geneticists for quantitative trait locus discovery and map-based cloning, as well as to soybean breeders who increasingly depend upon marker-assisted selection in cultivar improvement.
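The nucleotide diversity figure in this abstract can be roughly reproduced from the other numbers it reports. The sketch below assumes Watterson's estimator and treats the six genotypes as six sampled sequences; the abstract does not name the estimator, so both points are assumptions, and `watterson_theta` is a hypothetical helper name.

```python
def watterson_theta(segregating_sites, n_sequences, aligned_length_bp):
    """Per-site Watterson estimate: theta = S / (a_n * L).

    a_n is the harmonic sum 1 + 1/2 + ... + 1/(n - 1), where n is the
    number of sampled sequences.
    """
    if n_sequences < 2:
        raise ValueError("need at least two sequences")
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    return segregating_sites / (a_n * aligned_length_bp)


if __name__ == "__main__":
    # Figures taken from the abstract: 5551 SNPs, six genotypes, 2.44 Mbp.
    theta = watterson_theta(5551, 6, 2_440_000)
    print(f"{theta:.6f}")
```

Under these assumptions the estimate comes out close to the reported value of about 0.000997; the small difference is plausibly due to rounding of the 2.44 Mbp aligned length.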


Subject(s)
Chromosome Mapping , Genes, Plant/genetics , Glycine max/genetics , Haplotypes/genetics , Polymorphism, Single Nucleotide/genetics , RNA, Plant/genetics , Transcription, Genetic/genetics , Base Sequence , DNA Primers , Databases, Nucleic Acid , Exons/genetics , Expressed Sequence Tags , Genetic Heterogeneity , Genetic Linkage , Introns/genetics , Minisatellite Repeats/genetics , Polymorphism, Restriction Fragment Length , RNA, Messenger/genetics , Sequence Tagged Sites
19.
Physiol Genomics ; 29(1): 35-43, 2007 Mar 14.
Article in English | MEDLINE | ID: mdl-17105755

ABSTRACT

MicroRNAs are small, approximately 22-nucleotide-long noncoding RNAs capable of controlling gene expression by inhibiting translation. Alignment of human microRNA stem-loop sequences (mir) against a recent draft sequence assembly of the bovine genome resulted in identification of 334 predicted bovine mir. We sequenced five tissue-specific cDNA libraries derived from the small RNA fractions of bovine embryo, thymus, small intestine, and lymph node to validate these predictions and identify new mir. This strategy, combined with comparative sequence analysis, identified 129 sequences that corresponded to mature microRNAs (miR). A total of 107 sequences aligned to known human mir, and 100 of these matched expressed miR. The other seven sequences represented novel miR expressed from the complementary strand of previously characterized human mir. The 22 sequences without matches displayed characteristic mir secondary structures when folded in silico, and 10 of these retained sequence conservation with other vertebrate species. Expression analysis based on sequence identity counts revealed that some miR were preferentially expressed in certain tissues, while bta-miR-26a and bta-miR-103 were prevalent in all tissues examined. These results support the premise that species differences in regulation of gene expression by miR occur primarily at the level of expression and processing.


Subject(s)
Embryo, Mammalian/metabolism , Gene Expression Profiling , Gene Expression Regulation , MicroRNAs/genetics , MicroRNAs/metabolism , Animals , Base Pairing , Base Sequence , Cattle , Cluster Analysis , Computational Biology , Conserved Sequence/genetics , Gene Library , Genomics/methods , Molecular Sequence Data , Reverse Transcriptase Polymerase Chain Reaction , Sequence Alignment , Sequence Analysis, DNA
20.
BMC Bioinformatics ; 7: 468, 2006 Oct 23.
Article in English | MEDLINE | ID: mdl-17059604

ABSTRACT

BACKGROUND: Single nucleotide polymorphisms (SNPs) as defined here are single base sequence changes or short insertions/deletions between or within individuals of a given species. As a result of their abundance and the availability of high throughput analysis technologies, SNP markers have begun to replace other traditional markers such as restriction fragment length polymorphisms (RFLPs), amplified fragment length polymorphisms (AFLPs) and simple sequence repeat (SSR or microsatellite) markers for fine mapping and association studies in several species. For SNP discovery from chromatogram data, several bioinformatics programs have to be combined to generate an analysis pipeline. Results have to be stored in a relational database to facilitate interrogation through queries or to generate data for further analyses such as determination of linkage disequilibrium and identification of common haplotypes. Although these tasks are routinely performed by several groups, an integrated open source SNP discovery pipeline that can be easily adapted by new groups interested in SNP marker development is currently unavailable. RESULTS: We developed SNP-PHAGE (SNP discovery Pipeline with additional features for identification of common haplotypes within a sequence tagged site (Haplotype Analysis) and GenBank (-dbSNP) submissions). This tool was applied to sequence traces from diverse soybean genotypes to discover over 10,000 SNPs. The package was developed on a UNIX/Linux platform, is written in Perl and uses a MySQL database. Scripts to generate a user-friendly web interface are also provided, with common queries for preliminary data analysis. A machine learning tool developed by this group for increasing the efficiency of SNP discovery is integrated into the package as an optional feature. The SNP-PHAGE package is being made available open source at http://bfgl.anri.barc.usda.gov/ML/snp-phage/. CONCLUSION: SNP-PHAGE provides a bioinformatics solution for high throughput SNP discovery, identification of common haplotypes within an amplicon, and GenBank (dbSNP) submissions. SNP selection and visualization are aided through a user-friendly web interface. This tool is useful for analyzing sequence tagged sites (STSs) of genomic sequences, and this software can serve as a starting point for groups interested in developing SNP markers.


Subject(s)
Chromosome Mapping/methods , DNA Mutational Analysis/methods , Polymorphism, Single Nucleotide/genetics , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , User-Computer Interface , Base Sequence , Molecular Sequence Data