1.
PLoS One ; 19(3): e0298641, 2024.
Article in English | MEDLINE | ID: mdl-38478526

ABSTRACT

BACKGROUND: Genomic islands (GIs) are mobile genetic elements that integrate site-specifically into bacterial chromosomes, bearing genes that affect phenotypes such as pathogenicity and metabolism. GIs typically occur sporadically among related bacterial strains, enabling comparative genomic approaches to GI identification. For a candidate GI in a query genome, the number of reference genomes with a precise deletion of the GI serves as a support value for the GI. Our comparative software for GI identification was slowed by our original use of large reference genome databases (DBs). Here we explore smaller species-focused DBs. RESULTS: With increasing DB size, recovery of our reliable prophage GI calls reached a plateau, while recovery of less reliable GI calls (FPs) increased rapidly as DB sizes exceeded ~500 genomes; i.e., overlarge DBs can increase FP rates. Paradoxically, relative to prophages, FPs were both more frequently supported only by genomes outside the species and more frequently supported only by genomes inside the species; this may be due to their generally lower support values. Setting a DB size limit for our SMAll Ranked Tailored (SMART) DB design speeded runtime ~65-fold. Strictly intra-species DBs would tend to lower yields of prophages for small species (with few genomes available); simulations with large species showed that this could be partially overcome by reaching outside the species to closely related taxa, without an FP burden. Employing such taxonomic outreach in DB design generated redundancy in the DB set; as few as 2984 DBs were needed to cover all 47894 prokaryotic species. CONCLUSIONS: Runtime decreased dramatically with SMART DB design, with only minor losses of prophages. We also describe potential utility in other comparative genomics projects.
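
The support value described in this abstract can be illustrated with a toy sketch. It assumes, as a simplification not taken from the paper, that a "precise deletion" can be detected by finding the query's left and right flanks directly joined in a reference genome; the function name, the `flank` parameter, and the sequences are hypothetical.

```python
def support_value(query, gi_start, gi_end, references, flank=8):
    """Count reference genomes in which the candidate island is precisely absent.

    Simplifying assumption: a reference supports the candidate GI when the
    query's left and right flanking sequences occur directly joined in it.
    """
    left = query[max(0, gi_start - flank):gi_start]
    right = query[gi_end:gi_end + flank]
    junction = left + right  # sequence expected if the island were excised cleanly
    return sum(1 for ref in references if junction in ref)


# Hypothetical toy sequences; the island is the run of T's at positions 12..21.
query = "AAAACCCCGGGG" + "TTTTTTTTTT" + "ACGTACGTACGT"
references = [
    "AAAACCCCGGGGACGTACGTACGT",    # island precisely absent: counts as support
    query,                         # island present: no support
    "AAAACCCCGGGGTTACGTACGTACGT",  # imprecise deletion: no support
]
print(support_value(query, 12, 22, references))  # → 1
```

A larger reference set gives more chances for a junction to match, which is consistent with the abstract's observation that overlarge DBs can raise support for less reliable calls.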


Subject(s)
Genome, Bacterial , Genomic Islands , Genomics , Bacteria/genetics , Prokaryotic Cells , Prophages/genetics
2.
Genes (Basel) ; 14(11), 2023 Nov 15.
Article in English | MEDLINE | ID: mdl-38003025

ABSTRACT

Knowledge of circadian rhythm clock gene expression outside the suprachiasmatic nucleus is increasing. The purpose of this study was to determine whether expression of circadian clock genes differed within or among the bovine stress axis tissues (e.g., amygdala, hypothalamus, pituitary, adrenal cortex, and adrenal medulla). Tissues were obtained at an abattoir from eight mature nonpregnant Brahman cows that had been maintained in the same pasture and nutritional conditions. Sample tissues were stored in RNase-free sterile cryovials at -80 °C until the total RNA was extracted, quantified, assessed, and sequenced (NovaSeq 6000 system; paired-end 150 bp cycles). The trimmed reads were then mapped to a Bos taurus (B. taurus) reference genome (Umd3.1). Further analysis used the edgeR package. Raw gene count tables were read into RStudio, and low-expression genes were filtered out using the criteria of three minimum reads per gene in at least five samples. Normalization factors were then calculated using the trimmed mean of M values method to produce normalized gene counts within each sample tissue. The normalized gene counts important for a circadian rhythm were analyzed within and between each tissue of the stress axis using the GLM and CORR procedures of the Statistical Analysis System (SAS). The relative expression profiles of circadian clock genes differed (p < 0.01) within each tissue, with neuronal PAS domain protein 2 (NPAS2) having greater expression in the amygdala (p < 0.01) and period circadian regulator (PER1) having greater expression in all other tissues (p < 0.01). The expression among tissues also differed (p < 0.01) for individual circadian clock genes, with circadian locomotor output cycles protein kaput (CLOCK) expression being greater within the adrenal tissues and nuclear receptor subfamily 1 group D member 1 (NR1D1) expression being greater within the other tissues (p < 0.01). 
Overall, the results indicate that within each tissue, the various circadian clock genes were differentially expressed, in addition to being differentially expressed among the stress tissues of mature Brahman cows. Future use of these findings may assist in improving livestock husbandry and welfare by understanding interactions of the environment, stress responsiveness, and peripheral circadian rhythms.
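
The low-expression filter stated in this abstract (at least three reads per gene in at least five samples) can be sketched as follows. This is a plain-Python illustration, not the edgeR code the authors ran; the function name, gene labels, and counts are hypothetical, and the trimmed mean of M values normalization step is not reproduced.

```python
def filter_low_expression(counts, min_reads=3, min_samples=5):
    """Keep genes with at least min_reads reads in at least min_samples samples."""
    kept = {}
    for gene, per_sample in counts.items():
        n_passing = sum(1 for c in per_sample if c >= min_reads)
        if n_passing >= min_samples:
            kept[gene] = per_sample
    return kept


# Hypothetical raw counts: gene label -> reads in each of eight samples.
raw_counts = {
    "GENE_A": [10, 12, 9, 15, 11, 8, 14, 13],  # passes in all eight samples
    "GENE_B": [0, 1, 3, 0, 2, 4, 0, 1],        # passes in only two samples
    "GENE_C": [3, 3, 3, 3, 3, 0, 0, 0],        # passes in exactly five samples
}
filtered = filter_low_expression(raw_counts)
print(sorted(filtered))  # → ['GENE_A', 'GENE_C']
```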


Subject(s)
Circadian Clocks , Female , Cattle/genetics , Animals , Circadian Clocks/genetics , Period Circadian Proteins , Circadian Rhythm/genetics , Hypothalamus , Adrenal Glands
3.
Biology (Basel) ; 12(2), 2023 Feb 05.
Article in English | MEDLINE | ID: mdl-36829529

ABSTRACT

Quantifying the natural inter-individual variation in DNA methylation patterns is important for identifying its contribution to phenotypic variation, but also for understanding how the environment affects variability, and for incorporation into statistical analyses. The inter-individual variation in DNA methylation patterns in female cattle and the effect that a prenatal stressor has on such variability have yet to be quantified. Thus, the objective of this study was to utilize methylation data from mature Brahman females to quantify the inter-individual variation in DNA methylation. Pregnant Brahman cows were transported for 2 h durations at days 60 ± 5; 80 ± 5; 100 ± 5; 120 ± 5; and 140 ± 5 of gestation. A non-transport group was maintained as a control. Leukocytes, amygdala, and anterior pituitary glands were harvested from eight cows born from the non-transport group (Control) and six from the transport group (PNS) at 5 years of age. The DNA harvested from the anterior pituitary contained the greatest variability in DNA methylation of cytosine-phosphate-guanine (mCpG) sites from both the PNS and Control groups, and the amygdala had the least. Numerous variable mCpG sites were associated with retrotransposable elements and highly repetitive regions of the genome. Some of the genomic features that had high variation in DNA methylation are involved in immune responses, signaling, responses to stimuli, and metabolic processes. The small overlap of highly variable CpG sites and features between tissues and leukocytes supports the role of variable DNA methylation in regulating tissue-specific gene expression. Many of the CpG sites that exhibited high variability in DNA methylation were common between the PNS and Control groups within a tissue, but there was little overlap in genomic features with high variability. The interaction between the prenatal environment and the genome could be responsible for the differences in location of the variable DNA methylation.

4.
Front Genet ; 13: 949309, 2022.
Article in English | MEDLINE | ID: mdl-35991551

ABSTRACT

Prenatal stress can alter postnatal performance and temperament of cattle. These phenotypic effects may result from changes in gene expression caused by stress-induced epigenetic alterations. Specifically, shifts in gene expression caused by DNA methylation within the brain's amygdala can result in altered behavior because the amygdala regulates fear, stress response, and aggression in mammals. Thus, the objective of this experiment was to identify DNA methylation and gene expression differences in the amygdala tissue of 5-year-old prenatally stressed (PNS) Brahman cows compared to control cows. Pregnant Brahman cows (n = 48) were transported for 2-h periods at 60 ± 5, 80 ± 5, 100 ± 5, 120 ± 5, and 140 ± 5 days of gestation. A non-transported group (n = 48) served as controls (Control). Amygdala tissue was harvested from 6 PNS and 8 Control cows at 5 years of age. Overall methylation of gene body regions, promoter regions, and cytosine-phosphate-guanine (CpG) islands was compared between the two groups. In total, 202 genes, 134 promoter regions, and 133 CpG islands exhibited differential methylation (FDR ≤ 0.15). Following comparison of gene expression in the amygdala between the PNS and Control cows, 2 differentially expressed genes were identified (FDR ≤ 0.15). The minimal differences observed could be the result of natural changes in DNA methylation and gene expression as an animal ages, or could arise because this degree of transportation stress was not severe enough to cause lasting effects on the offspring. A younger age may be a more appropriate time to assess methylation and gene expression differences produced by prenatal stress.

5.
Data Brief ; 35: 106852, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33644273

ABSTRACT

Ticks from the genus Rhipicephalus have enormous global economic impact as ectoparasites of cattle. Rhipicephalus microplus and Rhipicephalus annulatus are known to harbor infectious pathogens such as Babesia bovis, Babesia bigemina, and Anaplasma marginale. Having reference quality genomes of these ticks would advance research to identify druggable targets for chemical entities with acaricidal activity and refine anti-tick vaccine approaches. We sequenced and assembled the genomes of R. microplus and R. annulatus, using Pacific Biosciences and HiSeq 4000 technologies on very high molecular weight genomic DNA. We used 22 and 29 SMRT cells on the Pacific Biosciences Sequel for R. microplus and R. annulatus, respectively, and 3 lanes of the Illumina HiSeq 4000 platform for each tick. The PacBio sequence yields for R. microplus and R. annulatus were 21.0 and 27.9 million subreads, respectively, which were assembled with Canu v. 1.7. The final Canu assemblies consisted of 92,167 and 57,796 contigs with an average contig length of 39,249 and 69,055 bp for R. microplus and R. annulatus, respectively. Annotated genome quality was assessed by BUSCO analysis to provide quantitative measures for each assembled genome. Over 82% and 92% of the 1066 member BUSCO gene set was found in the assembled genomes of R. microplus and R. annulatus, respectively. For R. microplus, only 189 of the 1066 BUSCO genes were missing and only 140 were present in a fragmented condition. For R. annulatus, only 75 of the BUSCO genes were missing and only 109 were present in a fragmented condition. The raw sequencing reads and the assembled contigs/scaffolds are archived at the National Center for Biotechnology Information.
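
The BUSCO percentages quoted above can be checked from the missing and fragmented counts. The sketch below assumes that "found" means complete plus fragmented genes, which the abstract does not state explicitly; the function and key names are hypothetical.

```python
def busco_summary(total, missing, fragmented):
    """Derive complete and found counts (and percentages) from BUSCO tallies.

    Assumption: 'found' = complete + fragmented = total - missing.
    """
    complete = total - missing - fragmented
    found = total - missing
    return {
        "complete": complete,
        "found": found,
        "complete_pct": round(100 * complete / total, 1),
        "found_pct": round(100 * found / total, 1),
    }


# Counts taken from the abstract (1066-member BUSCO gene set).
r_microplus = busco_summary(total=1066, missing=189, fragmented=140)
r_annulatus = busco_summary(total=1066, missing=75, fragmented=109)

print(r_microplus)  # found 877 of 1066 (82.3%), complete 737 (69.1%)
print(r_annulatus)  # found 991 of 1066 (93.0% after rounding), complete 882 (82.7%)
```

Under this reading, both "over 82%" and "over 92%" agree with the stated counts (877/1066 ≈ 82.27% and 991/1066 ≈ 92.96%).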

7.
Nat Biotechnol ; 38(11): 1347-1355, 2020 Nov.
Article in English | MEDLINE | ID: mdl-32541955

ABSTRACT

New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping.
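
How a benchmark set identifies false negatives and false positives can be sketched with set arithmetic. This is a deliberately simplified illustration: it assumes exact matching of hypothetical call tuples, whereas real SV comparison typically tolerates small differences in position and size and is restricted to the benchmark regions. All names and call records below are invented for the example.

```python
def benchmark_metrics(benchmark_calls, query_calls):
    """Compare a callset to a benchmark by exact match (a simplification)."""
    tp = len(benchmark_calls & query_calls)  # calls found in both sets
    fn = len(benchmark_calls - query_calls)  # benchmark calls that were missed
    fp = len(query_calls - benchmark_calls)  # extra calls not in the benchmark
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return tp, fp, fn, precision, recall


# The insertion and deletion counts in the abstract sum to the stated total.
assert 7_281 + 5_464 == 12_745

# Hypothetical calls: (chromosome, position, type, length in bp).
benchmark = {("chr1", 1000, "DEL", 120), ("chr1", 5000, "INS", 300),
             ("chr2", 750, "DEL", 60), ("chr3", 9000, "INS", 85)}
query = {("chr1", 1000, "DEL", 120), ("chr1", 5000, "INS", 300),
         ("chr2", 750, "DEL", 60), ("chr4", 42, "DEL", 200)}

tp, fp, fn, precision, recall = benchmark_metrics(benchmark, query)
print(tp, fp, fn, precision, recall)  # → 3 1 1 0.75 0.75
```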


Subject(s)
Germ-Line Mutation/genetics , INDEL Mutation/genetics , Diploidy , Genomic Structural Variation , Humans , Molecular Sequence Annotation , Sequence Analysis, DNA
8.
Andrology ; 8(5): 1409-1418, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32243084

ABSTRACT

BACKGROUND: It is not unusual for stallions to have fertility problems. For many, artificial insemination with more dense spermatozoa (isolated by density gradient centrifugation) results in greater pregnancy rates compared with the rates when using unfractionated spermatozoa. RNAs in spermatozoa delivered to the oocyte at conception are required for embryo development. Novel molecular assays of spermatozoa that reflect function are needed to predict the fertility of stallions. OBJECTIVES: To describe and compare the RNA populations in more dense and less dense spermatozoa from stallions. MATERIALS AND METHODS: Spermatozoa from five stallions were separated into more dense and less dense populations by density gradient centrifugation. Complementary DNA libraries were made from each of the ten total RNA samples after ribosomal RNA removal. Next-generation sequencing characterized the RNA populations in more and less dense spermatozoa. Quantitative reverse transcription-PCR was used to confirm differential expression of selected RNAs. RESULTS: Stallion spermatozoa contain 11 215 RNAs, with the most prevalent RNA being a 1492 base long non-coding RNA. The levels of 159 RNAs were greater in more dense spermatozoa, while levels of seven other RNAs were greater in less dense spermatozoa. Quantitative reverse transcription-PCR confirmed the threefold greater levels of solute carrier family 26 member 8 (SLC26A8) mRNA in less dense spermatozoa, and sixfold and threefold greater expression levels of the SCP2 sterol binding domain containing 1 (SCP2D1) and spermatogenesis-associated protein 31D1 (SPATA31D1) mRNAs in more dense spermatozoa, respectively. DISCUSSION AND CONCLUSION: We identified 11 215 RNAs in stallion spermatozoa and 166 with differential expression between more dense and less dense fractions. Many prevalent RNAs were also found in bull, boar, and human spermatozoa. Many differentially expressed RNAs are known to be testis- or spermatozoa-specific. 
Our results may lead to identification of an RNA population in spermatozoa that is optimal for establishing successful pregnancies.


Subject(s)
Fertility/genetics , RNA, Long Noncoding , RNA, Messenger , Spermatozoa/metabolism , Animals , Cattle , Centrifugation, Density Gradient , Horses , Humans , Male , RNA, Long Noncoding/analysis , RNA, Long Noncoding/metabolism , RNA, Messenger/analysis , RNA, Messenger/metabolism , Swine
9.
Data Brief ; 27: 104602, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31656838

ABSTRACT

The longhorned tick, Haemaphysalis longicornis, feeds upon a wide range of bird and mammalian hosts. Mammalian hosts include cattle, deer, sheep, goats, humans, and horses. This tick is known to transmit a number of pathogens causing tick-borne diseases, and was the vector of a recent serious outbreak of oriental theileriosis in New Zealand. A New Zealand-USA consortium was established to sequence, assemble, and annotate the genome of this tick, using ticks obtained from New Zealand's North Island. In New Zealand, the tick is considered exclusively parthenogenetic and this trait was deemed useful for genome assembly. Very high molecular weight genomic DNA was sequenced on the Illumina HiSeq 4000 and the long-read PacBio Sequel platforms. Twenty-eight SMRT cells produced a total of 21.3 million reads, which were assembled with Canu on a reserved supercomputer node with access to 12 TB of RAM, running continuously for over 24 days. The final assembly dataset consisted of 34,211 contigs with an average contig length of 215,205 bp. The quality of the annotated genome was assessed by BUSCO analysis, an approach that provides quantitative measures for the quality of an assembled genome. Over 95% of the BUSCO gene set was found in the assembled genome. Only 48 of the 1066 BUSCO genes were missing and only 9 were present in a fragmented condition. The raw sequencing reads and the assembled contigs/scaffolds are archived at the National Center for Biotechnology Information.

10.
Genes Dev ; 33(5-6): 294-309, 2019 Mar 01.
Article in English | MEDLINE | ID: mdl-30804225

ABSTRACT

The mammalian circadian clock relies on the transcription factor CLOCK:BMAL1 to coordinate the rhythmic expression of thousands of genes. Consistent with the various biological functions under clock control, rhythmic gene expression is tissue-specific despite an identical clockwork mechanism in every cell. Here we show that BMAL1 DNA binding is largely tissue-specific, likely because of differences in chromatin accessibility between tissues and cobinding of tissue-specific transcription factors. Our results also indicate that BMAL1 ability to drive tissue-specific rhythmic transcription is associated with not only the activity of BMAL1-bound enhancers but also the activity of neighboring enhancers. Characterization of physical interactions between BMAL1 enhancers and other cis-regulatory regions by RNA polymerase II chromatin interaction analysis by paired-end tag (ChIA-PET) reveals that rhythmic BMAL1 target gene expression correlates with rhythmic chromatin interactions. These data thus support that much of BMAL1 target gene transcription depends on BMAL1 capacity to rhythmically regulate a network of enhancers.


Subject(s)
ARNTL Transcription Factors/genetics , ARNTL Transcription Factors/metabolism , Gene Expression Regulation/genetics , Amino Acid Motifs/genetics , Animals , Chromatin/metabolism , Circadian Rhythm/genetics , Enhancer Elements, Genetic/genetics , Male , Mice , Mice, Inbred C57BL , Organ Specificity , Promoter Regions, Genetic/genetics , Protein Binding , RNA Polymerase II/metabolism
11.
Article in English | MEDLINE | ID: mdl-29610098

ABSTRACT

New de novo transcriptome assembly and annotation methods provide an incredible opportunity to study the transcriptome of organisms that lack an assembled and annotated genome. There are currently a number of de novo transcriptome assembly methods, but it has been difficult to evaluate the quality of these assemblies. In order to assess the quality of the transcriptome assemblies, we composed a workflow of multiple quality check measurements that in combination provide a clear evaluation of the assembly performance. We present novel transcriptome assemblies and functional annotations for Pacific Whiteleg Shrimp (Litopenaeus vannamei), a mariculture species of great national and international interest that lacks a solid transcriptome/genome reference. We examine Pacific Whiteleg transcriptome assemblies via multiple metrics, and provide an improved gene annotation. Our investigations show that assessing the quality of an assembly purely based on the assembler's statistical measurements can be misleading; we propose a hybrid approach that consists of statistical quality checks and further biological-based evaluations.


Subject(s)
Computational Biology/methods , Exome Sequencing/methods , Transcriptome/genetics , Algorithms , Animals , Penaeidae/genetics
12.
BMC Genomics ; 18(1): 694, 2017 Sep 05.
Article in English | MEDLINE | ID: mdl-28874136

ABSTRACT

BACKGROUND: The information content of genomes plays a crucial role in the existence and proper development of living organisms. Thus, tremendous effort has been dedicated to developing DNA sequencing technologies that provide a better understanding of the underlying mechanisms of cellular processes. Advances in the development of sequencing technology have made it possible to sequence genomes in a relatively fast and inexpensive way. However, as with any measurement technology, there is noise involved and this needs to be addressed to reach conclusions based on the resulting data. In addition, there are multiple intermediate steps and degrees of freedom when constructing genome assemblies that lead to ambiguous and inconsistent results among assemblers. METHODS: Here we introduce HiMMe, an HMM-based tool that relies on genetic patterns to score genome assemblies. Through a Markov chain, the model is able to detect characteristic genetic patterns, while, by introducing emission probabilities, the noise involved in the process is taken into account. Prior knowledge can be used by training the model to fit a given organism or sequencing technology. RESULTS: Our results show that the method presented is able to recognize patterns even with relatively small k-mer size choices and limited computational resources. CONCLUSIONS: Our methodology provides an individual quality metric per contig in addition to an overall genome assembly score, with a time complexity well below that of an aligner. Ultimately, HiMMe provides meaningful statistical insights that can be leveraged by researchers to better select contigs and genome assemblies for downstream analysis.


Subject(s)
Genomics/methods , Markov Chains , Algorithms , Bayes Theorem , Reproducibility of Results
13.
Comput Biol Chem ; 69: 153-163, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28528020

ABSTRACT

Recent advances in high-throughput genome sequencing technologies have enabled the systematic study of various genomes by making whole genome sequencing affordable. Modern sequencers generate a huge number of small sequence fragments called reads, where the read length and the per-base sequencing cost depend on the technology used. To date, many hybrid genome assembly algorithms have been developed that can take reads from multiple read sources to reconstruct the original genome. However, rigorous investigation of the feasibility conditions for complete genome reconstruction and the optimal sequencing strategy for minimizing the sequencing cost has been conspicuously missing. An important aspect of hybrid sequencing and assembly is that the feasibility conditions for genome reconstruction can be satisfied by different combinations of the available read sources, opening up the possibility of optimally combining the sources to minimize the sequencing cost while ensuring accurate genome reconstruction. In this paper, we derive the conditions for whole genome reconstruction from multiple read sources at a given confidence level and also introduce the optimal strategy for combining reads from different sources to minimize the overall sequencing cost. We show that the optimal read set, which simultaneously satisfies the feasibility conditions for genome reconstruction and minimizes the sequencing cost, can be effectively predicted through constrained discrete optimization. Through extensive evaluations based on several genomes and different read sets, we verify the derived feasibility conditions and demonstrate the performance of the proposed optimal hybrid sequencing and assembly strategy.


Subject(s)
Algorithms , Genome, Bacterial/genetics , High-Throughput Nucleotide Sequencing/economics , Rhodobacter sphaeroides/genetics , Staphylococcus aureus/genetics , Sulfolobus/genetics
14.
BMC Genomics ; 17: 202, 2016 Mar 08.
Article in English | MEDLINE | ID: mdl-26956617

ABSTRACT

BACKGROUND: Colletotrichum graminicola is a hemibiotrophic fungal pathogen that causes maize anthracnose disease. It progresses through three recognizable phases of pathogenic development in planta: melanized appressoria on the host surface prior to penetration; biotrophy, characterized by intracellular colonization of living host cells; and necrotrophy, characterized by host cell death and symptom development. A "Mixed Effects" Generalized Linear Model (GLM) was developed and applied to an existing Illumina transcriptome dataset, substantially increasing the statistical power of the analysis of C. graminicola gene expression during infection and colonization. Additionally, the in planta transcriptome of the wild-type was compared with that of a mutant strain impaired in the establishment of biotrophy, allowing detailed dissection of events occurring specifically during penetration, and during early versus late biotrophy. RESULTS: More than 2000 fungal genes were differentially transcribed during appressorial maturation, penetration, and colonization. Secreted proteins, secondary metabolism genes, and membrane receptors were over-represented among the differentially expressed genes, suggesting that the fungus engages in an intimate and dynamic conversation with the host, beginning prior to penetration. This communication process probably involves reception of plant signals triggering subsequent developmental progress in the fungus, as well as production of signals that induce responses in the host. Later phases of biotrophy were more similar to necrotrophy, with increased production of secreted proteases, inducers of plant cell death, hydrolases, and membrane bound transporters for the uptake and egress of potential toxins, signals, and nutrients. CONCLUSIONS: This approach revealed, in unprecedented detail, fungal genes specifically expressed during critical phases of host penetration and biotrophic establishment. 
Many encoded secreted proteins, secondary metabolism enzymes, and receptors that may play roles in host-pathogen communication necessary to promote susceptibility, and thus may provide targets for chemical or biological controls to manage this important disease. The differentially expressed genes could be used as 'landmarks' to more accurately identify developmental progress in compatible versus incompatible interactions involving genetic variants of both host and pathogen.


Subject(s)
Colletotrichum/genetics , Plant Diseases/microbiology , Transcriptome , Colletotrichum/pathogenicity , Gene Expression Regulation, Fungal , Genes, Fungal , Host-Pathogen Interactions , Linear Models , RNA, Fungal/genetics , Secondary Metabolism , Sequence Analysis, RNA , Zea mays/microbiology
15.
Sci Rep ; 4: 7081, 2014 Nov 25.
Article in English | MEDLINE | ID: mdl-25420880

ABSTRACT

We present a new transcriptome assembly of the Pacific whiteleg shrimp (Litopenaeus vannamei), the species most farmed for human consumption. Its functional annotation, a substantial improvement over previous ones, is provided freely. RNA-Seq with Illumina HiSeq technology was used to analyze samples extracted from shrimp abdominal muscle, hepatopancreas, gills and pleopods. We used the Trinity and Trinotate software suites for transcriptome assembly and annotation, respectively. The quality of this assembly and the affiliated targeted homology searches greatly enrich the curated transcripts currently available in public databases for this species. Comparison with the model arthropod Daphnia allows some insights into defining characteristics of decapod crustaceans. This large-scale gene discovery gives the broadest depth yet to the annotated transcriptome of this important species and should be of value to ongoing genomics and immunogenetic resistance studies in this shrimp of paramount global economic importance.


Subject(s)
Aquaculture , Penaeidae/genetics , Penaeidae/metabolism , Seafood , Transcriptome , Algorithms , Animals , Crustacea/genetics , Crustacea/metabolism , DNA Replication/genetics , Daphnia/metabolism , Databases, Genetic , Genomics , Immune System/metabolism , Sequence Analysis, RNA , User-Computer Interface
16.
BMC Bioinformatics ; 14: 307, 2013 Oct 11.
Article in English | MEDLINE | ID: mdl-24118904

ABSTRACT

BACKGROUND: A key goal of systems biology and translational genomics is to utilize high-throughput measurements of cellular states to develop expression-based classifiers for discriminating among different phenotypes. Recent developments of Next Generation Sequencing (NGS) technologies can facilitate classifier design by providing expression measurements for tens of thousands of genes simultaneously via the abundance of their mRNA transcripts. Because NGS technologies result in a nonlinear transformation of the actual expression distributions, their application can result in data that are less discriminative than would be the actual expression levels themselves, were they directly observable. RESULTS: Using state-of-the-art distributional modeling for the NGS processing pipeline, this paper studies how that pipeline, via the resulting nonlinear transformation, affects classification and feature selection. The effects of different factors are considered and NGS-based classification is compared to SAGE-based classification and classification directly on the raw expression data, which is represented by a very high-dimensional model previously developed for gene expression. As expected, the nonlinear transformation resulting from NGS processing diminishes classification accuracy; however, owing to a larger number of reads, NGS-based classification outperforms SAGE-based classification. CONCLUSIONS: Having high numbers of reads can mitigate the degradation in classification performance resulting from the effects of NGS technologies. Hence, when performing an RNA-Seq analysis, using the highest possible coverage of the genome is recommended for the purposes of classification.


Subject(s)
Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Systems Biology/methods , Gene Expression Profiling , Genome/genetics , Models, Genetic , RNA, Messenger/analysis , RNA, Messenger/genetics , RNA, Messenger/metabolism
17.
BMC Genomics ; 13: 78, 2012 Feb 17.
Article in English | MEDLINE | ID: mdl-22340285

ABSTRACT

BACKGROUND: The catalog of genetic variants in the horse genome originates from a few select animals, the majority originating from the Thoroughbred mare used for the equine genome sequencing project. The purpose of this study was to identify genetic variants, including single nucleotide polymorphisms (SNPs), insertion/deletion polymorphisms (INDELs), and copy number variants (CNVs) in the genome of an individual Quarter Horse mare sequenced by next-generation sequencing. RESULTS: Using massively parallel paired-end sequencing, we generated 59.6 Gb of DNA sequence from a Quarter Horse mare resulting in an average of 24.7X sequence coverage. Reads were mapped to approximately 97% of the reference Thoroughbred genome. Unmapped reads were de novo assembled resulting in 19.1 Mb of new genomic sequence in the horse. Using a stringent filtering method, we identified 3.1 million SNPs, 193 thousand INDELs, and 282 CNVs. Genetic variants were annotated to determine their impact on gene structure and function. Additionally, we genotyped this Quarter Horse for mutations of known diseases and for variants associated with particular traits. Functional clustering analysis of genetic variants revealed that most of the genetic variation in the horse's genome was enriched in sensory perception, signal transduction, and immunity and defense pathways. CONCLUSIONS: This is the first sequencing of a horse genome by next-generation sequencing and the first genomic sequence of an individual Quarter Horse mare. We have increased the catalog of genetic variants for use in equine genomics by the addition of novel SNPs, INDELs, and CNVs. The genetic variants described here will be a useful resource for future studies of genetic variation regulating performance traits and diseases in equids.
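
The sequencing yield and coverage figures above are linked by simple arithmetic. The sketch below assumes that "Gb" means gigabases and that average coverage is total sequenced bases divided by genome length; the abstract does not say which denominator the authors used, so the implied genome size is only indicative. The function name is hypothetical.

```python
def implied_genome_size_gb(total_sequence_gb, average_coverage):
    """Genome length implied by coverage = total bases / genome length."""
    return total_sequence_gb / average_coverage


# Figures from the abstract: 59.6 Gb of sequence at 24.7X average coverage.
size_gb = implied_genome_size_gb(59.6, 24.7)
print(round(size_gb, 2))  # → 2.41
```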


Subject(s)
Genome , Horses/genetics , Polymorphism, Single Nucleotide , Sequence Analysis, DNA , Animals , DNA Copy Number Variations , Female , Genomics/methods , Genotype , Horse Diseases/genetics , Molecular Sequence Annotation , Mutation , Quantitative Trait Loci , Signal Transduction
18.
BMC Bioinformatics ; 12 Suppl 10: S5, 2011 Oct 18.
Article in English | MEDLINE | ID: mdl-22165852

ABSTRACT

BACKGROUND: RNA-Seq is a recently developed high-throughput sequencing technology for profiling the entire transcriptome in any organism. It has several major advantages over current hybridization-based approaches such as microarrays. However, the cost per sample by RNA-Seq is still prohibitive for most laboratories. With continued improvement in sequence output, it would be cost-effective if multiple samples were multiplexed and sequenced in a single lane with sufficient transcriptome coverage. The objective of this analysis is to evaluate what sequencing depth might be sufficient for gene expression profiling in the chicken by RNA-Seq. RESULTS: Two cDNA libraries from chicken lungs were sequenced initially, and 4.9 million (M) and 1.6 M (60 bp) reads were generated, respectively. With significant improvements in sequencing technology, two technical replicate cDNA libraries were re-sequenced. Totals of 29.6 M and 28.7 M (75 bp) reads were obtained with the two samples. More than 90% of annotated genes were detected in the data sets with 28.7-29.6 M reads, while only 68% of genes were detected in the data set with 1.6 M reads. The correlation coefficients of gene expression between technical replicates within the same sample were 0.9458 and 0.8442. To evaluate the appropriate depth needed for mRNA profiling, a random sampling method was used to generate different numbers of reads from each sample. There was a significant increase in correlation coefficients from a sequencing depth of 1.6 M to 10 M for all genes except highly abundant genes. No significant improvement was observed from the depth of 10 M to 20 M (75 bp) reads. CONCLUSION: The analysis from the current study demonstrated that 30 M (75 bp) reads is sufficient to detect all annotated genes in chicken lungs. Ten million (75 bp) reads could detect about 80% of annotated chicken genes, and RNA-Seq at this depth can serve as a replacement for microarray technology. Furthermore, the depth of sequencing had a significant impact on measuring gene expression of low-abundance genes. Finally, the combination of experimental and simulation approaches is a powerful way to address the relationship between the depth of sequencing and transcriptome coverage.
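
The random-sampling evaluation described in this abstract can be illustrated with a minimal sketch. It assumes that reads have already been assigned to genes; the names `read_genes` and `detected_fraction` are hypothetical, and the toy data below are simulated, not the chicken lung libraries from the study.

```python
import random

def detected_fraction(read_genes, depth, n_genes, seed=0):
    """Fraction of annotated genes hit by at least one read in a random subsample."""
    rng = random.Random(seed)
    sample = rng.sample(read_genes, depth)  # draw `depth` reads without replacement
    return len(set(sample)) / n_genes

# Toy data (assumption): 1,000 genes with skewed abundances, 50,000 mapped reads.
rng = random.Random(42)
n_genes = 1000
weights = [1.0 / (rank + 1) for rank in range(n_genes)]  # Zipf-like abundances
read_genes = rng.choices(range(n_genes), weights=weights, k=50000)

# Gene detection as a function of simulated sequencing depth.
for depth in (1000, 5000, 20000, 50000):
    print(depth, round(detected_fraction(read_genes, depth, n_genes), 3))
```

In a sketch like this, low-abundance genes are the ones most likely to be missed at shallow depth, which is the qualitative effect the abstract reports.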


Subject(s)
Chickens/genetics , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Lung/metabolism , Sequence Analysis, RNA , Animals , Gene Library , Molecular Sequence Annotation , RNA, Messenger/genetics
19.
BMC Bioinformatics ; 12 Suppl 10: S10, 2011 Oct 18.
Article in English | MEDLINE | ID: mdl-22165980

ABSTRACT

BACKGROUND: One of the most important goals of the mathematical modeling of gene regulatory networks is to alter their behavior toward desirable phenotypes. Therapeutic techniques are derived for intervention in terms of stationary control policies. In large networks, it becomes computationally burdensome to derive an optimal control policy. To overcome this problem, greedy intervention approaches based on the concept of the Mean First Passage Time or the steady-state probability mass of the network states were previously proposed. Another possible approach is to use reduction mappings to compress the network and develop control policies on its reduced version. However, such mappings lead to loss of information and require an induction step when designing the control policy for the original network. RESULTS: In this paper, we propose a novel solution, CoD-CP, for designing intervention policies for large Boolean networks. The new method utilizes the Coefficient of Determination (CoD) and the Steady-State Distribution (SSD) of the model. The main advantage of CoD-CP in comparison with the previously proposed methods is that it does not require any compression of the original model, and thus can be directly designed on large networks. The simulation studies on small synthetic networks show that CoD-CP performs comparably to previously proposed greedy policies that were induced from the compressed versions of the networks. Furthermore, on a large 17-gene gastrointestinal cancer network, CoD-CP outperforms the other two available greedy techniques, which is precisely the kind of case for which CoD-CP has been developed. Finally, our experiments show that CoD-CP is robust with respect to the attractor structure of the model. CONCLUSIONS: The newly proposed CoD-CP provides an attractive alternative for intervening in large networks where other available greedy methods require size reduction on the network and an extra induction step before designing a control policy.


Subject(s)
Gene Regulatory Networks , Models, Genetic , Computer Simulation , Gastrointestinal Neoplasms/genetics , Humans , Probability
20.
Bioinformatics ; 26(24): 3098-104, 2010 Dec 15.
Article in English | MEDLINE | ID: mdl-20956246

ABSTRACT

MOTIVATION: A key goal of studying biological systems is to design therapeutic intervention strategies. Probabilistic Boolean networks (PBNs) constitute a mathematical model which enables modeling, predicting and intervening in their long-run behavior using Markov chain theory. The long-run dynamics of a PBN, as represented by its steady-state distribution (SSD), can guide the design of effective intervention strategies for the modeled systems. A major obstacle for its application is the large state space of the underlying Markov chain, which poses a serious computational challenge. Hence, it is critical to reduce the model complexity of PBNs for practical applications. RESULTS: We propose a strategy to reduce the state space of the underlying Markov chain of a PBN based on a criterion that the reduction least distorts the proportional change of stationary masses for critical states, for instance, the network attractors. In comparison to previous reduction methods, we reduce the state space directly, without deleting genes. We then derive stationary control policies on the reduced network that can be naturally induced back to the original network. Computational experiments study the effects of the reduction on model complexity and the performance of designed control policies which is measured by the shift of stationary mass away from undesirable states, those associated with undesirable phenotypes. We consider randomly generated networks as well as a 17-gene gastrointestinal cancer network, which, if not reduced, has a 2^17 × 2^17 transition probability matrix. Such a dimension is too large for direct application of many previously proposed PBN intervention strategies.
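
To make the notion of a steady-state distribution and the scale of the state space concrete, here is a minimal sketch using generic power iteration on a small Markov chain. It is not the reduction method proposed in the article; the function `steady_state` and the 3-state matrix `P` are illustrative assumptions.

```python
def steady_state(P, tol=1e-12, max_iter=10000):
    """Approximate the stationary distribution pi (pi = pi P) by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical 3-state chain; each row sums to 1.
P = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.1, 0.0, 0.9],
]
pi = steady_state(P)
print([round(p, 4) for p in pi])

# A Boolean network with 17 genes has 2**17 states, so its full
# transition matrix has 2**17 * 2**17 = 2**34 entries.
n_states = 2 ** 17
print(n_states, n_states ** 2)
```

Each iteration of this dense approach touches every matrix entry, which suggests why a full 2^17 × 2^17 matrix is impractical to handle directly and why reducing the state space is attractive.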


Subject(s)
Models, Statistical , Algorithms , Gastrointestinal Neoplasms/genetics , Gene Regulatory Networks , Humans , Markov Chains , Models, Biological , Models, Genetic , Probability
...