Results 1 - 10 of 10
1.
Bioinformatics ; 31(14): 2384-7, 2015 Jul 15.
Article in English | MEDLINE | ID: mdl-25792550

ABSTRACT

MOTIVATION: Tag density plots are very important to intuitively reveal biological phenomena from capture-based sequencing data by visualizing the normalized read depth in a region. RESULTS: We have developed iTagPlot to compute tag density across functional features in parallel using multicores and a grid engine and to interactively explore it in a graphical user interface. It allows us to stratify features by defining groups based on biological function and measurement, summary statistics and unsupervised clustering. AVAILABILITY AND IMPLEMENTATION: http://sourceforge.net/projects/itagplot/.
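A minimal sketch of what "normalized read depth in a region" can mean in practice. It bins read start positions across one feature and scales the counts to reads per million mapped reads. The function name, the binning scheme, and the per-million normalization are assumptions made for illustration; the abstract does not say how iTagPlot normalizes or bins.

```python
from typing import List, Tuple


def tag_density(read_starts: List[int], feature: Tuple[int, int],
                bin_count: int, total_reads: int) -> List[float]:
    """Count reads per bin across a feature, scaled to reads per million.

    Hypothetical helper: the normalization is an assumption, not iTagPlot's.
    """
    start, end = feature
    if end <= start or bin_count <= 0 or total_reads <= 0:
        raise ValueError("feature, bin_count and total_reads must be positive")
    width = (end - start) / bin_count
    counts = [0] * bin_count
    for pos in read_starts:
        if start <= pos < end:
            # Clamp guards against floating-point rounding at the right edge.
            idx = min(int((pos - start) / width), bin_count - 1)
            counts[idx] += 1
    scale = 1_000_000 / total_reads
    return [c * scale for c in counts]
```

Averaging such per-feature profiles over a group of features (for example, all CpG islands) would give one curve per group, which is the kind of stratified view the abstract describes.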


Subject(s)
Sequence Tagged Sites , Software , Cell Line , Cluster Analysis , Computer Graphics , CpG Islands , DNA Methylation , Gene Expression , High-Throughput Nucleotide Sequencing , Humans
2.
Nucleic Acids Res ; 37(Database issue): D698-702, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18986995

ABSTRACT

Overlapping genes are defined as a pair of genes whose transcripts overlap. Recently, many cases of overlapping genes have been investigated in various eukaryotic organisms; however, their origin and transcriptional control mechanism have not yet been clearly determined. In this study, we implemented the Evolutionary Visualizer for Overlapping Genes (EVOG), a Web-based database with a novel visualization interface, to investigate the evolutionary relationship between overlapping genes. Using this technique, we collected and analyzed all overlapping genes in human, chimpanzee, orangutan, marmoset, rhesus, cow, dog, mouse, rat, chicken, Xenopus, zebrafish and Drosophila. The result is an integrated, manually curated database that displays the evolutionary features of overlapping genes. The EVOG database contains the following numbers of overlapping genes: 10,074 in human, 10,009 in chimpanzee, 67,039 in orangutan, 51,001 in marmoset, 219 in rhesus, 3,627 in cow, 209 in dog, 10,700 in mouse, 7,987 in rat, 1,439 in chicken, 597 in Xenopus, 2,457 in zebrafish and 4,115 in Drosophila. The EVOG database is very effective and easy to use for the analysis of the evolutionary process of overlapping genes when comparing different species. Therefore, EVOG could potentially be used as the main tool to investigate the evolution of the human genome in relation to disease by comparing the expression profiles of overlapping genes. EVOG is available at http://neobio.cs.pusan.ac.kr/evog/.


Subject(s)
Databases, Genetic , Evolution, Molecular , Genes, Overlapping , Animals , Cattle , Dogs , Humans , Mice , Phylogeny , Rats , Transcription, Genetic , User-Computer Interface
3.
Biochem Biophys Res Commun ; 397(2): 340-4, 2010 Jun 25.
Article in English | MEDLINE | ID: mdl-20510878

ABSTRACT

A haplotype, the sequence of SNPs on a specific chromosome, plays an important role in disease association studies. Current sequencing techniques can detect the presence of SNP sites, but they cannot tell which copy of a pair of chromosomes the alleles belong to. Moreover, errors that occur when sequencing SNP fragments make it difficult to determine a pair of haplotypes from those fragments. To help overcome this difficulty, the haplotype assembly problem has been defined computationally, and several models have been suggested to tackle it. However, as far as we are aware, there are no freely available web-based tools for this problem. In this paper, we present a web-based application based on a genetic algorithm, named HapAssembler, for assembling a pair of haplotypes from SNP fragments. Numerical results on real biological data show that the correct rate of the proposed application is greater than 95% in most cases. HapAssembler is freely available at http://alex.chonnam.ac.kr/~drminor/hapHome.htm. Users can choose any of four models to suit their purpose and determine haplotypes from their input data.


Subject(s)
Algorithms , Haplotypes , Internet , Polymorphism, Single Nucleotide , Humans
4.
J Biomed Biotechnol ; 2008: 675741, 2008.
Article in English | MEDLINE | ID: mdl-18414585

ABSTRACT

We have developed a Windows-based program, ConPath, as a scaffold analyzer. ConPath constructs scaffolds by ordering and orienting separate sequence contigs by exploiting the mate-pair information between contig pairs. Our algorithm builds directed graphs from link information and traverses them to find the longest acyclic graphs. Using end read pairs of fixed-sized mate-pair libraries, ConPath determines the relative orientations of all contigs, estimates the gap size of each adjacent contig pair, and reports wrong assembly information by validating orientations and gap sizes. We have utilized ConPath in more than 10 microbial genome projects, including Mannheimia succiniciproducens and Vibrio vulnificus, where we verified contig assembly and identified several erroneous contigs using the four types of error defined in ConPath. ConPath also provides convenient features and viewers that permit investigation of each contig in detail; these include a contig viewer, a scaffold viewer, an edge information list, a mate-pair list, and the printing of complex scaffold structures.
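The graph step can be illustrated with a small sketch: build a directed graph from contig links and return one longest path, which gives a contig order for a scaffold. This is a generic longest-path computation over an acyclic graph, not ConPath's actual algorithm; the function name and the `(upstream, downstream)` link format are hypothetical, and the sketch simply rejects cyclic input rather than resolving it.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Tuple


def longest_contig_path(links: List[Tuple[str, str]]) -> List[str]:
    """Return one longest path (by contig count) in an acyclic link graph."""
    succ: Dict[str, List[str]] = defaultdict(list)
    indeg: Dict[str, int] = defaultdict(int)
    nodes = set()
    for a, b in links:
        succ[a].append(b)
        indeg[b] += 1
        nodes.update((a, b))
    if not nodes:
        return []
    # Kahn's algorithm: topological order of the contigs.
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    order: List[str] = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    if len(order) != len(nodes):
        raise ValueError("link graph contains a cycle")
    # Dynamic programming over the topological order.
    best_len = {n: 1 for n in nodes}
    prev: Dict[str, Optional[str]] = {n: None for n in nodes}
    for n in order:
        for m in succ[n]:
            if best_len[n] + 1 > best_len[m]:
                best_len[m] = best_len[n] + 1
                prev[m] = n
    end: Optional[str] = max(order, key=lambda n: best_len[n])
    path: List[str] = []
    while end is not None:
        path.append(end)
        end = prev[end]
    return path[::-1]
```

A real scaffolder would also carry orientation and gap-size estimates on each edge; those are omitted here to keep the traversal itself visible.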


Subject(s)
Algorithms , Contig Mapping/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Software , Base Pair Mismatch , Base Sequence , Molecular Sequence Data
5.
Nucleic Acids Res ; 30(1): 369-71, 2002 Jan 01.
Article in English | MEDLINE | ID: mdl-11752339

ABSTRACT

Angiogenesis is the formation of new capillaries sprouting from pre-existing vessels. Angiogenesis occurs in a variety of normal physiological and pathological conditions and is regulated by a balance of stimulatory and inhibitory angiogenic factors. The control of this balance may fail and result in the formation of a pathologic capillary network during the development of many diseases. Therefore, we developed the angiogenesis database (AngioDB), which provides a signaling network of angiogenesis-related biomolecules in humans. Each record of AngioDB consists of 12 fields and is stored in a relational database management system. For data retrieval, Active Server Pages (ASP) technology was integrated into the system. Users can access the database through a query or an imagemap browsing program. The retrieval system also provides a list of angiogenesis-related molecules classified into three categories, and the database has external links to NCBI databases. AngioDB is available via the Internet at http://angiodb.snu.ac.kr/.


Subject(s)
Angiogenesis Inducing Agents/genetics , Angiogenesis Inhibitors/genetics , Databases, Protein , Amino Acid Sequence , Angiogenesis Inducing Agents/classification , Angiogenesis Inhibitors/classification , Base Sequence , Computer Graphics , Forecasting , Humans , Information Storage and Retrieval , Internet , Neovascularization, Pathologic , Neovascularization, Physiologic , Signal Transduction , Systems Integration
6.
Comput Biol Chem ; 29(3): 244-53, 2005 Jun.
Article in English | MEDLINE | ID: mdl-15979044

ABSTRACT

In this paper, we present a simple and efficient whole genome alignment method using maximal exact matches (MEMs). The major problem with the use of MEM anchors is that the number of hits in non-homologous regions increases exponentially when shorter MEM anchors are used to detect more homologous regions. To deal with this problem, we have developed a fast and accurate anchor filtering scheme based on simple match extension with minimum percent identity and extension length criteria. Due to its simplicity and accuracy, all MEM anchors in a pair of genomes can be exhaustively tested and filtered. In addition, by incorporating the translation technique, the alignment quality and speed of our genome alignment algorithm have been further improved. As a result, our genome alignment algorithm, GAME (Genome Alignment by Match Extension), performs competitively with existing algorithms and can align large whole genomes, e.g., A. thaliana, without requiring the large memory and parallel processors that are typically needed. This is shown in an experiment that compares the performance of BLAST, BLASTZ, PatternHunter, MUMmer and our algorithm in aligning all 45 pairs of 10 microbial genomes. The scalability of our algorithm is shown in another experiment in which all pairs of the five chromosomes of A. thaliana were compared.
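The filtering idea can be sketched as follows: extend each exact-match anchor into both flanks without gaps, then keep it only if enough bases could be compared and enough of them agree. The ungapped extension, the fixed flank size, the parameter names, and the default thresholds are all assumptions made for illustration; the abstract names the two criteria (minimum percent identity and extension length) but not how GAME applies them.

```python
from typing import List, Tuple

Anchor = Tuple[int, int, int]  # (start in a, start in b, match length)


def extension_identity(a: str, b: str, anchor: Anchor,
                       ext_len: int) -> Tuple[int, float]:
    """Ungapped extension of an anchor by up to ext_len bases per flank.

    Returns (bases compared, fraction of compared bases that match).
    """
    i, j, length = anchor
    matches = 0
    compared = 0
    for k in range(ext_len):  # right flank
        x, y = i + length + k, j + length + k
        if x >= len(a) or y >= len(b):
            break
        compared += 1
        matches += int(a[x] == b[y])
    for k in range(1, ext_len + 1):  # left flank
        x, y = i - k, j - k
        if x < 0 or y < 0:
            break
        compared += 1
        matches += int(a[x] == b[y])
    identity = matches / compared if compared else 0.0
    return compared, identity


def filter_anchors(a: str, b: str, anchors: List[Anchor], ext_len: int = 20,
                   min_ext: int = 10, min_identity: float = 0.7) -> List[Anchor]:
    """Keep anchors whose flanks are long enough and similar enough."""
    kept = []
    for anchor in anchors:
        compared, identity = extension_identity(a, b, anchor, ext_len)
        if compared >= min_ext and identity >= min_identity:
            kept.append(anchor)
    return kept
```

Because each test touches only a bounded number of bases around the anchor, the cost per anchor is constant, which is consistent with the abstract's point that every anchor can be tested exhaustively.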


Subject(s)
Genome, Bacterial , Genome, Plant , Sequence Alignment/methods , Algorithms , Arabidopsis , Base Sequence , Chromosomes, Plant , Models, Genetic
7.
Gene ; 560(1): 83-8, 2015 Apr 10.
Article in English | MEDLINE | ID: mdl-25637569

ABSTRACT

With the advent of next-generation sequencing technology, genome-wide maps of DNA methylation are now available. The Thoroughbred horse is bred for racing, while the Jeju horse is a traditional Korean horse bred for racing or food. The methylation profiles of equine organs may provide genomic clues underlying their athletic traits. We have developed a database to elucidate genome-wide DNA methylation patterns of the cerebrum, lung, heart, and skeletal muscle from Thoroughbred and Jeju horses. Using MeDIP-Seq, our database provides information regarding significantly enriched methylated regions beyond a threshold, methylation density of a specific region, and differentially methylated regions (DMRs) for tissues from two equine breeds. It provides methylation patterns at 784 gene regions in the equine genome. This database can potentially help researchers identify DMRs in the tissues of these horses and investigate the differences between the Thoroughbred and Jeju breeds.


Subject(s)
Databases, Genetic , Epigenesis, Genetic , Horses/genetics , Animals , Breeding , Chromosome Mapping/veterinary , CpG Islands , DNA Methylation , High-Throughput Nucleotide Sequencing/veterinary , Lung/metabolism , Muscle, Skeletal/metabolism , Myocardium/metabolism
8.
Genome Inform ; 13: 30-41, 2002.
Article in English | MEDLINE | ID: mdl-14571372

ABSTRACT

As sequenced genomes become larger and the sequencing process becomes faster, there is a need for a tool to analyze sequences at the whole-genome scale. However, in-memory algorithms such as suffix trees and suffix arrays are not applicable to the analysis of whole genome sequence sets, since the size of an individual genome ranges from several million base pairs to hundreds of billions of base pairs. In order to manipulate such large sequence data effectively, it is necessary to use an indexed data structure for external memory. In this paper, we introduce a workbench called SequeX for the analysis and visualization of whole genome sequences using the SSB-tree (Static SB-tree). It consists of two parts: the analysis query subsystem and the visualization subsystem. The query subsystem supports various operations such as pattern matching, k-occurrence, and k-mer analysis. The visualization subsystem helps biologists understand whole genome structure and features through a sequence viewer, an annotation viewer, a CGR (Chaos Game Representation) viewer, and a k-mer viewer. The system also supports a user-friendly programming interface based on JavaScript for batch processing and for extension to a user's specific purpose. SequeX can be used to identify conserved genes or sequences by analyzing common k-mers and annotations. We analyze the common k-mers of 72 microbial genomes published in Entrez and find an interesting biological fact: the longest common k-mer for the 72 sequences is an 11-mer, and only 11 such sequences exist. Finally, we note that many common k-mers occur in conserved regions such as CDS, rRNA, and tRNA.
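The common k-mer analysis can be illustrated with a small in-memory sketch that intersects k-mer sets and searches downward for the largest k shared by every sequence. This deliberately swaps in plain Python sets for the external-memory SSB-tree index the abstract describes, so it only suits small inputs; the function names are hypothetical.

```python
from typing import List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def common_kmers(seqs: List[str], k: int) -> Set[str]:
    """k-mers present in every sequence."""
    if not seqs:
        return set()
    shared = kmers(seqs[0], k)
    for s in seqs[1:]:
        shared &= kmers(s, k)
        if not shared:
            break  # nothing left to intersect
    return shared


def longest_common_k(seqs: List[str], max_k: int) -> int:
    """Largest k <= max_k for which at least one common k-mer exists."""
    for k in range(max_k, 0, -1):
        if common_kmers(seqs, k):
            return k
    return 0


if __name__ == "__main__":
    toy = ["ACGTACGA", "TTACGTAA", "GGACGTCC"]
    print(longest_common_k(toy, 8), common_kmers(toy, 4))
```

The downward search works because any substring of a common k-mer is itself common, so once some k succeeds every smaller k succeeds as well.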


Subject(s)
Computational Biology/methods , Data Interpretation, Statistical , Sequence Analysis, DNA/methods , Software , Archaea/genetics , Bacteria/genetics , Genome , Oligonucleotides/genetics
9.
Genomics Inform ; 10(1): 58-64, 2012 Mar.
Article in English | MEDLINE | ID: mdl-23105930

ABSTRACT

Intron prediction is an important problem in constantly updated genome annotation. Using two model plant genomes (rice and Arabidopsis), we compared two well-known intron prediction tools: the BLAST-Like Alignment Tool (BLAT) and Sim4cc. The results showed that each tool had its own advantages and disadvantages. BLAT predicted more than 99% of all genomic introns with a small number of false-positive introns. Sim4cc was successful at finding the correct introns, with a false-negative rate of 1.02% to 4.85%, but it needed a longer run time than BLAT. Further, we evaluated the intron information of 10 complete plant genomes. Because introns are non-coding sequences, their lengths are not constrained by the triplet codon frame, so intron lengths fall into three phases: a multiple of three bases (3n), a multiple of three bases plus one (3n + 1), and a multiple of three bases plus two (3n + 2). It has been widely accepted that the percentages of 3n, 3n + 1, and 3n + 2 introns are quite similar within a genome. Our study showed that in 80% (8/10) of the species, the three phases occurred in similar numbers. The percentage of 3n introns in Ostreococcus lucimarinus was excessive (47.7%), while in Ostreococcus tauri it was deficient (29.1%). This discrepancy could have been the result of errors in intron prediction. We suggest that a three-phase evaluation is a fast and effective method of detecting intron annotation problems.
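The three-phase check reduces to taking each intron length modulo 3 and comparing the resulting fractions with the expected one third. The sketch below does that; the function names and the 0.04 tolerance are assumptions chosen for illustration, since the abstract gives no explicit cut-off.

```python
from collections import Counter
from typing import Dict, List


def phase_fractions(intron_lengths: List[int]) -> Dict[str, float]:
    """Fraction of introns whose length is 3n, 3n + 1, or 3n + 2."""
    total = len(intron_lengths)
    if total == 0:
        raise ValueError("no intron lengths given")
    counts = Counter(length % 3 for length in intron_lengths)
    return {
        "3n": counts[0] / total,
        "3n+1": counts[1] / total,
        "3n+2": counts[2] / total,
    }


def is_skewed(fractions: Dict[str, float], tolerance: float = 0.04) -> bool:
    """True if any phase deviates from 1/3 by more than the tolerance.

    The tolerance is a hypothetical threshold, not one taken from the paper.
    """
    return any(abs(f - 1 / 3) > tolerance for f in fractions.values())
```

Under this hypothetical tolerance, 3n fractions of 47.7% and 29.1%, the two values reported for the Ostreococcus genomes, would both be flagged, since they differ from one third by about 14.4 and 4.2 percentage points respectively.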

10.
Bioinformatics ; 18 Suppl 2: S141-51, 2002.
Article in English | MEDLINE | ID: mdl-12385996

ABSTRACT

MOTIVATION: In this paper, we propose a fully automatic block and spot indexing algorithm for microarray image analysis. A microarray is a device that enables parallel experiments on tens to hundreds of thousands of test genes in order to measure gene expression. Because of the large volume of experimental data, automated image analysis is gaining importance in microarray image processing systems. Currently, most automated microarray image processing systems require manual block indexing and, in some cases, manual spot indexing. If the microarray image is large and contains a lot of noise, this is very laborious. In this paper, we show that it is possible to locate the addresses of blocks and spots by applying the Nearest Neighbors Graph Model. We also propose an analytic model for the feasibility of block addressing. Our analytic model is validated by a large body of experimental results. RESULTS: We demonstrate automatic block detection, automatic spot addressing, and correction of the distortion and skew of each microarray image.


Subject(s)
Artificial Intelligence , DNA/analysis , Gene Expression Profiling/methods , Image Interpretation, Computer-Assisted/methods , Microscopy, Fluorescence/methods , Oligonucleotide Array Sequence Analysis/methods , Pattern Recognition, Automated/methods , Algorithms , DNA/genetics , Image Enhancement/methods