Search | BVS - MINISTÉRIO DA SAÚDE

Interpolative multidimensional scaling techniques for the identification of clusters in very large sequence sets.

Hughes, Adam; Ruan, Yang; Ekanayake, Saliya; Bae, Seung-Hee; Dong, Qunfeng; Rho, Mina; Qiu, Judy; Fox, Geoffrey.

BMC Bioinformatics ; 13 Suppl 2: S9, 2012 Mar 13.

Article in English | MEDLINE | ID: mdl-22536872

ABSTRACT

BACKGROUND: Modern pyrosequencing techniques make it possible to study complex bacterial populations, for example through 16S rRNA sequences, directly from environmental or clinical samples without the need for laboratory purification. Alignment of sequences across the resulting large data sets (100,000+ sequences) is of particular interest for identifying potential gene clusters and families, but such analysis is a daunting computational task. The aim of this work is to develop an efficient pipeline for clustering large sets of sequence reads. METHODS: Pairwise alignment techniques are used here to calculate genetic distances between sequence pairs. These methods are pleasingly parallel and have been shown to reflect genetic distances in highly variable regions of rRNA genes more accurately than traditional multiple sequence alignment (MSA) approaches do. By using Needleman-Wunsch (NW) pairwise alignment in conjunction with novel implementations of interpolative multidimensional scaling (MDS), we have developed an effective method for visualizing massive biosequence data sets and quickly identifying potential gene clusters. RESULTS: This study demonstrates the use of interpolative MDS to obtain clustering results that are qualitatively similar to those obtained through full MDS, but at substantially lower cost. In particular, the wall clock time required to cluster a set of 100,000 sequences has been reduced from seven hours to less than one hour through the use of interpolative MDS. CONCLUSIONS: Although work remains to be done in selecting the optimal training set size for interpolative MDS, the substantial computational cost savings will allow us to cluster much larger sequence sets in the future.
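The pairwise alignment step named in the METHODS above can be illustrated with a minimal Needleman-Wunsch sketch. The scoring values (match +1, mismatch -1, linear gap -1) and the helper name `pairwise_scores` are assumptions made for illustration; the abstract does not state the scoring scheme or how scores are converted to genetic distances.

```python
from itertools import combinations


def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score from the standard NW dynamic program.

    Uses a linear gap penalty and keeps only two rows of the table,
    so memory is proportional to len(b). Scoring values are assumed.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def pairwise_scores(sequences):
    """Score every unordered pair; each pair is independent of the others."""
    return {
        (i, j): needleman_wunsch_score(sequences[i], sequences[j])
        for i, j in combinations(range(len(sequences)), 2)
    }


if __name__ == "__main__":
    reads = ["ACGT", "ACGT", "AGGT"]  # toy reads, not data from the paper
    print(pairwise_scores(reads))
```

Because no pair depends on any other, the loop in `pairwise_scores` could be split across workers with no communication between them, which is what "pleasingly parallel" refers to.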

Subjects

Metagenomics/methods; Sequence Analysis, DNA/methods; Algorithms; Cluster Analysis; RNA, Ribosomal, 16S/genetics; Sequence Alignment
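The interpolation idea in the abstract above, embedding a training subset with full MDS and then placing the remaining points against it, can be sketched as follows. The update rule is an iterative majorization step for a single point with unit weights; the abstract does not spell out the update used, so this rule, the function name `interpolate_point`, and the toy coordinates are all assumptions.

```python
import math


def interpolate_point(anchors, target_dists, iterations=200):
    """Place one out-of-sample point relative to fixed in-sample points.

    anchors:      coordinates of already-embedded (training) points
    target_dists: desired distance from the new point to each anchor
    Each pass applies x = centroid + (1/k) * sum_i (delta_i / d_i) * (x - p_i),
    which does not increase sum_i (||x - p_i|| - delta_i)^2.
    """
    k = len(anchors)
    dim = len(anchors[0])
    centroid = [sum(p[d] for p in anchors) / k for d in range(dim)]
    x = centroid[:]  # start from the centroid of the anchors
    for _ in range(iterations):
        step = [0.0] * dim
        for p, delta in zip(anchors, target_dists):
            dist = math.dist(x, p)
            if dist == 0.0:
                continue  # current estimate sits exactly on this anchor
            for d in range(dim):
                step[d] += delta / dist * (x[d] - p[d])
        x = [centroid[d] + step[d] / k for d in range(dim)]
    return x


if __name__ == "__main__":
    anchors = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
    true_point = (0.5, 1.5)  # toy target used only to generate distances
    dists = [math.dist(true_point, p) for p in anchors]
    print(interpolate_point(anchors, dists))
```

The cost of placing each new point depends only on the number of anchors it is compared against, not on the total number of sequences, which is consistent with the savings the abstract reports for interpolative MDS over full MDS.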

Hybrid cloud and cluster computing paradigms for life science applications.

Qiu, Judy; Ekanayake, Jaliya; Gunarathne, Thilina; Choi, Jong Youl; Bae, Seung-Hee; Li, Hui; Zhang, Bingjing; Wu, Tak-Lon; Ruan, Yang; Ekanayake, Saliya; Hughes, Adam; Fox, Geoffrey.

BMC Bioinformatics ; 11 Suppl 12: S3, 2010 Dec 21.

Article in English | MEDLINE | ID: mdl-21210982

ABSTRACT

BACKGROUND: Clouds and MapReduce have proven broadly useful for scientific computing, especially for parallel data-intensive applications. However, they have limited applicability to some areas such as data mining, because MapReduce performs poorly on problems with the iterative structure found in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI, leading to a hybrid cloud and cluster environment. This motivates the design and implementation of Twister, an open source Iterative MapReduce system. RESULTS: Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability in several important non-iterative cases. These are linked to MPI applications for the final stages of the data analysis. Further, we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications. CONCLUSIONS: The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment, while Twister promises a uniform programming environment for many life sciences applications. METHODS: We used the commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data-intensive computing. Several applications were developed in MPI, MapReduce and Twister in these different environments.

Subjects

Computational Biology/methods; Software; Biological Science Disciplines; Cluster Analysis; Data Mining; Metagenomics
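The "iterative structure" that the abstract above identifies as a weak point for basic MapReduce can be made concrete with a toy example. The sketch below runs one-dimensional k-means as a repeated map step and reduce step in plain Python. It does not use the Twister, Hadoop, or MPI APIs; the function names and data are hypothetical.

```python
def map_assign(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda c: abs(point - centroids[c]))
    return nearest, point


def reduce_mean(points):
    """Reduce step: combine all points sharing a key into a new centroid."""
    return sum(points) / len(points)


def iterative_kmeans(points, centroids, max_iters=50, tol=1e-9):
    """Repeat map + reduce until the centroids stop moving."""
    for _ in range(max_iters):
        groups = {}
        for key, value in (map_assign(p, centroids) for p in points):
            groups.setdefault(key, []).append(value)
        updated = [
            reduce_mean(groups[c]) if c in groups else centroids[c]
            for c in range(len(centroids))
        ]
        if max(abs(a - b) for a, b in zip(updated, centroids)) < tol:
            return updated
        centroids = updated  # only this small state changes between passes
    return centroids


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]  # toy data
    print(iterative_kmeans(data, [0.0, 5.0]))
```

Every pass re-reads the same `points` and changes only the small list of centroids. When each pass is launched as a separate job, that repeated setup and data loading is the kind of overhead an iterative runtime aims to avoid.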

dPattern: transcription factor binding site (TFBS) discovery in human genome using a discriminative pattern analysis.

Bae, Seung-Hee; Tang, Haixu; Wu, Jing; Xie, Jun; Kim, Sun.

Bioinformatics ; 23(19): 2619-21, 2007 Oct 01.

Article in English | MEDLINE | ID: mdl-17550915

ABSTRACT

MOTIVATION: Transcription factor binding sites (TFBSs) are typically short, so a search with a profile model built from known TFBSs produces many false positives. When the search is combined with additional information (gene expression data, in this article), the sensitivity and specificity of TFBS search can be improved significantly. RESULTS: By modifying our previous REFINEMENT approach, we developed dPattern, which searches for occurrences of TFBSs in the promoter regions of up- or down-regulated genes or of random genes.

Subjects

Chromosome Mapping/methods; Gene Expression Profiling/methods; Genome, Human/genetics; Pattern Recognition, Automated/methods; Sequence Analysis, DNA/methods; Transcription Factors/chemistry; Transcription Factors/genetics; Algorithms; Artificial Intelligence; Base Sequence; Binding Sites; Discriminant Analysis; Humans; Molecular Sequence Data; Protein Binding; Software
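The profile-model search mentioned in the MOTIVATION above can be illustrated with a small position weight matrix (PWM) scan. This is a generic sketch, not the dPattern or REFINEMENT method. The example sites, the pseudocount, the uniform background, and the thresholds are all hypothetical.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds profile built from equal-length example binding sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    known_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]  # made-up sites
    pwm = build_pwm(known_sites)
    promoter = "GGCTATAATGC"  # made-up sequence
    print(scan(promoter, pwm, threshold=0.0))
    print(scan(promoter, pwm, threshold=-1.5))
```

With only six positions in the profile, loosening the threshold quickly admits windows that merely resemble the motif. That is the false-positive problem the abstract says additional evidence such as gene expression data can reduce.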

Crystallization and preliminary X-ray crystallographic analysis of acetohydroxy acid isomeroreductase from Pseudomonas aeruginosa.

Eom, Su Jeong; Ahn, Hyung Jun; Yoon, Hye Jin; Lee, Byung Il; Bae, Seung Hee; Baek, Seung Hun; Suh, Se Won.

Acta Crystallogr D Biol Crystallogr ; 58(Pt 12): 2145-6, 2002 Dec.

Article in English | MEDLINE | ID: mdl-12454481

ABSTRACT

Acetohydroxy acid isomeroreductase (AHIR) is involved in the biosynthetic pathway of branched-chain amino acids in microorganisms and plants. AHIR from Pseudomonas aeruginosa has been overexpressed in Escherichia coli and crystallized at 297 K using potassium/sodium tartrate as a precipitant. X-ray diffraction data have been collected to 2.0 Å resolution at 100 K using synchrotron radiation. The crystals belong to the cubic space group P2₁3, with unit-cell parameters a = b = c = 184.38 Å, α = β = γ = 90°. Six monomers are present in the asymmetric unit, giving a V_M of 2.34 Å³ Da⁻¹ and a solvent content of 47.4%.

Subjects

Alcohol Oxidoreductases/chemistry; Pseudomonas aeruginosa/enzymology; Crystallization; Crystallography, X-Ray; Ketol-Acid Reductoisomerase; Light; Protein Conformation; Recombinant Proteins/chemistry; Scattering, Radiation
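The crystal-packing figures in the abstract above can be cross-checked with the usual Matthews relations. The sketch below assumes the common approximation solvent fraction ≈ 1 − 1.23/V_M and the 12 symmetry-equivalent positions of space group P2₁3. The monomer mass it prints is back-calculated from the reported V_M, not a value given in the source, and the function names are hypothetical.

```python
def solvent_fraction(matthews_vm):
    """Approximate solvent fraction from V_M (in cubic angstroms per dalton)."""
    return 1.0 - 1.23 / matthews_vm


def implied_monomer_mass(cell_volume, n_sym_ops, n_monomers, matthews_vm):
    """Monomer mass (Da) implied by V_M = V / (n_sym_ops * n_monomers * mass)."""
    return cell_volume / (n_sym_ops * n_monomers * matthews_vm)


if __name__ == "__main__":
    cell_edge = 184.38            # angstroms, from the abstract (cubic cell)
    cell_volume = cell_edge ** 3  # cubic cell, so V = a^3
    vm = 2.34                     # from the abstract
    n_sym_ops = 12                # general positions in P2(1)3 (assumption stated above)
    n_monomers = 6                # monomers per asymmetric unit, from the abstract

    print(f"solvent content: {100 * solvent_fraction(vm):.1f}%")
    mass = implied_monomer_mass(cell_volume, n_sym_ops, n_monomers, vm)
    print(f"implied monomer mass: {mass:.0f} Da")
```

The computed solvent content rounds to the 47.4% quoted in the abstract, so the reported V_M and solvent content are mutually consistent under this approximation.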
