1.
Adv Exp Med Biol ; 799: 39-67, 2014.
Article in English | MEDLINE | ID: mdl-24292961

ABSTRACT

Recent technological advances in genomics now allow biological data to be produced at unprecedented tera- and petabyte scales. Yet the extraction of useful knowledge from this voluminous data presents a significant challenge to the scientific community. Efficient mining of vast and complex data sets for the needs of biomedical research critically depends on seamless integration of clinical, genomic, and experimental information with prior knowledge about genotype-phenotype relationships accumulated in a plethora of publicly available databases. Furthermore, such experimental data should be accessible to a variety of algorithms and analytical pipelines that drive computational analysis and data mining. Translational projects require sophisticated approaches that coordinate and perform various analytical steps involved in the extraction of useful knowledge from accumulated clinical and experimental data in an orderly semiautomated manner. This presents a number of challenges, such as (1) high-throughput data management involving data transfer, data storage, and access control; (2) scalable computational infrastructure; and (3) analysis of large-scale multidimensional data for the extraction of actionable knowledge. We present a scalable computational platform based on crosscutting requirements from multiple scientific groups for data integration, management, and analysis. The goal of this integrated platform is to address these challenges and to support the end-to-end analytical needs of various translational projects.


Subjects
Translational Biomedical Research/methods , Translational Biomedical Research/trends , Data Mining/methods , Data Mining/trends , Genetic Databases/trends , Genomics/methods , Genomics/trends , Humans
2.
Comput Biol Med ; 165: 107426, 2023 10.
Article in English | MEDLINE | ID: mdl-37713789

ABSTRACT

The degree of dissimilarity between genome sequences of homologous species is a measure of the evolutionary distance between them. It serves as a metric in the construction of phylogenetic trees, which depict the evolutionary relationships and common ancestry among different species. Given two genome sequences, evolutionary distance is determined by estimating the number of global mutations that transform one sequence to the other. The computation of the evolutionary distance is done by modelling a genome with the corresponding permutation. Global rearrangement operations such as transposition that model a particular genomic mutation are studied by employing a combinatorial structure known as a cycle graph of the corresponding permutation. A cycle in a cycle graph that has odd length is called an odd cycle. In the context of the problem of sorting by transpositions (SBT), a valid 2-move is a transposition that increases the number of odd cycles in the cycle graph by two. A super oriented cycle (SOC) is an odd cycle C where C and one of the resultant cycles admit valid 2-moves. The minimum number of mutations required to transform a species S into a related species T is the distance from S to T under that mutation. Christie opined that characterizing SOCs will improve the lower bound of the transposition distance. We characterize super oriented cycles. Equivalent transformations on permutations like reduction and (g,b)-split preserve the transposition distance of a given permutation and map SBT to the corresponding SBT on a transformed simpler permutation. We introduce merge, a novel equivalent transformation. These results have applications in computing transposition and other distances between related species.
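
The cycle-graph vocabulary used in this abstract can be made concrete with a small sketch. It assumes the usual construction from the sorting-by-transpositions literature: the permutation of 1..n is extended with 0 and n+1, there is one black edge per adjacent pair of the extended sequence, and the length of a cycle is its number of black edges. The function names are hypothetical, and the lower bound shown, (n + 1 - c_odd) / 2, is the classical odd-cycle bound, not the improved bound the abstract refers to.

```python
from typing import List


def cycle_lengths(perm: List[int]) -> List[int]:
    """Lengths (in black edges) of the alternating cycles of the cycle graph.

    perm must be a permutation of 1..n. Black edge i joins the extended
    elements at positions i and i-1; following it and then the gray edge
    v -> v+1 leads to the black edge at the position of value v+1.
    """
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]      # extended permutation
    pos = [0] * (n + 2)                   # pos[v] = position of value v
    for i, v in enumerate(ext):
        pos[v] = i

    seen = [False] * (n + 2)
    lengths = []
    for start in range(1, n + 2):         # black edges are indexed 1..n+1
        if seen[start]:
            continue
        length, i = 0, start
        while not seen[i]:
            seen[i] = True
            length += 1
            i = pos[ext[i - 1] + 1]       # next black edge on the same cycle
        lengths.append(length)
    return lengths


def odd_cycle_count(perm: List[int]) -> int:
    """Number of cycles with an odd number of black edges."""
    return sum(1 for length in cycle_lengths(perm) if length % 2 == 1)


def transposition_lower_bound(perm: List[int]) -> int:
    """Classical bound (n + 1 - c_odd) / 2; the numerator is always even."""
    return (len(perm) + 1 - odd_cycle_count(perm)) // 2


if __name__ == "__main__":
    for p in ([1, 2, 3], [2, 1], [3, 2, 1]):
        print(p, cycle_lengths(p), transposition_lower_bound(p))
```

In this sketch the identity permutation has n + 1 cycles of length one, so the bound is zero, and a valid 2-move as described above is a transposition that raises `odd_cycle_count` by two.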


Subjects
Hospitalization , Humans , Phylogeny , Mutation
3.
BMC Bioinformatics ; 13 Suppl 13: S3, 2012.
Article in English | MEDLINE | ID: mdl-23320864

ABSTRACT

BACKGROUND: Numerous clustering methods, such as single linkage and K-means, have been widely studied and applied to a variety of scientific problems. However, the existing methods are not readily applicable to problems that demand high stringency. METHODS: Our method, self-consistency grouping (SCG), yields clusters whose members are closer in rank to each other than to any member outside the cluster. We do not define a distance metric; we use the best known distance metric and presume that it measures the correct distance. SCG does not impose any restriction on the size or the number of the clusters that it finds. The boundaries of clusters are determined by the inconsistencies in the ranks. In addition to the direct implementation that finds the complete structure of the (sub)clusters, we implemented two faster versions. The fastest version is guaranteed to find only the clusters that are not subclusters of any other clusters, and the other version yields the same output as the direct implementation but does so more efficiently. RESULTS: Our tests, in which errors were introduced into the distance measurements, demonstrated that SCG yields very few false positives. Clustering of protein domain representatives by structural similarity showed that SCG could recover homologous groups with high precision. CONCLUSIONS: SCG has potential for finding biological relationships under stringent conditions.
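
The self-consistency criterion can be illustrated with a brute-force sketch. It is only one reading of the abstract, not the authors' implementation: a group is taken to be self-consistent when every member ranks all other members ahead of (strictly closer than) every non-member. The function names and the exhaustive search are assumptions made for illustration, and the search is exponential, so it is suitable only for tiny inputs.

```python
from itertools import combinations
from typing import List, Sequence, Set


def is_self_consistent(dist: Sequence[Sequence[float]], cluster: Sequence[int]) -> bool:
    """True if every member is closer to all other members than to any outsider.

    dist is a symmetric distance matrix; cluster holds row/column indices.
    Groups of fewer than two items, or covering everything, are rejected
    as trivial (an assumption of this sketch).
    """
    members = set(cluster)
    outsiders = [j for j in range(len(dist)) if j not in members]
    if len(members) < 2 or not outsiders:
        return False
    for i in members:
        farthest_inside = max(dist[i][j] for j in members if j != i)
        nearest_outside = min(dist[i][j] for j in outsiders)
        if farthest_inside >= nearest_outside:
            return False                  # a rank inconsistency marks a boundary
    return True


def self_consistent_groups(dist: Sequence[Sequence[float]]) -> List[Set[int]]:
    """Enumerate all non-trivial self-consistent groups by exhaustive search."""
    n = len(dist)
    found = []
    for size in range(2, n):
        for candidate in combinations(range(n), size):
            if is_self_consistent(dist, candidate):
                found.append(set(candidate))
    return found


if __name__ == "__main__":
    # Hypothetical one-dimensional data: two tight pairs and one distant point.
    points = [0, 1, 10, 11, 30]
    dist = [[abs(a - b) for b in points] for a in points]
    print(self_consistent_groups(dist))
```

On the toy data the two tight pairs are reported together with the larger group that contains both, which mirrors the nested (sub)cluster structure mentioned in the abstract; no cluster count or size has to be specified in advance.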


Subjects
Cluster Analysis , Computational Biology/methods , Algorithms , Tertiary Protein Structure
4.
Nucleic Acids Res ; 37(Web Server issue): W526-31, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19420061

ABSTRACT

Assessing structural similarity and defining common regions through comparison of protein spatial structures is an important task in functional and evolutionary studies of proteins. There are many servers that compare structures and define sub-structures in common between proteins through superposition and closeness of either coordinates or contacts. However, a natural way to analyze a structure for experts working on structure classification is to look for specific three-dimensional (3D) motifs and patterns instead of finding common features in two proteins. Such motifs can be described by the architecture and topology of major secondary structural elements (SSEs) without consideration of subtle differences in 3D coordinates. Despite the importance of motif-based structure searches, currently there is a shortage of servers to perform this task. The widely known TOPS does not fully address this problem, as it finds only topological matches and does not take into account other important spatial properties, such as interactions and chirality. Here, we implemented our approach to protein structure pattern search (ProSMoS) as a web server. ProSMoS converts a 3D structure into an interaction matrix representation that includes the SSE types, handedness of connections between SSEs, coordinates of SSE starts and ends, types of interactions between SSEs, and beta-sheet definitions. For a user-defined structure pattern, ProSMoS lists all structures from a database that contain this pattern. The ProSMoS server will be of interest to structural biologists who would like to analyze very general and distant structural similarities. The ProSMoS web server is available at: http://prodata.swmed.edu/ProSMoS/.


Subjects
Secondary Protein Structure , Software , Protein Databases , Internet , Molecular Models , User-Computer Interface
5.
Adv Exp Med Biol ; 680: 725-36, 2010.
Article in English | MEDLINE | ID: mdl-20865560

ABSTRACT

A k-bounded (k ≥ 2) transposition is an operation that switches two elements that have at most k - 2 elements in between. We study the problem of sorting a circular permutation π of length n for k = 2, i.e., adjacent swaps, and k = 3, i.e., short swaps. These transpositions mimic microrearrangements of gene order in viruses and bacteria. We prove a (1/4)n² lower bound for sorting by adjacent swaps. We show upper bounds of (5/32)n² + O(n log n) and (7/8)n + O(log n) for sequential and parallel sorting, respectively, by short swaps.
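
The definition of a k-bounded transposition can be spelled out in a few lines. This is a minimal sketch under one assumption that the abstract does not state explicitly: on a circular permutation, the elements "in between" two positions are counted along the shorter arc. Positions are 0-based and all function names are hypothetical.

```python
from typing import List


def elements_between(i: int, j: int, n: int) -> int:
    """Elements strictly between positions i and j on a circle of n positions,
    counted along the shorter arc (an assumption of this sketch)."""
    gap = abs(i - j)
    return min(gap, n - gap) - 1


def is_k_bounded(i: int, j: int, n: int, k: int) -> bool:
    """True if swapping positions i and j is a k-bounded transposition."""
    if k < 2:
        raise ValueError("k must be at least 2")
    if not (0 <= i < n and 0 <= j < n) or i == j:
        return False
    return elements_between(i, j, n) <= k - 2


def apply_swap(perm: List[int], i: int, j: int, k: int) -> List[int]:
    """Return a copy of perm with positions i and j switched, if k-bounded."""
    if not is_k_bounded(i, j, len(perm), k):
        raise ValueError("swap is not k-bounded")
    out = list(perm)
    out[i], out[j] = out[j], out[i]
    return out


if __name__ == "__main__":
    circle = [1, 2, 3, 4, 5, 6]
    print(apply_swap(circle, 0, 5, k=2))   # first and last are adjacent on a circle
    print(is_k_bounded(0, 2, 6, k=2), is_k_bounded(0, 2, 6, k=3))
```

With k = 2 only circularly adjacent positions may be switched (adjacent swaps), while k = 3 also allows positions with one element in between (short swaps).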


Subjects
Gene Rearrangement , Genetic Models , Algorithms , Animals , Computational Biology , Computer Simulation , Molecular Evolution , Gene Order , Humans
6.
J Mol Biol ; 428(21): 4392-4412, 2016 10 23.
Article in English | MEDLINE | ID: mdl-27498165

ABSTRACT

Globular proteins typically fold into tightly packed arrays of regular secondary structures. We developed a model to approximate the compact parallel and antiparallel arrangement of α-helices and β-strands, enumerated all possible topologies formed by up to five secondary structural elements (SSEs), searched for their occurrence in spatial structures of proteins, and documented their frequencies of occurrence in the PDB. The enumeration model grows larger super-secondary structure patterns (SSPs) by combining pairs of smaller patterns, a process that approximates a potential path of protein fold evolution. The most prevalent SSPs are typically present in superfolds such as the Rossmann-like fold, the ferredoxin-like fold, and the Greek key motif, whereas the less frequent SSPs often possess uncommon structure features such as split β-sheets, left-handed connections, and crossing loops. This complete SSP enumeration model, for the first time, allows us to investigate which theoretically possible SSPs are not observed in available protein structures. All SSPs with up to four SSEs occurred in proteins. However, among the SSPs with five SSEs, approximately 20% (218) are absent from existing folds. Of these unobserved SSPs, 80% contain two or more uncommon structure features. To facilitate future efforts in protein structure classification, engineering, and design, we provide the resulting patterns and their frequency of occurrence in proteins at: http://prodata.swmed.edu/ssps/.


Subjects
Computational Biology/methods , Protein Folding , Proteins/chemistry , Proteins/metabolism , Protein Conformation