Results 1 - 3 of 3
1.
PLoS One ; 12(8): e0181930, 2017.
Article in English | MEDLINE | ID: mdl-28763475

ABSTRACT

Numerical approaches to high-density single nucleotide polymorphism (SNP) data are often employed independently to address individual questions. We linked independent approaches in a bioinformatics pipeline for further insight. The pipeline, driven by heterozygosity and Hardy-Weinberg equilibrium (HWE) analyses, was applied to characterize Bos taurus and Bos indicus ancestry. We infer a gene co-heterozygosity network that regulates bovine fertility from data on 18,363 cattle with genotypes for 729,068 SNPs. Hierarchical clustering separated populations according to Bos taurus and Bos indicus ancestry. The weights of the first principal component were subjected to Normal mixture modelling, allowing the estimation of a gene's contribution to the Bos taurus-Bos indicus axis. We used deviation from HWE, contribution to Bos indicus content, and association with fertility traits to select 1,284 genes. With this set, we developed a co-heterozygosity network in which the group of genes annotated as fertility-related had significantly higher Bos indicus content than other functional classes of genes, while the group of genes associated with milk production had significantly higher Bos taurus content. The network analysis captured novel gene associations of relevance to bovine domestication events. We report transcription factors that are likely to regulate genes associated with cattle domestication and tropical adaptation. Our pipeline can be generalized to any scenario where population structure requires scrutiny at the molecular level, particularly in the presence of an a priori set of genes known to impact a phenotype of evolutionary interest, such as fertility.
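
The abstract does not say which HWE test the pipeline uses. As a minimal sketch only, the per-SNP quantities it mentions (heterozygosity and deviation from HWE) could be computed with the textbook one-degree-of-freedom chi-square test. The function name `hwe_chi_square` and the genotype-count inputs are assumptions for illustration, not the authors' code.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for deviation from HWE at one biallelic SNP.

    Hypothetical helper: takes genotype counts (AA, AB, BB) and returns
    (chi2, p_value, observed_heterozygosity, expected_heterozygosity).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")

    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    # Skip classes with zero expectation (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))

    return chi2, p_value, n_ab / n, 2 * p * q


if __name__ == "__main__":
    print(hwe_chi_square(25, 50, 25))  # counts in exact HWE proportions
    print(hwe_chi_square(50, 0, 50))   # complete heterozygote deficit
```

An exact test is often preferred when genotype counts are small; the chi-square form is used here only because it is short and needs nothing beyond the standard library.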


Subject(s)
Cattle/genetics; Fertility/genetics; Milk; Polymorphism, Single Nucleotide; Algorithms; Animals; Breeding; Computational Biology; Gene Frequency; Genetic Variation; Genotype; Heterozygote; Phenotype; Phylogeny; Principal Component Analysis; Species Specificity
2.
J Comput Biol ; 22(6): 487-97, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25695500

ABSTRACT

The problem of superposition of two corresponding vector sets by minimizing their sum-of-squares error under orthogonal transformation is a fundamental task in many areas of science, notably structural molecular biology. This problem can be solved exactly using an algorithm whose time complexity grows linearly with the number of correspondences. This efficient solution has facilitated the widespread use of the superposition task, particularly in studies involving macromolecular structures. This article formally derives a set of sufficient statistics for the least-squares superposition problem. These statistics are additive. This permits a highly efficient (constant time) computation of superpositions (and sufficient statistics) of vector sets that are composed from their constituent vector sets under addition or deletion operations, where the sufficient statistics of the constituent sets are already known (that is, the constituent vector sets have been previously superposed). This results in a drastic improvement in the run time of methods that commonly superpose vector sets under addition or deletion operations, where previously these operations were carried out ab initio (ignoring the sufficient statistics). We experimentally demonstrate the improvement our work offers in the context of protein structural alignment programs that assemble a reliable structural alignment from well-fitting (substructural) fragment pairs. A C++ library for this task is available online under an open-source license.
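
To illustrate why additive statistics allow constant-time updates, here is a hedged sketch in a deliberately simplified setting: 2D points, where the optimal rotation has a closed form and no matrix decomposition is needed. The class `SuperpositionStats` and its fields are hypothetical names chosen for this example; the statistics derived in the article are for the general problem, and the library it mentions is in C++, not Python.

```python
from dataclasses import dataclass, astuple
from math import atan2, hypot, sqrt


@dataclass(frozen=True)
class SuperpositionStats:
    """Additive sums for least-squares superposition of paired 2D points.

    Simplifying assumption: 2D with proper rotations only.
    """
    n: int = 0
    ax: float = 0.0     # sum of x-coordinates of set A
    ay: float = 0.0     # sum of y-coordinates of set A
    bx: float = 0.0     # sum of x-coordinates of set B
    by: float = 0.0     # sum of y-coordinates of set B
    aa: float = 0.0     # sum of squared norms of A
    bb: float = 0.0     # sum of squared norms of B
    dot: float = 0.0    # sum of a . b over corresponding pairs
    cross: float = 0.0  # sum of a_x*b_y - a_y*b_x over pairs

    @classmethod
    def from_pairs(cls, a_points, b_points):
        # Linear-time pass over the correspondences, done once per fragment.
        s = [0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
        for (x, y), (u, v) in zip(a_points, b_points):
            terms = (1, x, y, u, v, x * x + y * y, u * u + v * v,
                     x * u + y * v, x * v - y * u)
            s = [acc + t for acc, t in zip(s, terms)]
        return cls(*s)

    def _combine(self, other, sign):
        return SuperpositionStats(
            *(p + sign * q for p, q in zip(astuple(self), astuple(other))))

    def __add__(self, other):   # statistics of the merged vector sets
        return self._combine(other, 1)

    def __sub__(self, other):   # statistics after deleting a subset
        return self._combine(other, -1)

    def _centered(self):
        n = self.n
        ea = self.aa - (self.ax ** 2 + self.ay ** 2) / n
        eb = self.bb - (self.bx ** 2 + self.by ** 2) / n
        d = self.dot - (self.ax * self.bx + self.ay * self.by) / n
        c = self.cross - (self.ax * self.by - self.ay * self.bx) / n
        return ea, eb, d, c

    def rotation_angle(self):
        """Angle (radians) of the rotation of A that best fits B."""
        _, _, d, c = self._centered()
        return atan2(c, d)

    def rmsd(self):
        """Minimum RMSD over rotations and translations, in constant time."""
        ea, eb, d, c = self._centered()
        sse = ea + eb - 2.0 * hypot(d, c)
        return sqrt(max(sse, 0.0) / self.n)
```

With this layout, merging two previously superposed fragments is `s1 + s2` and removing one is `merged - s2`; neither revisits the underlying coordinates.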


Subject(s)
Proteins/chemistry; Algorithms; Least-Squares Analysis; Models, Molecular; Protein Conformation; Sequence Alignment/methods
3.
Article in English | MEDLINE | ID: mdl-26357057

ABSTRACT

Proteins fold into complex three-dimensional shapes. Simplified representations of their shapes are central to rationalising, comparing, classifying, and interpreting protein structures. Traditional methods of abstracting protein folding patterns rely on representing their standard secondary structural elements (helices and strands of sheet) using line segments. This ignores a significant proportion of structural information. The motivation of this research is to derive mathematically rigorous and biologically meaningful abstractions of protein folding patterns that maximize the economy of structural description and minimize the loss of structural information. We report on a novel method to describe a protein as a non-overlapping set of parametric three-dimensional curves of varying length and complexity. Our approach to this problem is supported by information theory and uses the statistical framework of minimum message length (MML) inference. We demonstrate the effectiveness of our non-linear abstraction in supporting efficient and effective comparison of protein folding patterns on a large scale.


Subject(s)
Computational Biology/methods; Protein Folding; Proteins/chemistry; Proteins/metabolism; Humans; Models, Molecular; Nonlinear Dynamics