ABSTRACT
Biological data have gained wider recognition in recent years, but managing and processing these data efficiently remains a challenge in many areas. A growing number of DNA sequence databases are accessible; however, most algorithms that operate on these sequences run outside the database, in separate bioinformatics software. In this article, we propose a novel approach to the comparative analysis of sequences by defining heuristic pairwise alignment inside the database environment. This method takes advantage of the benefits provided by the database management system and presents a way to exploit similarities in data sets to speed up the alignment algorithm. We work with the column-oriented MonetDB, and we further discuss the key benefits of this database system in relation to our proposed heuristic approach.
Subjects
Heuristics, Software, Sequence Alignment, Computational Biology/methods, Algorithms
ABSTRACT
Community-level genetic information can be essential for directing health measures and studying demographic trends, but it is subject to considerable ethical and legal challenges. These concerns become less pronounced when analyzing urban sewage samples, which are anonymous from the outset because of their pooled nature. We were able to detect traces of human mitochondrial DNA (mtDNA) in urban sewage samples and to estimate the distribution of human mtDNA haplogroups. An expectation maximization approach was used to determine mtDNA haplogroup mixture proportions for the samples collected at each geographic location. Our results show reasonable agreement with both previous studies of ancient evolution or migration and current US census data; they are also readily reproducible and highly robust. Our approach presents a promising alternative for sample collection in studies focusing on the ethnic and genetic composition of populations or on diseases associated with different mtDNA haplogroups and genotypes.
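The abstract does not spell out the statistical model, so the following is only a minimal sketch of how expectation maximization can estimate mixture proportions in general. It assumes that each sequencing read already has a known, fixed likelihood under each candidate haplogroup; the function name `em_mixture_proportions` and the input layout are hypothetical and are not taken from the paper.

```python
def em_mixture_proportions(likelihoods, n_iter=200, tol=1e-9):
    """Estimate mixture proportions by expectation maximization.

    Assumption (not from the paper): likelihoods[i][k] is the known
    probability of observing read i given haplogroup k.
    """
    n_components = len(likelihoods[0])
    # Start from a uniform guess over the components.
    pi = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: accumulate each read's posterior responsibility
        # for every component under the current proportions.
        counts = [0.0] * n_components
        for row in likelihoods:
            weighted = [p * lik for p, lik in zip(pi, row)]
            total = sum(weighted)
            if total == 0.0:
                # Read is incompatible with every component: skip it.
                continue
            for k, w in enumerate(weighted):
                counts[k] += w / total

        n_used = sum(counts)
        if n_used == 0.0:
            raise ValueError("no read is compatible with any component")

        # M-step: new proportions are the normalized responsibilities.
        new_pi = [c / n_used for c in counts]

        # Stop once the proportions no longer change appreciably.
        converged = max(abs(a - b) for a, b in zip(pi, new_pi)) < tol
        pi = new_pi
        if converged:
            break

    return pi


if __name__ == "__main__":
    # Toy input: three reads that only fit component 0, one that only fits 1.
    reads = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(em_mixture_proportions(reads))
```

In this toy input every read is unambiguous, so the estimate reduces to simple counting; with reads that are compatible with several haplogroups, the E-step splits each read fractionally among them according to the current proportions.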
Subjects
Mitochondrial DNA/genetics, Haplotypes, Sewage, Urban Population, Molecular Evolution, Humans, Phylogeny, Principal Component Analysis, Reproducibility of Results, Stochastic Processes
ABSTRACT
Data sharing enables research communities to exchange findings and build upon the knowledge that arises from their discoveries. Public health, animal health and food safety would benefit from rapid data sharing in emergencies. However, ethical, regulatory and institutional challenges, together with a lack of suitable platforms that provide infrastructure for sharing data in structured formats, often lead to data not being shared, or at most being shared as supplementary material in journal publications. Here, we describe an informatics platform that includes workflows for the structured storage, management and pre-publication sharing of pathogen sequencing data and the interpretations of their analysis with relevant stakeholders.