ABSTRACT
MOTIVATION: Beyond identifying genetic variants, we introduce a set of Boolean relations that allows for a comprehensive classification of the relations of every pair of variants by taking all minimal alignments into account. We present an efficient algorithm to compute these relations, including a novel way of efficiently computing all minimal alignments within the best theoretical complexity bounds. RESULTS: We show that these relations are common, and many of them non-trivial, for variants of the CFTR gene in dbSNP. Finally, we present an approach for storing and indexing variants in the context of a database that enables efficient querying for all these relations. AVAILABILITY AND IMPLEMENTATION: A Python implementation is available at https://github.com/mutalyzer/algebra/tree/v0.2.0 as well as an interface at https://mutalyzer.nl/algebra.
Subjects
Algorithms, Data Management, Factual Databases, Software
ABSTRACT
MOTIVATION: Unambiguous variant descriptions are of utmost importance in clinical genetic diagnostics, scientific literature and genetic databases. The Human Genome Variation Society (HGVS) publishes a comprehensive set of guidelines on how variants should be correctly and unambiguously described. We present the implementation of the Mutalyzer 2 tool suite, designed to automatically apply the HGVS guidelines so that users can check and correct their variant descriptions without having to deal with the HGVS intricacies explicitly. RESULTS: Mutalyzer is widely used by the community, having processed over 133 million descriptions since its launch. Over a five-year period, Mutalyzer reported a correct input in ~50% of cases. In 41% of cases either a syntactic or a semantic error was identified, and in ~7% of cases Mutalyzer was able to automatically correct the description. AVAILABILITY AND IMPLEMENTATION: Mutalyzer is an Open Source project under the GNU Affero General Public License. The source code is available on GitHub (https://github.com/mutalyzer/mutalyzer) and a running instance is available at: https://mutalyzer.nl.
Subjects
Genetic Variation, Software, Humans, Human Genome
ABSTRACT
MOTIVATION: Unambiguous sequence variant descriptions are important in reporting the outcome of clinical diagnostic DNA tests. The standard nomenclature of the Human Genome Variation Society (HGVS) describes the observed variant sequence relative to a given reference sequence. We propose an efficient algorithm for the extraction of HGVS descriptions from two sequences with three main requirements in mind: minimizing the length of the resulting descriptions, minimizing the computation time and keeping the unambiguous descriptions biologically meaningful. RESULTS: Our algorithm is able to compute the HGVS descriptions of complete chromosomes or other large DNA strings in a reasonable amount of computation time and its resulting descriptions are relatively small. Additional applications include updating of gene variant database contents and reference sequence liftovers. AVAILABILITY: The algorithm is accessible as an experimental service in the Mutalyzer program suite (https://mutalyzer.nl). The C++ source code and Python interface are accessible at: https://github.com/mutalyzer/description-extractor. CONTACT: j.k.vis@lumc.nl.
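To illustrate what extracting a description from two sequences means, the following is a minimal sketch and not the algorithm of the paper. It assumes a much simpler technique: trim the longest common prefix and then the longest common suffix, and report the single remaining difference in an HGVS-like notation. The function name `describe` is hypothetical, coordinate prefixes such as `g.` are omitted, and duplications, inversions and multiple separate variants are not handled.

```python
def describe(reference: str, observed: str) -> str:
    """Return one HGVS-like description of `observed` relative to `reference`.

    Illustrative sketch only: a single variant is reported, found by
    trimming the common prefix and suffix of the two sequences.
    """
    limit = min(len(reference), len(observed))

    # Length of the longest common prefix. Trimming the prefix first places
    # the variant at its right-most (3') position in repeated sequence.
    prefix = 0
    while prefix < limit and reference[prefix] == observed[prefix]:
        prefix += 1

    # Length of the longest common suffix that does not overlap the prefix.
    suffix = 0
    while (suffix < limit - prefix
           and reference[-1 - suffix] == observed[-1 - suffix]):
        suffix += 1

    deleted = reference[prefix:len(reference) - suffix]
    inserted = observed[prefix:len(observed) - suffix]

    if not deleted and not inserted:
        return "="  # the sequences are identical

    start = prefix + 1           # 1-based first affected reference position
    end = prefix + len(deleted)  # 1-based last affected reference position

    if not deleted:
        # Insertion between two flanking positions. At the very start or end
        # of the reference, one flank lies outside the sequence in this sketch.
        return f"{prefix}_{prefix + 1}ins{inserted}"
    if len(deleted) == 1 and len(inserted) == 1:
        return f"{start}{deleted}>{inserted}"

    span = str(start) if start == end else f"{start}_{end}"
    if not inserted:
        return f"{span}del"
    return f"{span}delins{inserted}"


if __name__ == "__main__":
    for ref, obs in [("ACGT", "AGGT"), ("ATTTG", "ATTG"), ("ACGT", "ACTTGT")]:
        print(ref, obs, describe(ref, obs))
```

Because the whole differing region collapses into one deletion-insertion, this sketch produces long descriptions for sequences that differ in several distant places, which is the situation the description-length requirement in the abstract addresses.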