ABSTRACT
BACKGROUND: In recent years, advances in high-throughput sequencing technologies have enabled the use of genomic information in many fields, such as precision medicine, oncology, and food quality control. The amount of genomic data being generated is growing rapidly and is expected to surpass the amount of video data soon. Most sequencing experiments, such as genome-wide association studies, aim to identify variations in the gene sequence in order to better understand phenotypic variation. We present the Genomic Variant Codec (GVC), a novel approach for compressing gene sequence variations with random access capability. GVC combines binarization, joint row- and column-wise sorting of blocks of variations, and the image compression standard JBIG for efficient entropy coding.
RESULTS: Our results show that GVC provides the best trade-off between compression and random access compared with state-of-the-art methods. On the publicly available 1000 Genomes Project (phase 3) data, it reduces the size of the genotype information from 758 GiB to 890 MiB, which is 21% smaller than the best existing random-access-capable method.
CONCLUSIONS: By providing the best combined results for random access and compression, GVC facilitates the efficient storage of large collections of gene sequence variations. In particular, its random access capability enables seamless remote data access and integration into applications. The software is open source and available at https://github.com/sXperfect/gvc/ .
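The following Python sketch illustrates, under simplifying assumptions, why binarization followed by row and column sorting can help a context-based image coder such as JBIG. It is not the GVC implementation: the bit-plane binarization, the plain lexicographic sort used as a stand-in for GVC's joint row- and column-wise sorting, the toy genotype block, and the helper names (bit_planes, sort_rows, sort_columns, transitions, unsort) are all hypothetical, and the count of 0/1 transitions is only a rough proxy for JBIG coding cost.

```python
# Hypothetical illustration only; not the GVC implementation.
from typing import List, Tuple

Matrix = List[List[int]]


def bit_planes(genotypes: Matrix) -> List[Matrix]:
    """Split an allele matrix (rows = variants, columns = haplotypes) into binary matrices.

    Assumption: entries are non-negative allele indices (0 = reference allele),
    so max_allele.bit_length() planes are enough. Plane 0 holds the lowest bit.
    """
    max_allele = max((v for row in genotypes for v in row), default=0)
    n_planes = max(1, max_allele.bit_length())
    return [[[(v >> b) & 1 for v in row] for row in genotypes] for b in range(n_planes)]


def sort_rows(block: Matrix) -> Tuple[Matrix, List[int]]:
    """Reorder rows lexicographically; return the sorted block and the row permutation."""
    perm = sorted(range(len(block)), key=lambda i: block[i])
    return [block[i] for i in perm], perm


def sort_columns(block: Matrix) -> Tuple[Matrix, List[int]]:
    """Reorder columns lexicographically by their contents; return block and permutation."""
    n_cols = len(block[0]) if block else 0
    perm = sorted(range(n_cols), key=lambda j: [row[j] for row in block])
    return [[row[j] for j in perm] for row in block], perm


def transitions(block: Matrix) -> int:
    """Count horizontal plus vertical 0/1 changes (fewer changes = longer uniform runs)."""
    horizontal = sum(a != b for row in block for a, b in zip(row, row[1:]))
    vertical = sum(a != b for up, down in zip(block, block[1:]) for a, b in zip(up, down))
    return horizontal + vertical


def unsort(block: Matrix, row_perm: List[int], col_perm: List[int]) -> Matrix:
    """Invert both permutations to recover the original block."""
    out = [[0] * len(col_perm) for _ in row_perm]
    for i_sorted, i_orig in enumerate(row_perm):
        for j_sorted, j_orig in enumerate(col_perm):
            out[i_orig][j_orig] = block[i_sorted][j_sorted]
    return out


if __name__ == "__main__":
    genotypes = [[0, 1, 0, 1], [2, 0, 2, 0], [0, 1, 0, 1], [2, 0, 2, 0]]
    plane0 = bit_planes(genotypes)[0]
    rows_sorted, row_perm = sort_rows(plane0)
    both_sorted, col_perm = sort_columns(rows_sorted)
    print(transitions(plane0), transitions(rows_sorted), transitions(both_sorted))  # 12 8 4
    assert unsort(both_sorted, row_perm, col_perm) == plane0
```

In this toy block, sorting groups identical rows and columns, so the binary image has fewer transitions for a coder to describe. Because each block stores its own permutations, a decoder can restore the original order of that block alone, which is one way block-wise random access can be preserved.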
Subjects
Data Compression; Data Compression/methods; Algorithms; Genome-Wide Association Study; Genomics/methods; Software; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods
ABSTRACT
For many orthopaedic, neurological, and oncological applications, an exact segmentation of the vertebral column, including the identification of each vertebra, is essential. However, although bony structures show high contrast in CT images, segmenting and labelling individual vertebrae is challenging. In this paper, we present a comprehensive solution for automatically detecting, identifying, and segmenting vertebrae in CT images. The framework takes an arbitrary CT image (e.g., head-neck, thorax, lumbar, or whole spine) as input and produces a segmentation in the form of labelled, triangulated vertebra surface models. To obtain a robust processing chain, extensive prior knowledge is incorporated through various kinds of models covering shape, gradient, and appearance information. The framework was tested on 64 CT images, including some with pathologies. It was applied successfully in 56 cases, resulting in a final mean point-to-surface segmentation error of 1.12 ± 1.04 mm. A key issue is the reliable identification of vertebrae. For a single vertebra, the identification success rate is above 70%. The identification rate increases with the number of vertebrae visible in the image, reaching 100% when 16 or more vertebrae are shown.
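The abstract reports a mean point-to-surface segmentation error without defining it in detail. The Python sketch below shows one common way such a metric can be computed, assuming the error is measured from each vertex of the segmented surface model to the nearest point on a reference triangle mesh and summarized as mean ± population standard deviation. The closest-point routine is the standard Voronoi-region point-to-triangle test; the function names and the toy mesh are hypothetical, and the brute-force search over all triangles would normally be replaced by a spatial index for real vertebra meshes.

```python
# Hypothetical sketch of a point-to-surface error; not the authors' evaluation code.
import math
import statistics
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _axpy(p: Vec, s: float, d: Vec) -> Vec:
    """Return p + s * d."""
    return (p[0] + s * d[0], p[1] + s * d[1], p[2] + s * d[2])


def closest_point_on_triangle(p: Vec, a: Vec, b: Vec, c: Vec) -> Vec:
    """Closest point to p on the non-degenerate triangle abc (vertex, edge, or face region)."""
    ab, ac, ap = _sub(b, a), _sub(c, a), _sub(p, a)
    d1, d2 = _dot(ab, ap), _dot(ac, ap)
    if d1 <= 0 and d2 <= 0:
        return a                                   # vertex region A
    bp = _sub(p, b)
    d3, d4 = _dot(ab, bp), _dot(ac, bp)
    if d3 >= 0 and d4 <= d3:
        return b                                   # vertex region B
    vc = d1 * d4 - d3 * d2
    if vc <= 0 and d1 >= 0 and d3 <= 0:
        return _axpy(a, d1 / (d1 - d3), ab)        # edge region AB
    cp = _sub(p, c)
    d5, d6 = _dot(ab, cp), _dot(ac, cp)
    if d6 >= 0 and d5 <= d6:
        return c                                   # vertex region C
    vb = d5 * d2 - d1 * d6
    if vb <= 0 and d2 >= 0 and d6 <= 0:
        return _axpy(a, d2 / (d2 - d6), ac)        # edge region AC
    va = d3 * d6 - d5 * d4
    if va <= 0 and (d4 - d3) >= 0 and (d5 - d6) >= 0:
        w = (d4 - d3) / ((d4 - d3) + (d5 - d6))
        return _axpy(b, w, _sub(c, b))             # edge region BC
    denom = va + vb + vc                           # face region: barycentric coordinates
    v, w = vb / denom, vc / denom
    return _axpy(_axpy(a, v, ab), w, ac)


def point_to_surface_error(
    points: Sequence[Vec],
    vertices: Sequence[Vec],
    triangles: Sequence[Tuple[int, int, int]],
) -> Tuple[float, float]:
    """Mean and population standard deviation of point-to-mesh distances (brute force)."""
    dists: List[float] = []
    for p in points:
        best = math.inf
        for i, j, k in triangles:
            q = closest_point_on_triangle(p, vertices[i], vertices[j], vertices[k])
            best = min(best, math.dist(p, q))
        dists.append(best)
    return statistics.mean(dists), statistics.pstdev(dists)


if __name__ == "__main__":
    # Toy reference surface: unit square in the z = 0 plane, split into two triangles.
    verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
    tris = [(0, 1, 2), (0, 2, 3)]
    pts = [(0.5, 0.5, 1.0), (0.25, 0.25, -2.0), (2.0, 0.0, 0.0)]
    mean, sd = point_to_surface_error(pts, verts, tris)
    print(f"{mean:.2f} +/- {sd:.2f}")  # 1.33 +/- 0.47
```

Whether the reported ± value is a population or a sample standard deviation, and whether the distance is measured in one direction or symmetrically, is not stated in the abstract; both choices here are assumptions.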