ABSTRACT
OBJECTIVE: The increasing use of retroviral vector-mediated gene transfer has created intense interest in characterizing vector integrations at the genomic level. Techniques for determining insertion sites are commonly applied, but they rely mainly on time-consuming manual data processing. Because the high variability in processing methods hampers comparison of the resulting data, there is an urgent need to process the data from such analyses systematically.

METHODS: To allow large-scale, standardized comparison of viral vector insertion sites, we developed two programs, IntegrationSeq and IntegrationMap. IntegrationSeq trims sequences, and the valid integration sequences are then passed to IntegrationMap for automatic genomic mapping. IntegrationMap reports whether each integration lies in or close to a gene, the name of that gene, the exact location within the transcriptional unit, and further parameters such as the distance from the transcription start site (TSS) to the integration.

RESULTS: We validated the method on 259 files from integration site analyses by ligation-mediated PCR (LM-PCR). Processing with IntegrationSeq increased the yield of detected valid integration sequences: it was more sensitive than conventional analysis and 15 times faster, with equal specificity. Output files generated by IntegrationMap were 99.8% identical to the results obtained by the much slower conventional mapping with the ENSEMBL alignment tool.

CONCLUSION: IntegrationSeq and IntegrationMap make validated, fast and standardized high-throughput analysis of insertion sites possible for the first time.
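The kind of annotation IntegrationMap reports (in or near a gene, exon or intron, signed distance to the TSS) can be illustrated with a minimal sketch. This is not IntegrationMap's implementation: the `Gene` record, the `annotate` function, the 1-based inclusive coordinates and the 10 kb "near gene" window around the TSS are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Gene:
    # Hypothetical gene record; coordinates are assumed 1-based and inclusive.
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # '+' or '-'
    exons: List[Tuple[int, int]] = field(default_factory=list)

    def tss(self) -> int:
        # The TSS is the gene start on the '+' strand and the gene end on '-'.
        return self.start if self.strand == "+" else self.end

    def tss_distance(self, pos: int) -> int:
        # Signed distance: positive means downstream of the TSS
        # in the direction of transcription, negative means upstream.
        return pos - self.tss() if self.strand == "+" else self.tss() - pos


def annotate(chrom: str, pos: int, genes: List[Gene], window: int = 10_000) -> dict:
    """Classify one integration site (hypothetical window: 10 kb around the TSS)."""
    same_chrom = [g for g in genes if g.chrom == chrom]
    # Case 1: the integration lies inside a gene body.
    hits = [g for g in same_chrom if g.start <= pos <= g.end]
    if hits:
        g = hits[0]
        region = "exon" if any(s <= pos <= e for s, e in g.exons) else "intron"
        return {"status": "in_gene", "gene": g.name, "region": region,
                "tss_distance": g.tss_distance(pos)}
    # Case 2: outside all genes, but within `window` bp of some TSS.
    nearby = [g for g in same_chrom if abs(g.tss_distance(pos)) <= window]
    if nearby:
        g = min(nearby, key=lambda x: abs(x.tss_distance(pos)))
        return {"status": "near_gene", "gene": g.name, "region": None,
                "tss_distance": g.tss_distance(pos)}
    # Case 3: no gene within the window.
    return {"status": "intergenic", "gene": None, "region": None, "tss_distance": None}


genes = [
    Gene("GENE_A", "chr1", 1000, 5000, "+", [(1000, 1200), (3000, 3200)]),
    Gene("GENE_B", "chr1", 20000, 30000, "-", [(29000, 30000)]),
]
print(annotate("chr1", 2000, genes))   # intronic hit in GENE_A, 1000 bp downstream of its TSS
print(annotate("chr1", 31000, genes))  # 1000 bp upstream of the minus-strand GENE_B
```

Taking the strand into account is the essential design point: for a minus-strand gene the TSS is the higher coordinate, so "upstream" and "downstream" flip relative to genomic position.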