ABSTRACT
Innovation in sequencing instrumentation is increasing per-batch data volumes and decreasing per-base costs. Multiplexed chemistry protocols, which pool samples after the addition of index tags, have further contributed to efficient and cost-effective sequencer utilization. With these pooled processing strategies, however, comes an increased risk of sample contamination. Contamination can cause critical variants in a patient sample to be missed, or variants derived from the contaminant to be wrongly reported; both errors are particularly relevant in oncology specimen testing, in which low variant allele frequencies have clinical relevance. Small custom-targeted next-generation sequencing (NGS) panels yield limited variants, which makes it difficult to distinguish true somatic variants from contamination calls. A number of popular contamination identification tools perform well on whole-genome/exome sequencing data; however, smaller gene panels provide too few variant candidates for these tools to perform accurately. To prevent clinical reporting of potentially contaminated samples in small NGS panels, we have developed MICon (Microhaplotype Contamination detection), a novel contamination detection model that uses microhaplotype site variant allele frequencies. In a heterogeneous hold-out test cohort of 210 samples, the model displayed state-of-the-art performance with an area under the receiver-operating characteristic curve of 0.995.
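To illustrate why variant allele frequencies (VAFs) at germline sites can carry a contamination signal, the following is a minimal sketch and not the MICon model itself. It assumes that, in an uncontaminated diploid sample, germline VAFs cluster near 0, 0.5, or 1, and that admixed DNA from a second individual pulls them away from those values. The function names, the example VAF values, and the mean-deviation score are all hypothetical; the sketch also treats each site as a single VAF rather than a multi-SNP microhaplotype, and it ignores tumor-specific effects such as copy-number changes.

```python
from statistics import mean

# Assumption: expected germline VAFs in an uncontaminated diploid sample
# (homozygous reference, heterozygous, homozygous alternate).
EXPECTED_VAFS = (0.0, 0.5, 1.0)


def vaf_deviation(vaf: float) -> float:
    """Return the distance from a VAF to the nearest expected germline value."""
    if not 0.0 <= vaf <= 1.0:
        raise ValueError(f"VAF must be in [0, 1], got {vaf}")
    return min(abs(vaf - expected) for expected in EXPECTED_VAFS)


def contamination_score(vafs: list[float]) -> float:
    """Return the mean per-site deviation (a hypothetical, illustrative score).

    Higher values mean the observed VAFs sit further from 0, 0.5, and 1,
    which is one pattern that contamination could produce.
    """
    if not vafs:
        raise ValueError("at least one site VAF is required")
    return mean(vaf_deviation(v) for v in vafs)


# Made-up example values, for illustration only.
clean_sample = [0.0, 0.5, 1.0, 0.5]
mixed_sample = [0.05, 0.45, 0.95, 0.55]

clean_score = contamination_score(clean_sample)
mixed_score = contamination_score(mixed_sample)
print(f"clean: {clean_score:.3f}, mixed: {mixed_score:.3f}")
```

A single summary score such as this would be a crude feature at best; a trained model over many sites, as the abstract describes, can weigh sites differently and account for noise that a fixed threshold cannot.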