ABSTRACT
Correct pre-mRNA processing in higher eukaryotes depends critically on splice site recognition. Beyond the conserved 5'ss and 3'ss motifs, splicing regulatory elements (SREs) play a pivotal role in this recognition process. Here, we present in silico designed sequences with arbitrary, a priori prescribed splicing regulatory HEXplorer properties that can be concatenated to arbitrary length without changing their regulatory properties. We experimentally validated the in silico predictions in a massively parallel splicing reporter assay on more than 3000 sequences and identified some SRE-binding proteins as examples. Aiming at a unified 'functional splice site strength' that encompasses both U1 snRNA complementarity and the impact of neighboring SREs, we developed a novel RNA-seq-based 5'ss usage landscape that maps the competition between pairs of high-confidence 5'ss and neighboring exonic GT sites along HBond and HEXplorer score coordinate axes, using human fibroblast and endothelium transcriptome datasets. These RNA-seq data served as the basis for a logistic 5'ss usage prediction model, which greatly improved discrimination between strong but unused exonic GT sites and annotated, highly used 5'ss. Our 5'ss usage landscape offers a unified view of the impact of the 5'ss and its SRE neighborhood on splice site recognition and may contribute to improved mutation assessment in human genetics.
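To make the idea of a logistic 5'ss usage model on two score axes concrete, the following is a minimal sketch only, not the model fitted in this work. It assumes that the two predictors are the HBond score difference and the HEXplorer score difference between an annotated 5'ss and a competing exonic GT site. The function name, the parameter names, and all coefficient values are hypothetical placeholders chosen for illustration.

```python
import math


def usage_probability(hbond_diff, hexplorer_diff,
                      b0=0.0, b_hbond=1.0, b_hex=0.1):
    """Logistic probability that the annotated 5'ss wins over a competing GT site.

    hbond_diff:     assumed HBond score difference (annotated minus competitor)
    hexplorer_diff: assumed HEXplorer score difference (annotated minus competitor)
    b0, b_hbond, b_hex: placeholder coefficients, NOT fitted values from the study
    """
    z = b0 + b_hbond * hbond_diff + b_hex * hexplorer_diff
    # Numerically stable logistic function: avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Equal scores on both axes give no preference under these placeholders.
    print(usage_probability(0.0, 0.0))
    # A stronger U1 complementarity and a more supportive SRE neighborhood
    # both shift the probability toward the annotated site.
    print(usage_probability(2.0, 10.0))
```

In such a formulation, each coefficient expresses how strongly one axis of the landscape shifts the competition, which is what allows U1 snRNA complementarity and SRE neighborhood to be combined into a single usage estimate.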