ABSTRACT
Advances in biomedical technology have recently produced a large volume of biomedical text. Distant supervision makes it possible to label such data automatically at scale, but the noisy labels it introduces make relation extraction more difficult. Previous work has focused mainly on correcting mislabeled relations and has paid little attention to the role that the positions of labeled entities play in relation extraction. In this paper, we present a "four-stage" model based on BioBERT and Multi-Instance Learning that uses entity position markers. First, entity positions are marked in each sentence. Second, BioBERT, a biomedical pre-trained language model, encodes the sentence, and the final sentence feature vector is built not only from the global position marker but also from the start and end markers of both the head and tail entities. Third, the sentence vectors in a bag are aggregated into a bag-level feature vector using three aggregation methods, and we discuss how different sentence feature vectors perform when combined with different bag encoding methods. Finally, relation classification is performed at the bag level. Experimental results show that the presented model significantly outperforms all baseline models and helps reduce noise. In addition, each bag encoding method must be paired with a matching sentence encoding representation to achieve its best performance.
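As a rough illustration of the first and third stages, the following Python sketch marks the head and tail entities in a tokenized sentence and pools toy sentence vectors into a bag vector. It is a sketch under stated assumptions, not the implementation used in this work: the marker strings `[E1]`, `[/E1]`, `[E2]`, `[/E2]`, the function names `mark_entities` and `aggregate_bag`, and the choice of mean, max, and dot-product attention pooling are all hypothetical, since the abstract does not name the markers or the three aggregation methods. The BioBERT encoding step is omitted, and plain lists of floats stand in for sentence feature vectors.

```python
from math import exp


def mark_entities(tokens, head_span, tail_span):
    """Wrap the head and tail entity spans in start/end marker tokens.

    Spans are (start, end) token indices, end exclusive, non-empty and
    non-overlapping. The marker strings are illustrative assumptions.
    """
    opens = {head_span[0]: "[E1]", tail_span[0]: "[E2]"}
    closes = {head_span[1]: "[/E1]", tail_span[1]: "[/E2]"}
    marked = []
    for i, token in enumerate(tokens):
        # Close a span ending here before opening one that starts here.
        if i in closes:
            marked.append(closes[i])
        if i in opens:
            marked.append(opens[i])
        marked.append(token)
    # A span that ends at the last token closes after the loop.
    if len(tokens) in closes:
        marked.append(closes[len(tokens)])
    return marked


def aggregate_bag(sentence_vectors, method="mean", query=None):
    """Pool the sentence vectors of one bag into a single bag vector.

    The three methods are common multi-instance-learning choices and are
    assumed here for illustration only.
    """
    dim = len(sentence_vectors[0])
    n = len(sentence_vectors)
    if method == "mean":
        return [sum(v[d] for v in sentence_vectors) / n for d in range(dim)]
    if method == "max":
        return [max(v[d] for v in sentence_vectors) for d in range(dim)]
    if method == "attention":
        # Dot-product scores against a query vector, then a stable softmax.
        scores = [sum(a * b for a, b in zip(v, query)) for v in sentence_vectors]
        top = max(scores)
        weights = [exp(s - top) for s in scores]
        total = sum(weights)
        return [
            sum(w * v[d] for w, v in zip(weights, sentence_vectors)) / total
            for d in range(dim)
        ]
    raise ValueError(f"unknown aggregation method: {method}")


sentence = ["aspirin", "inhibits", "COX-1"]
marked_sentence = mark_entities(sentence, head_span=(0, 1), tail_span=(2, 3))

toy_bag = [[1.0, 0.0], [3.0, 4.0]]  # two toy sentence vectors in one bag
bag_mean = aggregate_bag(toy_bag, "mean")
bag_max = aggregate_bag(toy_bag, "max")
bag_att = aggregate_bag(toy_bag, "attention", query=[0.0, 0.0])
```

In a full pipeline, the marked token sequence would be passed to the encoder, and the bag vector would feed a bag-level relation classifier. In a trained model the attention query would typically be a learned relation embedding rather than a fixed vector.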