SEOSS-Queries - a software engineering dataset for text-to-SQL and question answering tasks.
Tomova, Mihaela Todorova; Hofmann, Martin; Mäder, Patrick.
Affiliation
  • Tomova MT; Technische Universität Ilmenau, Ilmenau 98693, Germany.
  • Hofmann M; Technische Universität Ilmenau, Ilmenau 98693, Germany.
  • Mäder P; Technische Universität Ilmenau, Ilmenau 98693, Germany.
Data Brief ; 42: 108211, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35539028
Stakeholders of software development projects have various information needs for making rational decisions during their daily work. Satisfying these needs requires substantial knowledge of where and how the relevant information is stored, and it consumes valuable time that is often not available. Reducing the need for this knowledge is an ideal text-to-SQL benchmark problem, in a field where public datasets are scarce and needed. We propose the SEOSS-Queries dataset, which consists of natural language utterances and accompanying SQL queries collected from previous studies, software projects, issue tracking tools, and expert surveys to cover a large variety of information-need perspectives. The dataset consists of 1,162 English utterances that translate into 166 SQL queries; each query has four precise utterances and three more general ones. In addition, the dataset contains 393,086 labeled utterances extracted from issue tracker comments. We provide pre-trained SQLNet and RatSQL baseline models for benchmark comparisons and a replication package that facilitates seamless application, and we discuss various other tasks that may be solved and evaluated using the dataset. The whole dataset with paraphrased natural language utterances and SQL queries is hosted at figshare.com/s/75ed49ef01ac2f83b3e2.
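As a rough illustration of the structure described in the abstract, the following sketch models one query with its four precise and three more general utterances and checks that the stated totals agree (166 queries with 7 utterances each gives 1,162). The class name `QueryEntry`, its field names, the example utterances, and the `issue` table are all hypothetical and are not taken from the dataset or its replication package.

```python
from dataclasses import dataclass, field


@dataclass
class QueryEntry:
    """One SQL query with its natural language utterances (hypothetical layout)."""
    sql: str
    specific: list = field(default_factory=list)  # precise utterances
    general: list = field(default_factory=list)   # more general utterances

    def pairs(self):
        """Return (utterance, sql) training pairs for a text-to-SQL model."""
        return [(u, self.sql) for u in self.specific + self.general]


# Counts as stated in the abstract.
N_QUERIES = 166
SPECIFIC_PER_QUERY = 4
GENERAL_PER_QUERY = 3
total_utterances = N_QUERIES * (SPECIFIC_PER_QUERY + GENERAL_PER_QUERY)

# Invented example entry; the table and column names are assumptions.
example = QueryEntry(
    sql="SELECT COUNT(*) FROM issue WHERE status = 'Open'",
    specific=[
        "Count the issues whose status is Open.",
        "How many issues have the status Open?",
        "Return the number of issues with status Open.",
        "Give me the count of issues in status Open.",
    ],
    general=[
        "How many open issues are there?",
        "How much work is still unresolved?",
        "What is left to do?",
    ],
)

print(total_utterances)      # 166 * 7
print(len(example.pairs()))  # one pair per utterance
```

Flattening each entry into (utterance, SQL) pairs is one common way to feed such data to a text-to-SQL model; the actual file layout of the dataset may differ.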
Full text: 1 Database: MEDLINE Language: En Publication year: 2022 Document type: Article
