ABSTRACT
Computer-assisted approaches to historical language comparison have made great progress during the past two decades. Scholars can now routinely use computational tools to annotate cognate sets, align words, and search for regularly recurring sound correspondences. However, computational approaches still suffer from a very rigid sequence model of the form part of the linguistic sign, in which words and morphemes are segmented into fixed sound units that cannot be modified. To bring the representation of sound sequences in computational historical linguistics closer to the research practice of scholars who apply the traditional comparative method, we introduce improved sound sequence representations in which individual sound segments can be grouped into evolving sound units in order to capture language-specific sound laws more efficiently. We illustrate the usefulness of this enhanced representation of sound sequences with concrete examples and complement it with a small software library that allows scholars to convert their data from forms segmented into sound units to forms segmented into evolving sound units and vice versa.
In linguistics, it is difficult to draw clear boundaries between the sounds in individual words. What one linguist may analyze as two sounds, another linguist might analyze as just one sound. Since the segmentation of words into sounds is crucial for many analyses in linguistics and since no perfect solution can be found, we offer a new representation that conforms to general standards while still giving linguists enough flexibility to advance their individual analyses of the sounds in a word.