ABSTRACT
As one of the most morphologically conservative branches of the Sino-Tibetan language family, the Rgyalrongic languages remain for the most part understudied and poorly understood, in addition to being vulnerable or endangered. It is therefore important that the available data on these languages be made accessible. The lexical data sets the authors have assembled provide comparative word lists of 20 modern and medieval Rgyalrongic languages, drawn from fieldwork carried out by the first author and colleagues as well as from word lists published by other authors. In particular, data on the two Khroskyabs varieties were collected by the first author from 2011 to 2016. Cognate identification is based on the authors' expertise in Rgyalrong historical linguistics and on application of the comparative method. We curated the data by conducting phonemic segmentation and partial cognate annotation. The data sets can be used by historical linguists interested in the etymology and phylogeny of the languages in question, for example to answer questions about individual word histories or the subgrouping of languages in this important branch of Sino-Tibetan.
Rgyalrongic languages are mainly spoken in Western Sichuan, China, though Tangut, an extinct medieval language, was attested in today's Ningxia province and its surrounding regions from 1036 to 1502 AD. They are the most difficult branch of the Sino-Tibetan family to learn, as they exhibit complex word formation strategies such as inflection and derivation. Their word complexity points to their historical depth within the family and to their value in exploring Sino-Tibetan language history. The database considered in this article aims to gather lexical information on Rgyalrongic languages as a tool for research in historical and evolutionary linguistics.
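The phonemic segmentation and partial cognate annotation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simple layout in which each word is a string of space-separated segments, `+` marks a morpheme boundary, and each morpheme carries its own cognate ID. The language names, forms, and IDs below are invented for illustration and are not taken from the data sets; the function names are likewise hypothetical.

```python
from collections import defaultdict

# Toy, invented entries (NOT real Rgyalrongic data). Each entry holds:
# language, concept, segmented form ("+" = morpheme boundary),
# and one cognate ID per morpheme.
ENTRIES = [
    ("LangA", "hand", "t a + j a k", [1, 2]),
    ("LangB", "hand", "j a ɣ", [2]),
    ("LangC", "hand", "l a + k u", [3, 4]),
]


def split_morphemes(tokens):
    """Split a segmented form into morphemes, each a list of segments."""
    return [morpheme.split() for morpheme in tokens.split(" + ")]


def partial_cognate_sets(entries):
    """Group morphemes by cognate ID instead of grouping whole words."""
    sets = defaultdict(list)
    for language, concept, tokens, cogids in entries:
        morphemes = split_morphemes(tokens)
        if len(morphemes) != len(cogids):
            raise ValueError(f"{language} '{concept}': ID count mismatch")
        for morpheme, cogid in zip(morphemes, cogids):
            sets[cogid].append((language, concept, "".join(morpheme)))
    return dict(sets)


if __name__ == "__main__":
    for cogid, members in sorted(partial_cognate_sets(ENTRIES).items()):
        print(cogid, members)
```

In this toy example the second morpheme of the LangA word and the whole LangB word share cognate ID 2, so the two words count as partially cognate even though the LangA word also contains an unrelated first morpheme.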
ABSTRACT
Advances in computer-assisted linguistic research have greatly reshaped the field. With the increasing availability of interconnected datasets created and curated by researchers, more and more interwoven questions can now be investigated. Such advances, however, place high demands on the rigor with which datasets are prepared and curated. Here we present CLICS, the Database of Cross-Linguistic Colexifications. CLICS tackles interconnected interdisciplinary research questions about the colexification of words across semantic categories in the world's languages, and showcases best practices for preparing data for cross-linguistic research. This is done by addressing shortcomings of an earlier version of the database, CLICS2, and by supplying an updated version, CLICS3, which massively increases the size and scope of the project. We provide tools and guidelines for this purpose and discuss insights resulting from organizing student tasks for database updates.
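As a rough illustration of what a colexification is, the sketch below treats two concepts as colexified in a language when the same word form expresses both, and then lists the languages in which each concept pair is colexified. This is a simplified reading under stated assumptions, not the CLICS implementation: the word list, language names, forms, and the function name `colexifications` are all invented for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Toy, invented word list (NOT CLICS data): (language, concept, form).
WORDLIST = [
    ("LangA", "ARM", "pala"),
    ("LangA", "HAND", "pala"),
    ("LangA", "TREE", "siku"),
    ("LangB", "ARM", "meno"),
    ("LangB", "HAND", "meno"),
    ("LangB", "TREE", "dabo"),
    ("LangB", "WOOD", "dabo"),
]


def colexifications(wordlist):
    """Map each colexified concept pair to the languages attesting it."""
    # Collect the concepts expressed by each form within each language.
    concepts_by_form = defaultdict(set)
    for language, concept, form in wordlist:
        concepts_by_form[(language, form)].add(concept)

    # Every pair of concepts sharing a form is one colexification.
    languages_by_pair = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        for pair in combinations(sorted(concepts), 2):
            languages_by_pair[pair].add(language)

    return {pair: sorted(langs) for pair, langs in languages_by_pair.items()}


if __name__ == "__main__":
    for pair, langs in sorted(colexifications(WORDLIST).items()):
        print(pair, langs)
```

Counting the languages per concept pair gives a simple weight for how widespread a colexification is in the sample.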
Subjects
Factual Databases, Linguistics, Humans, Language
ABSTRACT
The Sino-Tibetan language family is one of the world's largest and most prominent families, spoken by nearly 1.4 billion people. Despite the importance of the Sino-Tibetan languages, their prehistory remains controversial, with ongoing debate about when and where they originated. To shed light on this debate we develop a database of comparative linguistic data and apply the linguistic comparative method to identify sound correspondences and establish cognates. We then use phylogenetic methods to infer the relationships among these languages and to estimate the age of their origin and their homeland. Our findings point to Sino-Tibetan originating with north Chinese millet farmers around 7200 B.P. and suggest a link to the late Cishan and the early Yangshao cultures.