Project:Linguistic entities
Identifying the language of a name or form
Approaches:
- When it comes to property data types, Wikibase offers both a generic, language-neutral 'String' and 'Monolingual Text'. Unlike the former, a monolingual text value must have a single language code (e.g. @ga for Irish) applied to it. The choice of codes is restricted to a predefined set of language codes (https://www.wikidata.org/wiki/Help:Monolingual_text_languages), including a few special ones:
- und: "For content whose language is not yet determined (undetermined)"
- mis: "For content whose language is known, but has no language code (uncoded languages). We also use it for content whose language has a language code that is not yet available on Wikidata.org."
- mul: "For content in multiple languages (multiple languages), meaning either content that is the same in more than one language, or content that contains more than one language, so a reader would need to know all of them to understand it."
- zxx: "For content that is not linguistic (no linguistic content, not applicable)"
- If the use of a monolingual string proves to be too restrictive, the language can be recorded in another way: Wikibase also allows qualifiers to be attached to a statement made with a property, so a language (or language stage) can be given as a qualifier.
- Do both? The limitation of 'Monolingual Text' is that, on its own, (a) it does not let you distinguish between historical stages of a language (e.g. Old/Middle Welsh) and (b) it does not have codes for older ancestral or reconstructed languages (e.g. there is no Brittonic or Proto-Indo-European). Like any other property, however, a monolingual text property does support qualifiers, which is the approach recommended by Wikidata.
- Proposals for new language codes can be submitted on Phabricator.
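The "do both" approach can be sketched as a statement in the Wikibase JSON layout: a monolingual text main value carrying the closest available language code, plus an item-valued qualifier naming the language stage. This is only an illustration. The property IDs (P1, P2) and the item ID (Q10) are assumptions made up for the example, as is the choice of "ga" with a stage qualifier for the form Áed; a real Wikibase instance assigns its own IDs.

```python
import json

# Hypothetical IDs for illustration only; a real Wikibase assigns its own.
P_NAME_FORM = "P1"       # assumed monolingual-text property, e.g. "has regular form"
P_LANGUAGE_STAGE = "P2"  # assumed item-valued property used as a qualifier
Q_EARLIER_IRISH = "Q10"  # assumed item for an earlier stage of Irish


def monolingual_snak(prop, text, language):
    """Build a snak whose value is a text tagged with one language code."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "value": {"text": text, "language": language},
            "type": "monolingualtext",
        },
    }


def item_snak(prop, item_id):
    """Build a snak whose value points at another item (e.g. a language stage)."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "value": {
                "entity-type": "item",
                "numeric-id": int(item_id[1:]),
                "id": item_id,
            },
            "type": "wikibase-entityid",
        },
    }


def name_form_statement(text, language, stage_item):
    """Combine a monolingual main value with a language-stage qualifier."""
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": monolingual_snak(P_NAME_FORM, text, language),
        "qualifiers": {
            P_LANGUAGE_STAGE: [item_snak(P_LANGUAGE_STAGE, stage_item)],
        },
    }


statement = name_form_statement("Áed", "ga", Q_EARLIER_IRISH)
print(json.dumps(statement, indent=2, ensure_ascii=False))
```

Where no suitable code exists at all, the same structure could carry one of the special codes (such as mis) in the main value and leave the identification of the language entirely to the qualifier.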
Lexical items
Lexemes
The Wikibase Lexeme extension was created so that users do not have to reinvent the wheel using custom items and custom properties for lexical entities such as words and phrases - see mw:Extension:WikibaseLexeme/Data Model. It is not enabled on Wikibase Cloud by default, but can be switched on.
Name group
The name group is intended to refer to a loose grouping of name variants. Such variation is most often linguistic - diachronic, synchronic (e.g. dialectal), through secondary formation such as hypocoristic forms, etc. - or orthographic, but can also be caused by other factors, such as error and confusion. Names that are not linguistically related can also become associated with each other.
There is a simple practical reason for this. It would be too much work to gather all the possible aliases for each single person. Linking the person to a name group seems more feasible.
Example:
- Áed, Aodh
- Laisrén, Mo Laisse, Molaise
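The practical argument for name groups (link each person to one group rather than listing every alias on every person) can be sketched as a small lookup. This is a minimal illustration, not the project's data model: the class names, the group IDs (NG1, NG2) and the person labels are all hypothetical, and only the name forms come from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class NameGroup:
    group_id: str
    category: str                      # e.g. "first name", "place name"
    forms: list = field(default_factory=list)


@dataclass
class Person:
    label: str
    name_group_id: str                 # one link instead of a list of aliases


# Hypothetical groups built from the example forms above.
groups = {
    "NG1": NameGroup("NG1", "first name", ["Áed", "Aodh"]),
    "NG2": NameGroup("NG2", "first name", ["Laisrén", "Mo Laisse", "Molaise"]),
}

# Hypothetical people; the labels are placeholders, not real records.
people = [
    Person("Person A", "NG1"),
    Person("Person B", "NG2"),
    Person("Person C", "NG2"),
]


def build_form_index(groups):
    """Map each case-folded name form to the IDs of the groups containing it."""
    index = {}
    for group in groups.values():
        for form in group.forms:
            index.setdefault(form.casefold(), set()).add(group.group_id)
    return index


def find_people(form, people, index):
    """Return labels of people whose name group contains the given form."""
    group_ids = index.get(form.casefold(), set())
    return [p.label for p in people if p.name_group_id in group_ids]


index = build_form_index(groups)
print(find_people("Molaise", people, index))
print(find_people("aodh", people, index))
```

A search for any variant reaches every person linked to the group, so a new variant only has to be added once, to the group itself. Mapping a form to a set of group IDs leaves room for a form that belongs to more than one group.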
Properties to be used
- instance of
- name category: e.g. first name, last name, place name, etc.
Properties to be used, eventually:
- Has regular form - monolingual, with qualifiers?
- Has hypocoristic form (al. diminutive, pet form) - monolingual, with qualifiers?
- ?has orthographic variants
- ?has form rendered in another language, e.g. anglicised, latinised
For place names, headwords may likewise be converted so that they refer to name groups.