In Quranic linguistics, a lemma is the canonical dictionary form of a word, the form you would typically find in a lexicon. It serves as the base reference for all grammatical variations of that word, such as conjugated verbs or inflected nouns.
ℹ️ Note:
Lemmas are distinct from roots. A root captures the core consonantal structure of a word family, usually three letters (e.g., ر-ح-م), whereas a lemma is the normalized lexical form (e.g., رَحِمَ) from which conjugations are derived.
QUL exports lemma data as a SQLite database with two tables: `lemmas` and `word_lemmas`.
lemmas
Column | Type | Description |
---|---|---|
id | INTEGER | Unique identifier for the lemma |
text | TEXT | Canonical form in Arabic with tashkeel |
text_clean | TEXT | Canonical form in Arabic without tashkeel |
words_count | INTEGER | Total occurrences of all words derived from this lemma |
uniq_words_count | INTEGER | Count of unique word forms derived from this lemma |
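
As a rough sketch of how `text_clean` might be used for lookups, the snippet below strips tashkeel from a query before matching it against that column. It assumes `text_clean` is simply `text` with Unicode combining marks removed, which the export does not guarantee. `strip_tashkeel` and `find_lemma` are hypothetical helper names, and the in-memory row is illustrative rather than real QUL data.

```python
import sqlite3
import unicodedata


def strip_tashkeel(text: str) -> str:
    """Drop Unicode combining marks (category Mn), which include Arabic harakat."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def find_lemma(conn: sqlite3.Connection, query: str):
    """Look up a lemma row by its diacritic-free form (hypothetical helper)."""
    return conn.execute(
        "SELECT id, text, text_clean FROM lemmas WHERE text_clean = ?",
        (strip_tashkeel(query),),
    ).fetchone()


# Illustrative in-memory table mirroring the documented columns.
# The row and its zero counts are placeholders, not real QUL data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lemmas (id INTEGER PRIMARY KEY, text TEXT, text_clean TEXT, "
    "words_count INTEGER, uniq_words_count INTEGER)"
)
sample = "\u0631\u064e\u062d\u0650\u0645\u064e"  # the lemma example used in the note above
conn.execute(
    "INSERT INTO lemmas VALUES (?, ?, ?, ?, ?)",
    (1, sample, strip_tashkeel(sample), 0, 0),
)

# The query may be typed with or without tashkeel; both resolve to the same row.
row = find_lemma(conn, sample)
```

With a real export, `sqlite3.connect(":memory:")` would be replaced by the path to the downloaded database file.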
word_lemmas
Column | Type | Description |
---|---|---|
lemma_id | INTEGER | Foreign key to the lemmas.id field |
word_location | TEXT | Word identifier (e.g. 2:3:5) |
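
The two tables are linked through `word_lemmas.lemma_id`, so a join can answer both "which lemma does this word belong to?" and "where does this lemma occur?". The following is a minimal sketch using Python's built-in `sqlite3` module. The function names are hypothetical, the sample rows are placeholders rather than real QUL data, and the numeric sort assumes that `word_location` always consists of colon-separated integers.

```python
import sqlite3


def parse_location(location: str) -> tuple:
    """Split a location such as '2:3:5' into integers so it sorts numerically."""
    return tuple(int(part) for part in location.split(":"))


def lemma_for_word(conn: sqlite3.Connection, word_location: str):
    """Return (id, text) of the lemma linked to a word location, or None."""
    return conn.execute(
        "SELECT l.id, l.text FROM word_lemmas AS wl "
        "JOIN lemmas AS l ON l.id = wl.lemma_id "
        "WHERE wl.word_location = ?",
        (word_location,),
    ).fetchone()


def locations_for_lemma(conn: sqlite3.Connection, lemma_id: int) -> list:
    """Return every word location recorded for a lemma, in numeric order."""
    rows = conn.execute(
        "SELECT word_location FROM word_lemmas WHERE lemma_id = ?",
        (lemma_id,),
    ).fetchall()
    return sorted((r[0] for r in rows), key=parse_location)


# Illustrative in-memory database mirroring the documented schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE lemmas (
        id INTEGER PRIMARY KEY, text TEXT, text_clean TEXT,
        words_count INTEGER, uniq_words_count INTEGER
    );
    CREATE TABLE word_lemmas (lemma_id INTEGER, word_location TEXT);
    """
)
# Placeholder rows: the locations are made up and do not reflect the real export.
conn.execute("INSERT INTO lemmas VALUES (1, 'lemma-text', 'lemma-text', 2, 2)")
conn.executemany(
    "INSERT INTO word_lemmas VALUES (?, ?)",
    [(1, "10:1:2"), (1, "2:3:5")],
)

match = lemma_for_word(conn, "2:3:5")
occurrences = locations_for_lemma(conn, 1)
```

Sorting in Python rather than with `ORDER BY word_location` avoids the text ordering that would place `10:1:2` before `2:3:5`.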