A stem is the intermediate form of a word between the lemma and its inflected surface form. It typically reflects the base pattern of a word before it is conjugated or declined. Stemming is used in linguistic processing to group word forms that share a stem.
ℹ️ Note:
While a lemma is a dictionary entry (e.g., رَحِمَ), a stem may be a simplified, reduced form (e.g., رَحْم) used for grouping variants. A stem does not always correspond to a valid Arabic word.
Stemming is useful for grouping words in search, clustering, or analyzing morphological variation without needing the full lemma or root system.
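
To make the search use case concrete, here is a minimal Python sketch of grouping words by stem: it builds an index from stem to word locations and, given one word, returns the other occurrences that share its stem. The `(location, stem_id)` pairs are hypothetical placeholders shaped like the `word_stems` rows described below, and the helpers `build_stem_index` and `related_locations` are illustrative names, not part of QUL.

```python
from collections import defaultdict

# Hypothetical (location, stem_id) pairs; in practice these would be read
# from the word_stems table. The stem ids here are placeholders, not QUL data.
word_stem_pairs = [
    ("1:1:2", 10),
    ("1:1:3", 20),
    ("1:1:4", 20),
    ("1:3:1", 20),
    ("1:3:2", 20),
]


def build_stem_index(pairs):
    """Group word locations by the stem they share (stem_id -> [locations])."""
    index = defaultdict(list)
    for location, stem_id in pairs:
        index[stem_id].append(location)
    return dict(index)


def related_locations(location, pairs):
    """Return every other location whose word shares a stem with `location`."""
    stem_of = dict(pairs)  # location -> stem_id
    stem_id = stem_of.get(location)
    if stem_id is None:
        return []  # unknown location: nothing to group with
    index = build_stem_index(pairs)
    return [loc for loc in index[stem_id] if loc != location]


print(related_locations("1:1:3", word_stem_pairs))
# → ['1:1:4', '1:3:1', '1:3:2']
```

A search feature could use the same idea to expand a query for one word into all words built on its stem, without resolving lemmas or roots.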
QUL exports stem data in the `stems` and `word_stems` tables of the SQLite dataset.
`stems` table:

| Column | Type | Description |
|---|---|---|
| id | INTEGER | Unique identifier for the stem |
| text | TEXT | Stem text with tashkeel |
| text_clean | TEXT | Stem text without tashkeel |
| words_count | INTEGER | Total occurrences of all words derived from this stem |
| uniq_words_count | INTEGER | Number of unique word forms derived from this stem |
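
The `text` and `text_clean` columns differ only by tashkeel. Below is a minimal sketch of deriving one from the other, assuming "tashkeel" means the Arabic harakat U+064B to U+0652 (fathatan through sukun) plus the superscript alef U+0670. QUL's own normalization may remove a different set of marks, so treat `strip_tashkeel` as a hypothetical helper rather than the rule used to build the dataset.

```python
import re

# Assumption: tashkeel = harakat U+064B..U+0652 (fathatan, dammatan, kasratan,
# fatha, damma, kasra, shadda, sukun) plus superscript alef U+0670.
TASHKEEL_RE = re.compile("[\u064B-\u0652\u0670]")


def strip_tashkeel(text: str) -> str:
    """Remove the assumed diacritic marks, leaving the base letters."""
    return TASHKEEL_RE.sub("", text)


print(strip_tashkeel("رَحْم"))
# → رحم
```

Comparing stems on their stripped form is one way to match user input typed without diacritics against the `text_clean` column.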
`word_stems` table:

| Column | Type | Description |
|---|---|---|
| stem_id | INTEGER | Foreign key to `stems.id` |
| location | TEXT | Word location as surah:ayah:word (e.g. `1:1:2`) |
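
The two tables are linked through `word_stems.stem_id` and `stems.id`. The sketch below uses Python's built-in `sqlite3` module to join them and list every location of a given stem. It builds an in-memory stand-in with the documented columns and made-up rows instead of opening the real export (whose file name is not given here), and it assumes `location` follows the surah:ayah:word pattern of the `1:1:2` example. The function name `stem_locations` is illustrative.

```python
import sqlite3


def stem_locations(conn, stem_text_clean):
    """Return (surah, ayah, word) tuples for every word linked to a stem."""
    rows = conn.execute(
        """
        SELECT ws.location
        FROM word_stems AS ws
        JOIN stems AS s ON s.id = ws.stem_id
        WHERE s.text_clean = ?
        """,
        (stem_text_clean,),
    ).fetchall()
    # Assumes "surah:ayah:word"; integer tuples sort numerically, not lexically.
    return sorted(tuple(int(part) for part in loc.split(":")) for (loc,) in rows)


# In-memory stand-in using the documented columns; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stems (id INTEGER PRIMARY KEY, text TEXT, text_clean TEXT,
                        words_count INTEGER, uniq_words_count INTEGER);
    CREATE TABLE word_stems (stem_id INTEGER REFERENCES stems(id), location TEXT);
    """
)
conn.execute("INSERT INTO stems VALUES (?, ?, ?, ?, ?)", (1, "رَحْم", "رحم", 2, 2))
conn.executemany(
    "INSERT INTO word_stems VALUES (?, ?)", [(1, "1:3:1"), (1, "1:1:3")]
)

print(stem_locations(conn, "رحم"))
# → [(1, 1, 3), (1, 3, 1)]
```

Parsing locations into integer tuples before sorting keeps `2:1:1` ahead of `10:1:1`, which a plain string sort would not.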