A stem is the intermediate form of a word between the lemma and its inflected surface form. It typically reflects the base pattern of a word before it is conjugated or declined.
ℹ️ Note:
While a lemma is a dictionary entry (e.g., رَحِمَ), a stem may be a simplified, reduced form (e.g., رَحْم) used for grouping variants. It may not always correspond to a valid Arabic word.
Stemming is useful for grouping words in search, clustering, or analyzing morphological variation without needing the full lemma or root system.
Example:
QUL exports stem data in the `stems` and `word_stems` tables of the SQLite dataset.
stems
Column | Type | Description |
---|---|---|
id | INTEGER | Unique identifier for the stem |
text | TEXT | Stem text with tashkeel |
text_clean | TEXT | Stem text without tashkeel |
words_count | INTEGER | Total occurrences of all words derived from this stem |
uniq_words_count | INTEGER | Count of unique word forms derived from this stem |
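
The difference between the `text` and `text_clean` columns can be illustrated with a small sketch. It assumes that removing tashkeel amounts to dropping Unicode nonspacing marks; the helper name `strip_tashkeel` is hypothetical, and QUL's own normalization may handle more cases than this.

```python
import unicodedata


def strip_tashkeel(text: str) -> str:
    """Drop Unicode nonspacing marks (category Mn), which include Arabic
    diacritics such as fatha and sukun.

    Assumption: this approximates how a "clean" form is derived. The rules
    QUL actually applies are not described in this section.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# The example stem from the note above, written with explicit code points:
# reh, fatha, hah, sukun, meem.
stem_with_tashkeel = "\u0631\u064E\u062D\u0652\u0645"

# Only the three base letters remain: reh, hah, meem.
stem_clean = strip_tashkeel(stem_with_tashkeel)
```

Matching user input against `text_clean` rather than `text` is usually the more forgiving choice for search, since typed queries often omit diacritics.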
word_stems
Column | Type | Description |
---|---|---|
stem_id | INTEGER | Foreign key to stems.id |
location | TEXT | Word location (e.g. 1:1:2) |
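
The two tables are linked through `word_stems.stem_id`, so finding every word location for a stem is a single join. The sketch below is a minimal illustration using Python's built-in `sqlite3` module. Because the name of the exported file is not given here, it builds an in-memory database that mirrors the documented columns and fills it with placeholder rows; the helper names `open_demo_db` and `locations_for_stem`, the counts, and the location `2:5:7` are invented for the demonstration and are not taken from the dataset.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Build an in-memory database mirroring the two documented tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE stems (
            id INTEGER PRIMARY KEY,
            text TEXT,
            text_clean TEXT,
            words_count INTEGER,
            uniq_words_count INTEGER
        );
        CREATE TABLE word_stems (
            stem_id INTEGER REFERENCES stems(id),
            location TEXT
        );
        """
    )
    # Placeholder rows for illustration only; not real QUL data.
    conn.execute(
        "INSERT INTO stems VALUES (?, ?, ?, ?, ?)",
        (1, "\u0631\u064E\u062D\u0652\u0645", "\u0631\u062D\u0645", 2, 2),
    )
    conn.executemany(
        "INSERT INTO word_stems VALUES (?, ?)",
        [(1, "2:5:7"), (1, "1:1:2")],
    )
    conn.commit()
    return conn


def locations_for_stem(conn: sqlite3.Connection, text_clean: str) -> list[str]:
    """Return all word locations whose stem matches the given clean text."""
    rows = conn.execute(
        """
        SELECT ws.location
        FROM word_stems AS ws
        JOIN stems AS s ON s.id = ws.stem_id
        WHERE s.text_clean = ?
        """,
        (text_clean,),
    ).fetchall()
    # Assumes locations are colon-separated numbers, as in the documented
    # example "1:1:2", and sorts them numerically rather than as text.
    return sorted(
        (row[0] for row in rows),
        key=lambda loc: tuple(int(part) for part in loc.split(":")),
    )


conn = open_demo_db()
locations = locations_for_stem(conn, "\u0631\u062D\u0645")
```

Against the real dataset, the same query should work by replacing `open_demo_db()` with `sqlite3.connect(...)` pointed at the downloaded SQLite file.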