Data Model
This page explains how resource users should model and join downloaded QUL datasets.
Quran Hierarchy
Quran -> Surah -> Ayah -> Word
Common fields by level:
- Surah: surah_id, names, revelation type, ayah count
- Ayah: surah_id, ayah_number, text, juz/hizb/manzil context
- Word: surah_id, ayah_number, word_position, text, root/lemma/POS
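As an illustration of the hierarchy, the three levels could be modeled roughly as follows. This is only a sketch: the class names, the attribute names beyond the shared identifiers, and the `key` helpers are assumptions for the example, not a schema defined by QUL.

```python
from dataclasses import dataclass


# Hypothetical in-memory model; only the identifier fields mirror the page.
@dataclass(frozen=True)
class Surah:
    surah_id: int
    name: str
    revelation_type: str
    ayah_count: int


@dataclass(frozen=True)
class Ayah:
    surah_id: int
    ayah_number: int
    text: str

    @property
    def key(self):
        # Composite key that identifies an ayah across resources.
        return (self.surah_id, self.ayah_number)


@dataclass(frozen=True)
class Word:
    surah_id: int
    ayah_number: int
    word_position: int
    text: str

    @property
    def key(self):
        # Composite key that identifies a word across resources.
        return (self.surah_id, self.ayah_number, self.word_position)

    @property
    def ayah_key(self):
        # Key of the parent ayah, for walking back up the hierarchy.
        return (self.surah_id, self.ayah_number)
```

Each level carries the identifiers of its parents, so a word can always be traced back to its ayah and surah without a separate lookup table.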
Core Shared Identifiers
- surah_id (or surah)
- ayah_number (or ayah)
- word_position (or position)
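Because the same identifier can appear under two names, it may help to rename keys to one canonical form at import time. The sketch below assumes records arrive as plain dictionaries; `KEY_ALIASES` and `normalize_keys` are hypothetical names introduced for this example.

```python
# Hypothetical alias map: alternate column name -> canonical key name.
KEY_ALIASES = {
    "surah": "surah_id",
    "ayah": "ayah_number",
    "position": "word_position",
}
CANONICAL_KEYS = set(KEY_ALIASES.values())


def normalize_keys(record):
    """Return a copy of record with aliased keys renamed and key values cast to int."""
    out = {}
    for name, value in record.items():
        canonical = KEY_ALIASES.get(name, name)
        if canonical in CANONICAL_KEYS:
            # Identifier fields are stored as integers so joins compare like with like.
            value = int(value)
        out[canonical] = value
    return out
```

For example, `normalize_keys({"surah": "2", "ayah": "255"})` yields a record keyed by `surah_id` and `ayah_number` with integer values.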
Recommended Join Patterns
- Translations: surah_id + ayah_number
- Tafsir: surah_id + ayah_number
- Topics/themes: surah_id + ayah_number
- Morphology: surah_id + ayah_number + word_position
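The join patterns above can be sketched with an in-memory SQLite database. The table names, column layout, and placeholder rows here are assumptions made for the example; actual QUL exports may be organized differently.

```python
import sqlite3

# Hypothetical schema and placeholder rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ayahs (surah_id INTEGER, ayah_number INTEGER, text TEXT);
    CREATE TABLE translations (surah_id INTEGER, ayah_number INTEGER, text TEXT);
    CREATE TABLE words (surah_id INTEGER, ayah_number INTEGER,
                        word_position INTEGER, text TEXT);
    CREATE TABLE morphology (surah_id INTEGER, ayah_number INTEGER,
                             word_position INTEGER, pos TEXT);
""")
conn.executemany("INSERT INTO ayahs VALUES (?, ?, ?)",
                 [(1, 1, "<ayah 1:1>"), (1, 2, "<ayah 1:2>")])
conn.executemany("INSERT INTO translations VALUES (?, ?, ?)",
                 [(1, 1, "<translation 1:1>"), (1, 2, "<translation 1:2>")])
conn.executemany("INSERT INTO words VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, "<word 1:1:1>"), (1, 1, 2, "<word 1:1:2>")])
conn.executemany("INSERT INTO morphology VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, "<pos A>"), (1, 1, 2, "<pos B>")])

# Ayah-level join: surah_id + ayah_number.
ayah_rows = conn.execute("""
    SELECT a.surah_id, a.ayah_number, a.text, t.text
    FROM ayahs AS a
    JOIN translations AS t
      ON t.surah_id = a.surah_id AND t.ayah_number = a.ayah_number
    ORDER BY a.surah_id, a.ayah_number
""").fetchall()

# Word-level join: surah_id + ayah_number + word_position.
word_rows = conn.execute("""
    SELECT w.surah_id, w.ayah_number, w.word_position, w.text, m.pos
    FROM words AS w
    JOIN morphology AS m
      ON m.surah_id = w.surah_id
     AND m.ayah_number = w.ayah_number
     AND m.word_position = w.word_position
    ORDER BY w.surah_id, w.ayah_number, w.word_position
""").fetchall()
```

Tafsir and topic tables would join the same way as translations, since they share the ayah-level key.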
Practical Integration Notes
- Keep numeric identity fields as integers for stable joins.
- Normalize by resource type (script, translation, tafsir, morphology) instead of duplicating large text fields.
- Add indexes on common join keys before shipping production workloads.
- Validate key consistency on a sample batch before full import.
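The last two notes could look something like the sketch below, assuming a SQLite target and records held as dictionaries. The index name, the `validate_batch` helper, and the specific checks it runs are illustrative choices, not a prescribed procedure.

```python
import sqlite3

# Hypothetical table; the composite index covers the ayah-level join key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE translations (surah_id INTEGER, ayah_number INTEGER, text TEXT)"
)
conn.execute(
    "CREATE INDEX idx_translations_ayah ON translations (surah_id, ayah_number)"
)


def validate_batch(rows, known_ayahs):
    """Return (row_index, problem) pairs found in a sample batch.

    rows: list of dicts with surah_id and ayah_number.
    known_ayahs: set of (surah_id, ayah_number) tuples from the base text.
    """
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        key = (row.get("surah_id"), row.get("ayah_number"))
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not all(isinstance(k, int) and not isinstance(k, bool) for k in key):
            problems.append((i, "non-integer key"))
            continue
        if key not in known_ayahs:
            problems.append((i, "unknown ayah"))
        if key in seen:
            problems.append((i, "duplicate key"))
        seen.add(key)
    return problems
```

Running `validate_batch` on the first few hundred rows of a resource before a full import surfaces string-typed keys and orphaned rows early. The duplicate check only makes sense for resources that hold one row per ayah.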