Data Model
Core Node Types
- source
- subject
- level
- curriculum
- form
- competence
- topic_hub
- topic
- exam_format
- concept
- exam_paper
- exam_question
- marking_scheme
- solution
Versioned Curriculum Files
The long-term curriculum source of truth should be versioned by level, subject, and syllabus year:
- data/curricula/{level}/{subject_slug}/{year}.json: one official curriculum version for one subject.
- data/curricula/index.json: aggregate registry for discoverability, current pointers, historical versions, aliases, and review status.
- data/curriculum_map.json: current Mathematics compatibility view. Do not grow this file into the multi-subject authority.
Example paths:
- data/curricula/csee/mathematics/2023.json
- data/curricula/csee/physics/2023.json
- data/curricula/csee/kiswahili/2023.json
Curriculum versions are append-only. When a new syllabus arrives, add a new {year}.json record and update the current pointer in index.json; do not overwrite the older version. This keeps old exam mappings, archived wiki pages, and syllabus-change comparisons historically honest.
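The append-only rule can be sketched as a small helper. This is an illustration under stated assumptions, not the project's implementation: the function names are hypothetical, the curricula registry is assumed to be a dict keyed by version ID, and the 2030 syllabus year is invented purely to show a second version arriving.

```python
from copy import deepcopy


def curriculum_path(level: str, subject_slug: str, year: int) -> str:
    """Build the versioned file path for one subject and syllabus year."""
    return f"data/curricula/{level}/{subject_slug}/{year}.json"


def register_version(index: dict, subject_id: str, level: str,
                     subject_slug: str, year: int) -> dict:
    """Return a copy of the index with a new version added and the current pointer moved."""
    version_id = f"curriculum-{level}-{subject_slug}-{year}"
    # Assumption: "curricula" is a dict keyed by version ID.
    if version_id in index.get("curricula", {}):
        raise ValueError(f"{version_id} already exists; versions are append-only")
    updated = deepcopy(index)
    updated.setdefault("curricula", {})[version_id] = {
        "path": curriculum_path(level, subject_slug, year),
        "year": year,
    }
    # Only the pointer changes; older version entries stay untouched.
    updated.setdefault("current_curricula", {})[subject_id] = version_id
    return updated


index = {"current_curricula": {}, "curricula": {}}
index = register_version(index, "subject-physics", "csee", "physics", 2023)
# Hypothetical later syllabus year, used only for illustration.
index = register_version(index, "subject-physics", "csee", "physics", 2030)
```

After both calls the 2023 entry is still present in the registry, and only the current pointer for subject-physics has moved to the newer version.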
Curriculum Version Shape
Each data/curricula/{level}/{subject_slug}/{year}.json file should include:
- id: Stable version ID, for example curriculum-csee-physics-2023.
- level: Level slug, for example csee.
- subject_id and subject_slug: Canonical subject identity.
- title: Official curriculum title.
- year: Official syllabus year as a number.
- language: Primary language used by the syllabus record.
- source_ids: Source records that define the curriculum.
- review_status and extraction_status.
- forms: Version-scoped forms, with IDs like form-csee-physics-2023-form-i.
- competences: Version-scoped competence records.
- topic_hubs: Navigation groupings for the subject version.
- topic_registry: Official syllabus topic records for that subject version.
Version-scoped IDs should include level, subject, and year unless a node is deliberately timeless. This prevents collisions across subjects and protects older curriculum meaning when new syllabuses arrive.
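One way to keep version-scoped IDs consistent is to build them from their parts rather than typing them by hand. The helper below is a minimal sketch; the function name and argument order are assumptions, and the segment order simply follows the example IDs in this document.

```python
def version_scoped_id(node_type: str, level: str, subject_slug: str,
                      year: int, local_slug: str = "") -> str:
    """Join node type, level, subject, and year, plus an optional local slug."""
    parts = [node_type, level, subject_slug, str(year)]
    if local_slug:
        parts.append(local_slug)
    return "-".join(parts)


curriculum_id = version_scoped_id("curriculum", "csee", "physics", 2023)
form_id = version_scoped_id("form", "csee", "physics", 2023, "form-i")
topic_id = version_scoped_id("topic", "csee", "physics", 2023, "measurement")
```

These calls reproduce the example IDs used elsewhere in this document: curriculum-csee-physics-2023, form-csee-physics-2023-form-i, and topic-csee-physics-2023-measurement.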
Curriculum Index Shape
data/curricula/index.json should include:
- schema_version.
- current_curricula: Pointers from canonical subject IDs to the active curriculum version, for example subject-physics -> curriculum-csee-physics-2023.
- subjects: Canonical subject records, aliases, language notes, and historical labels.
- curricula: Known curriculum versions with path, year, source IDs, review status, and public/private availability policy.
- legacy_aliases: Search and migration aliases that should not rewrite old records.
Subject aliases support discovery and continuity, not identity replacement. For example, Civics, Civic Education, and Uraia may route learners toward the current subject-historia-ya-tanzania-na-maadili page, while older records that were originally Civics should remain historically labeled.
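The alias rule can be illustrated with a lookup that routes a search term to the current subject while leaving the stored record alone. This is a sketch only: the shape of the subjects mapping, the function name, and the sample exam record are assumptions, not the actual index.json layout.

```python
from typing import Optional

# Assumed shape: subject ID -> record with an "aliases" list.
SUBJECTS = {
    "subject-historia-ya-tanzania-na-maadili": {
        "aliases": ["Civics", "Civic Education", "Uraia"],
    },
}


def resolve_subject_alias(query: str, subjects: dict) -> Optional[str]:
    """Return the current subject ID for an alias, or None when nothing matches."""
    wanted = query.strip().casefold()
    for subject_id, record in subjects.items():
        if any(alias.casefold() == wanted for alias in record.get("aliases", [])):
            return subject_id
    return None


# Hypothetical older record: its historical label is read, never rewritten.
old_record = {"subject_label": "Civics", "exam_year": 2019}
routed_to = resolve_subject_alias(old_record["subject_label"], SUBJECTS)
```

The lookup returns a routing target for navigation; the old record keeps its original Civics label.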
Topic And Concept Identity
Use separate node types for official syllabus topics and durable learning concepts:
- topic: A curriculum-specific syllabus item in one subject and year, for example topic-csee-physics-2023-measurement.
- concept: A reusable learning idea that can span subjects, years, languages, and exams, for example concept-measurement.
A topic answers: "Where does the official syllabus place this item?" A concept answers: "What learning idea is this about?" Exam questions may eventually map to both layers:
- question_assesses_topic for a specific syllabus-version target.
- question_assesses_concept for a reusable learning idea.
Do not use unversioned topic-* IDs for new multi-subject records. Existing Mathematics IDs may remain through compatibility aliases or migration mappings until downstream question maps are upgraded.
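A lint-style check can flag unversioned topic IDs before they enter new records. The pattern below is a heuristic sketch that assumes version-scoped IDs follow the topic-{level}-{subject_slug}-{year}-{slug} layout seen in the examples; the legacy ID in the usage lines is hypothetical. A legacy slug that happens to contain a four-digit segment could be misread as versioned, so treat the result as a hint rather than proof.

```python
import re

# Assumed layout: topic-{level}-{subject_slug}-{year}-{slug}
VERSIONED_TOPIC_ID = re.compile(
    r"^topic-(?P<level>[a-z0-9]+)-(?P<subject>[a-z0-9-]+?)"
    r"-(?P<year>\d{4})-(?P<slug>[a-z0-9-]+)$"
)


def is_version_scoped_topic_id(topic_id: str) -> bool:
    """Return True when the ID carries level, subject, and a four-digit year."""
    return VERSIONED_TOPIC_ID.match(topic_id) is not None


new_style = is_version_scoped_topic_id("topic-csee-physics-2023-measurement")
# Hypothetical legacy-style ID with no level, subject, or year segments.
legacy_style = is_version_scoped_topic_id("topic-exponents")
```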
Legacy Curriculum Topic Registry
data/curriculum_map.json.topic_registry is the current Mathematics compatibility registry. New multi-subject curriculum records should use data/curricula/{level}/{subject_slug}/{year}.json.topic_registry with version-scoped IDs.
Each topic-registry entry must include:
- id: Stable topic ID, prefixed with topic-.
- slug: Stable page slug, lowercase with hyphens.
- title: Official syllabus topic title.
- form and form_id: Human and graph identifiers for the form.
- competence and competence_id: Syllabus competence mapping.
- source and source_id: Official source path and graph ID.
- review_status: Conservative review status. Official syllabus topics use official.
- page_path: Markdown page path under wiki/topics/.
- hub_category and hub_page_path: Hub grouping used for navigation.
- sequence: Stable ordering within the curriculum spine.
- summary: Short scope note for the page.
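The required fields above lend themselves to a simple structural check. The validator below is a sketch, not the project's tooling: the function name is hypothetical, and allowing digits in slugs is an assumption that the source does not state.

```python
import re
from typing import List

REQUIRED_FIELDS = (
    "id", "slug", "title", "form", "form_id", "competence", "competence_id",
    "source", "source_id", "review_status", "page_path",
    "hub_category", "hub_page_path", "sequence", "summary",
)
# Assumption: slugs are lowercase letters or digits separated by single hyphens.
SLUG_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def validate_topic_entry(entry: dict) -> List[str]:
    """Return a list of problems; an empty list means the entry passed."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in entry]
    if not str(entry.get("id", "")).startswith("topic-"):
        problems.append("id must start with 'topic-'")
    if not SLUG_PATTERN.match(str(entry.get("slug", ""))):
        problems.append("slug must be lowercase with hyphens")
    if not str(entry.get("page_path", "")).startswith("wiki/topics/"):
        problems.append("page_path must be under wiki/topics/")
    return problems


# A deliberately incomplete, hypothetical entry.
issues = validate_topic_entry({"id": "exponents", "slug": "Exponents"})
```

Returning a list of problems rather than raising on the first one lets a review script report every issue in an entry at once.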
The canonical learning node for a syllabus topic is its wiki/topics/ page. Form pages, subject pages, and hub pages are navigation maps: they organize the official sequence, readiness signals, and recommended next pages, but they should point back to canonical topic pages rather than duplicating topic content.
Form/Class Metadata
Entries in data/curriculum_map.json.form_topics may include learner-navigation metadata in addition to the official form and topic list:
- aliases: Common names that refer to the same official form/class level.
- page_path: Human-readable form/class learning-map page.
- readiness.chapter_ready_topics: Topic IDs with chapter-level learner pages.
- readiness.recommended_next_topics: Topic IDs that naturally follow current chapter-ready pages.
- readiness.exam_signal_topics: Topic IDs with useful but still reviewable exam signals.
Official syllabus naming remains authoritative. Aliases support search, retrieval, and future tutor personalization.
Core Edge Types
- source_supports_page
- source_supports_curriculum
- source_supports_topic
- subject_at_level
- subject_has_form
- subject_has_competence
- competence_has_specific_competence
- subject_has_topic_hub
- subject_has_topic
- form_has_topic
- topic_in_hub
- topic_supports_competence
- topic_has_concept
- topic_has_learning_activity
- form_has_alias
- form_has_readiness_signal
- source_defines_exam_format
- exam_format_applies_to_subject
- exam_format_maps_to_topic
- exam_format_partially_maps_to_topic
- exam_paper_has_question
- question_assesses_topic
- question_assesses_concept
- solution_answers_question
- marking_scheme_validates_solution
Recommended future curriculum-version edge types:
- curriculum_replaces_curriculum: a newer curriculum version is the active successor to an older version.
- topic_supersedes_topic: a newer syllabus topic replaces an older syllabus topic.
- topic_same_as: two curriculum-version topics represent the same official learning target.
- topic_split_into: one older topic becomes multiple newer topics.
- topic_merged_into: multiple older topics become one newer topic.
- topic_moved_form: a topic remains recognizable but shifts form placement.
- topic_equivalent_to_concept: a curriculum topic maps to a durable concept node.
Recommended future topic-to-topic edge types:
- topic_prerequisite_of: one canonical topic supports later learning in another canonical topic.
- topic_related_to: two canonical topics are useful siblings, extensions, or comparisons without a strict prerequisite relationship.
- topic_cross_subject_bridge: a canonical topic in one subject helps explain, apply, or interpret a canonical topic in another subject.
Use these edges between canonical topic nodes. Do not model form pages as the learning target for prerequisite, related-topic, or cross-subject relationships.
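The endpoint rule can be checked mechanically against the topic registry. The sketch below assumes a hypothetical edge record with type, from, and to keys, which this document does not define; the second topic ID is also invented for illustration.

```python
from typing import List, Set

TOPIC_TO_TOPIC_EDGE_TYPES = {
    "topic_prerequisite_of",
    "topic_related_to",
    "topic_cross_subject_bridge",
}


def check_topic_edge(edge: dict, canonical_topic_ids: Set[str]) -> List[str]:
    """Return problems for an edge whose endpoints are not canonical topics."""
    problems = []
    if edge.get("type") not in TOPIC_TO_TOPIC_EDGE_TYPES:
        problems.append(f"unknown edge type: {edge.get('type')}")
    for end in ("from", "to"):
        if edge.get(end) not in canonical_topic_ids:
            problems.append(f"{end} endpoint is not a canonical topic: {edge.get(end)}")
    return problems


# The second ID is hypothetical and exists only for this example.
topics = {"topic-csee-physics-2023-measurement", "topic-csee-physics-2023-force"}
good_edge = {
    "type": "topic_prerequisite_of",
    "from": "topic-csee-physics-2023-measurement",
    "to": "topic-csee-physics-2023-force",
}
# A form page used as an endpoint, which the rule above disallows.
bad_edge = {
    "type": "topic_related_to",
    "from": "form-csee-physics-2023-form-i",
    "to": "topic-csee-physics-2023-force",
}
good_problems = check_topic_edge(good_edge, topics)
bad_problems = check_topic_edge(bad_edge, topics)
```

Checking membership in the registry, rather than matching an ID prefix, avoids accepting an ID that merely looks like a topic.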
Review Status Values
- official
- unreviewed
- ai_checked
- human_reviewed
- pilot_validated
- needs_manual_review
Extraction Status Values
- not_extracted
- valid_extraction
- schema_invalid
- empty_sections
- parse_error
- runtime_error
- needs_manual_review
Assessment Format Files
- exam_format_map.json: machine-readable examination rubric.
- basic_math_format_topics_2022.json: Basic Mathematics content topics and table-of-specification groups from the 2022 format booklet.
- exam_format_topic_crosswalk_2022.jsonl: mappings from 2022 examination-format topic groups to current 2023 Mathematics topic IDs.
- review_queue_exam_format_2022.jsonl: crosswalk terms that need manual review because they do not cleanly match the current syllabus topic registry.
Question Mapping Files
- question_map_2021_2025.jsonl: one flattened answerable leaf-question record per 2021-2025 Basic Mathematics Paper 1 JSON item.
- topic_frequency_2021_2025.json: aggregate topic, form, hub, and exam-format signals from the five-year mapping pilot.
- exponents_exam_signal_audit_2021_2025.json: focused review artifact correcting the Exponents learner-page signal after excluding degree-notation false positives.
- review_queue_question_mapping.jsonl: question mappings that need manual review because they are low-confidence, multi-topic, figure/table-dependent, missing marks, unmapped, or otherwise uncertain.
Each question-map record should include:
- question_id: Stable question identifier, for example csee_041_2024_p1_q14_b_ii.
- exam_year, level, subject, subject_code, and paper.
- source_file and source_pdf.
- section_title, section_name, question_number, part_path, and part_label.
- top_marks and leaf_marks, when available.
- stem_text, leaf_text, and full_text.
- tables, figures, has_table, and has_figure.
- primary_topic_id, primary_topic_label, secondary_topic_ids, form, and hub.
- mapping_confidence, mapping_status, review_status, and mapping_notes.
- exam_format_group_id and exam_format_group_label, where mappable.
- review_reasons.
Mapping Confidence Values
- 0.90: direct wording match, such as logarithm, matrix, round off, or Venn diagram.
- 0.75: clear mathematical skill match even when wording differs.
- 0.55: likely mapping but context-dependent or multi-topic.
- Below 0.55: leave unmapped or tentative and send to review.
Any figure-dependent or table-dependent question should appear in the review queue even when it has a plausible topic mapping.
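The confidence tiers and the figure/table rule can be combined into one triage step. This is a sketch under assumptions: the function name and return shape are hypothetical, and treating each listed value as the lower bound of a band is an interpretation, since the source lists discrete values only.

```python
def triage_mapping(confidence: float, has_figure: bool = False,
                   has_table: bool = False) -> dict:
    """Classify a mapping by confidence tier and decide whether it needs review."""
    # Assumption: each documented value is the lower bound of its tier.
    if confidence >= 0.90:
        tier = "direct wording match"
    elif confidence >= 0.75:
        tier = "clear skill match"
    elif confidence >= 0.55:
        tier = "likely, context-dependent or multi-topic"
    else:
        tier = "unmapped or tentative"
    # Figure- or table-dependent questions go to review regardless of confidence.
    needs_review = confidence < 0.55 or has_figure or has_table
    return {"tier": tier, "needs_review": needs_review}


direct = triage_mapping(0.90)
with_figure = triage_mapping(0.90, has_figure=True)
weak = triage_mapping(0.40)
```

A high-confidence question with a figure still lands in the review queue, which mirrors the rule stated above.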
Curriculum Crosswalk Records
Crosswalk data lives under data/curricula/crosswalks/ and compares curriculum-version topics without rewriting either side.
Each crosswalk should include source curriculum ID, target curriculum ID, source IDs, review status, and relationship records with from_topic_id, optional to_topic_id, relationship_type, review_status, and mapping_note.
Allowed relationship types are documented in docs/curriculum-versioning-framework.md: same_or_near_same, partial_overlap, split_into, merged_into, moved_form, renamed_from, legacy_only, and new_only.
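A relationship record can be checked against the fields and types listed above. The validator below is an illustrative sketch: the function name is hypothetical, the legacy topic ID is invented for the example, and docs/curriculum-versioning-framework.md remains the authority on the allowed types.

```python
from typing import List

ALLOWED_RELATIONSHIP_TYPES = {
    "same_or_near_same", "partial_overlap", "split_into", "merged_into",
    "moved_form", "renamed_from", "legacy_only", "new_only",
}
# to_topic_id is optional, so it is not listed here.
REQUIRED_RELATIONSHIP_FIELDS = (
    "from_topic_id", "relationship_type", "review_status", "mapping_note",
)


def validate_relationship(record: dict) -> List[str]:
    """Return problems found in one crosswalk relationship record."""
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_RELATIONSHIP_FIELDS
        if name not in record
    ]
    rel_type = record.get("relationship_type")
    if rel_type is not None and rel_type not in ALLOWED_RELATIONSHIP_TYPES:
        problems.append(f"unknown relationship_type: {rel_type}")
    return problems


# Hypothetical legacy topic with no counterpart in the newer curriculum.
legacy_only = {
    "from_topic_id": "topic-csee-physics-2010-example",
    "relationship_type": "legacy_only",
    "review_status": "unreviewed",
    "mapping_note": "No matching topic in the newer version.",
}
legacy_problems = validate_relationship(legacy_only)
```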