Roadmap

Milestone 1: Foundation

Status: complete for the current starter.

  • Establish source-first operating rules.
  • Define page templates and review statuses.
  • Document ingestion, validation, and data-model expectations.
  • Preserve raw/ as immutable intake.
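One way to enforce the immutable-intake rule is to record a checksum manifest for raw/ and compare against it before each build. This is a minimal sketch that assumes a manifest mapping relative paths to SHA-256 digests. The manifest shape and all function names are hypothetical, not part of the existing starter. New files are allowed because intake is append-only; changed or missing files are reported.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(raw_dir: Path) -> dict[str, str]:
    """Map every file under raw_dir (as a relative POSIX path) to its digest."""
    return {
        p.relative_to(raw_dir).as_posix(): sha256_of(p)
        for p in sorted(raw_dir.rglob("*"))
        if p.is_file()
    }


def immutability_violations(manifest: dict[str, str], current: dict[str, str]) -> list[str]:
    """List files modified or deleted since the manifest was recorded.

    Paths present in `current` but absent from `manifest` are new intake
    and are deliberately not treated as violations.
    """
    problems = []
    for rel_path, expected in sorted(manifest.items()):
        actual = current.get(rel_path)
        if actual is None:
            problems.append(f"deleted: {rel_path}")
        elif actual != expected:
            problems.append(f"modified: {rel_path}")
    return problems
```

In use, immutability_violations(saved_manifest, snapshot(Path("raw"))) should return an empty list before any build proceeds.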

Milestone 2: Curriculum Spine

Status: complete for CSEE Mathematics 2023 syllabus topics.

  • Register all 44 current syllabus topic entries with stable IDs and slugs.
  • Create one leaf wiki topic page per syllabus topic.
  • Keep hub pages as navigation and grouping pages.
  • Connect topics to subject, form, source, competence, and hub records in graph data.
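The registry invariants above (a fixed topic count, stable unique IDs and slugs, and links to subject, form, source, competence, and hub records) can be checked mechanically. The sketch below assumes flat topic records with hypothetical field names such as subject_id and hub_id; the real graph data may store these relations differently.

```python
import re
from collections import Counter

# Hypothetical field names for the relations each topic must carry in graph data.
REQUIRED_LINKS = ("subject_id", "form_id", "source_id", "competence_ids", "hub_id")
SLUG_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def validate_topics(topics: list[dict], expected_count: int) -> list[str]:
    """Return human-readable problems found in a list of topic records."""
    problems = []
    if len(topics) != expected_count:
        problems.append(f"expected {expected_count} topics, found {len(topics)}")

    # IDs and slugs must be unique so links and URLs stay stable.
    for field in ("id", "slug"):
        counts = Counter(t.get(field) for t in topics)
        for value, n in sorted(counts.items(), key=lambda kv: str(kv[0])):
            if n > 1:
                problems.append(f"duplicate {field}: {value}")

    for topic in topics:
        label = topic.get("id", "<missing id>")
        if not SLUG_PATTERN.match(topic.get("slug") or ""):
            problems.append(f"{label}: slug is not lowercase-hyphenated")
        for link in REQUIRED_LINKS:
            if not topic.get(link):
                problems.append(f"{label}: missing {link}")
    return problems
```

For the Mathematics spine, validate_topics(topics, 44) returning an empty list would confirm the registry matches the syllabus count.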

Milestone 3: Question Mapping

Status: first five-year Basic Mathematics pilot complete.

  • Flattened 2021-2025 Basic Mathematics Paper 1 JSON extracts into 191 answerable leaf-question records.
  • Mapped 147 records to 2023 syllabus topic IDs.
  • Preserved 44 unmapped records for review.
  • Added 125 conservative review-queue records for low-confidence, multi-topic, figure/table-dependent, missing-mark, or uncertain cases.
  • Used the 2022 CSEE examination-format crosswalk to compare actual exam-question patterns against expected table-of-specification groups.
  • The current pilot should be treated as unreviewed exam intelligence until it is checked against the original papers.
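The pilot counts are internally consistent: every one of the 191 leaf records is either mapped (147) or unmapped (44). A summary like the one below could regenerate such figures whenever mappings change. It assumes each record carries hypothetical topic_ids and review_reasons lists; review reasons are tallied separately because one record can be queued for several reasons, and a mapped record can still need review.

```python
from collections import Counter


def mapping_summary(records: list[dict]) -> dict:
    """Summarise question-to-topic coverage for leaf-question records.

    A record counts as mapped when its (hypothetical) topic_ids list is non-empty.
    """
    mapped = [r for r in records if r.get("topic_ids")]
    reasons = Counter(reason for r in records for reason in r.get("review_reasons", []))
    return {
        "total": len(records),
        "mapped": len(mapped),
        "unmapped": len(records) - len(mapped),
        "multi_topic": sum(1 for r in mapped if len(r["topic_ids"]) > 1),
        "review_reasons": dict(reasons),
    }
```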

Milestone 4: Versioned Multi-Subject Architecture

Status: recommended next before adding more subjects.

  • Introduce data/curricula/{level}/{subject_slug}/{year}.json as the canonical home for official curriculum versions.
  • Add data/curricula/index.json with current curriculum pointers, historical versions, canonical subject IDs, subject aliases, language notes, source IDs, and review status.
  • Keep data/curriculum_map.json as a temporary Mathematics compatibility view until existing question mappings and page generators understand the versioned layout.
  • Adopt append-only curriculum versioning: add future syllabuses, such as 2028, as new curriculum records instead of overwriting 2023.
  • Namespace new form, competence, hub, and topic IDs by level, subject, and year to avoid collisions across subjects.
  • Separate curriculum topics from durable concepts: topic-csee-physics-2023-measurement is an official syllabus placement, while concept-measurement is the reusable idea.
  • Model legacy and multilingual names as aliases. For example, Civics aliases should support search and redirects without rewriting historical records into the current Historia ya Tanzania na Maadili identity.
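The layout, ID namespacing, append-only versioning, and alias rules above can be sketched as a few small helpers. The path pattern and the topic-csee-physics-2023-measurement ID form come from this roadmap; the index shape ({"versions": [...], "current": ...}) and the alias dictionary are assumptions made for illustration.

```python
from pathlib import PurePosixPath
from typing import Optional


def curriculum_path(level: str, subject_slug: str, year: int) -> PurePosixPath:
    """Canonical home of one official curriculum version."""
    return PurePosixPath("data/curricula") / level / subject_slug / f"{year}.json"


def namespaced_id(kind: str, level: str, subject_slug: str, year: int, slug: str) -> str:
    """Build a collision-free ID, e.g. topic-csee-physics-2023-measurement."""
    return f"{kind}-{level}-{subject_slug}-{year}-{slug}"


def add_version(index: dict, subject_id: str, year: int, make_current: bool = True) -> None:
    """Append a curriculum year to a (hypothetical) index entry; never overwrite."""
    entry = index.setdefault(subject_id, {"versions": [], "current": None})
    if year in entry["versions"]:
        raise ValueError(f"{subject_id} {year} already exists; record a new version instead")
    entry["versions"].append(year)
    if make_current:
        entry["current"] = year


def resolve_subject(aliases: dict[str, str], name: str) -> Optional[str]:
    """Look up a legacy or multilingual name; historical records are left untouched."""
    return aliases.get(name.strip().casefold())
```

Adding a 2028 syllabus would then be add_version(index, subject_id, 2028): the 2023 entry stays in the version list and only the current pointer moves.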

Milestone 5: Provenance And Public Source Policy

Status: planned.

  • Expand source records with checksum, file size, source URL where known, retrieval date, extraction date, extraction tool/version, rights note, public/private availability policy, and review status.
  • Treat syllabus PDFs under raw/ as private evidence artifacts and structured curriculum files as the operational rules.
  • Add extraction audit records linking source IDs to extraction runs, extracted curriculum records, and review outcomes.
  • Confirm a public build can ship wiki pages, structured curriculum data, graph projections, citations, and checksums without requiring raw/**/*.pdf.
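A source record and a public-build check might look like the sketch below. It computes the checksum and file size from the file's bytes and fills a subset of the provenance fields listed above; the field names are illustrative rather than an agreed schema. The second function flags any build entry under raw/ ending in .pdf, which a public build should never need.

```python
import hashlib
from datetime import date
from pathlib import PurePosixPath
from typing import Optional


def source_record(
    source_id: str,
    content: bytes,
    *,
    source_url: Optional[str] = None,
    retrieved: Optional[date] = None,
    extraction_tool: Optional[str] = None,
    rights_note: str = "",
    public: bool = False,
    review_status: str = "unreviewed",
) -> dict:
    """Build provenance fields for one source file; field names are illustrative."""
    return {
        "id": source_id,
        "sha256": hashlib.sha256(content).hexdigest(),
        "size_bytes": len(content),
        "source_url": source_url,
        "retrieval_date": retrieved.isoformat() if retrieved else None,
        "extraction_tool": extraction_tool,
        "rights_note": rights_note,
        "public": public,  # private by default, matching the raw/ policy
        "review_status": review_status,
    }


def raw_pdfs_in_build(build_files: list[str]) -> list[str]:
    """Return build entries matching raw/**/*.pdf; a public build should return []."""
    flagged = []
    for name in build_files:
        path = PurePosixPath(name)
        if path.parts[:1] == ("raw",) and path.suffix.lower() == ".pdf":
            flagged.append(name)
    return flagged
```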

Milestone 6: Core Subject Spine Expansion

Status: planned after architecture and provenance milestones.

  • Add versioned 2023 curriculum records for Physics, Chemistry, Biology, Geography, History, Historia ya Tanzania na Maadili, English Language, and Kiswahili.
  • Create subject, source, form, hub, and topic stub pages from the versioned curriculum records.
  • Limit the first pass to official curriculum spine stubs; do not add exam mappings or full learner chapters until the multi-subject graph validates cleanly.
  • Flag duplicate sources, ambiguous topic boundaries, translation uncertainty, and syllabus-table extraction issues in review queues.
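Stub generation and duplicate-source flagging could both run directly off the versioned records. In this sketch the curriculum record shape, the wiki/topics/<topic id>.md page path, and the review-queue item fields are all hypothetical; duplicates are detected by shared checksums, which assumes source records already carry a sha256 field.

```python
from collections import defaultdict


def topic_stubs(curriculum: dict) -> dict[str, str]:
    """Create one stub page per topic in a versioned curriculum record."""
    pages = {}
    for topic in curriculum["topics"]:
        pages[f"wiki/topics/{topic['id']}.md"] = "\n".join([
            f"# {topic['name']}",
            "",
            f"Subject: {curriculum['subject_id']} | Form: {topic['form']} | Source: {curriculum['source_id']}",
            "Status: stub (official curriculum spine only; no exam mappings yet)",
        ])
    return pages


def duplicate_source_flags(sources: list[dict]) -> list[dict]:
    """Emit review-queue items for source records that share a checksum."""
    by_checksum = defaultdict(list)
    for source in sources:
        by_checksum[source["sha256"]].append(source["id"])
    groups = sorted(sorted(ids) for ids in by_checksum.values() if len(ids) > 1)
    return [
        {"kind": "duplicate_source", "source_ids": ids, "review_status": "open"}
        for ids in groups
    ]
```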

Milestone 7: Mapping Review And Topic Upgrade

Status: recommended next.

  • Review data/review_queue_question_mapping.jsonl.
  • Correct or confirm question-to-topic mappings.
  • Reduce unmapped question count where the original PDF supports a clear mapping.
  • Upgrade high-signal topic pages with learner-facing explanations, examples, common mistakes, and past-question signals.
  • Start with exponents, logarithms, matrices, rates and variations, similarity and congruence, sets and Venn diagrams, ratios and proportions, and approximations.
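Review could begin by loading the JSON Lines queue and ordering it so that items touching the priority topics come first, lowest confidence first within each group. The candidate_topic_ids, confidence, and question_id field names are assumptions about the queue format, and the priority set would hold whatever registry IDs correspond to exponents, logarithms, matrices, and the other starting topics.

```python
import json


def load_queue(lines) -> list[dict]:
    """Parse JSON Lines text (any iterable of lines), skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def triage(queue: list[dict], priority_topics: set[str]) -> list[dict]:
    """Order review items: priority topics first, then lowest confidence first.

    Items with no confidence default to 0.0, so unknown confidence is reviewed early.
    """
    def key(item):
        touches_priority = bool(priority_topics & set(item.get("candidate_topic_ids", [])))
        return (not touches_priority, item.get("confidence", 0.0), item.get("question_id", ""))

    return sorted(queue, key=key)
```

In use, the lines would come from opening data/review_queue_question_mapping.jsonl with encoding="utf-8" and passing the file handle to load_queue.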

Milestone 8: Sample Exam Design

Status: planned.

  • Define a sample-exam specification using topic frequency, form coverage, hub coverage, and exam-format groups.
  • Decide whether sample questions are selected, adapted, or newly generated.
  • Add difficulty, marks, topic IDs, source policy, and review status to each sample question.
  • Keep generated sample exams separate from official past papers.
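The topic-frequency part of a sample-exam specification can be expressed as proportional allocation of question slots. This sketch uses the largest-remainder method (a standard apportionment technique, chosen here as an assumption rather than a decided design) so allocations always sum to the requested total; form coverage, hub coverage, and exam-format groups would need additional constraints on top.

```python
from math import floor


def allocate_questions(topic_frequency: dict[str, int], total_questions: int) -> dict[str, int]:
    """Split question slots across topics in proportion to past-paper frequency.

    Largest-remainder method: take each topic's floor quota, then hand the
    leftover slots to the largest fractional remainders (ties broken by topic ID).
    """
    weight = sum(topic_frequency.values())
    if weight == 0:
        raise ValueError("no topic frequencies to allocate from")
    quotas = {t: total_questions * f / weight for t, f in topic_frequency.items()}
    allocation = {t: floor(q) for t, q in quotas.items()}
    leftover = total_questions - sum(allocation.values())
    by_remainder = sorted(quotas, key=lambda t: (-(quotas[t] - allocation[t]), t))
    for topic in by_remainder[:leftover]:
        allocation[topic] += 1
    return allocation
```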

Milestone 9: Source Enrichment

Status: planned.

  • Add official marking schemes where available and source-verified.
  • Add textbook-backed explanations and examples.
  • Add teacher notes and multimedia resources with citations.
  • Keep all new content mapped to the stable topic registry.
  • Treat NECTA and official Tanzania Institute of Education sources as the constitutional layer for topic names, topic boundaries, competences, form placement, and exam coverage.
  • Use non-NECTA sources only as enrichment: aliases, background context, diagrams, prerequisite hints, examples, and cross-subject bridges.
  • Keep external enrichment separate from official curriculum records until reviewed.
  • Add external enrichment candidates with source, confidence, rationale, and review status before promoting them into the main graph.
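The split between the official constitutional layer and enrichment can be enforced with a promotion gate. In this sketch, the authority labels, field categories, and candidate record are hypothetical: non-official sources may only propose enrichment fields, and no candidate is promoted until its review status is approved.

```python
from dataclasses import dataclass

OFFICIAL_AUTHORITIES = {"NECTA", "TIE"}  # hypothetical labels for official sources
CURRICULUM_FIELDS = {"topic_name", "topic_boundary", "competence", "form_placement", "exam_coverage"}
ENRICHMENT_FIELDS = {"alias", "background", "diagram", "prerequisite_hint", "example", "cross_subject_bridge"}


@dataclass
class EnrichmentCandidate:
    topic_id: str
    field: str
    value: str
    source: str
    authority: str
    confidence: float
    rationale: str
    review_status: str = "pending"


def can_promote(c: EnrichmentCandidate) -> tuple[bool, str]:
    """Decide whether a candidate may move from the review area into the main graph."""
    if c.field in CURRICULUM_FIELDS and c.authority not in OFFICIAL_AUTHORITIES:
        return False, "curriculum fields come only from official sources"
    if c.field not in CURRICULUM_FIELDS | ENRICHMENT_FIELDS:
        return False, f"unknown field {c.field!r}"
    if c.review_status != "approved":
        return False, "not yet reviewed"
    return True, "ok"
```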

Milestone 10: External Knowledge Graph Enrichment

Status: planned.

  • Use Wikidata as the preferred structured external graph for canonical IDs, aliases, multilingual labels, and same-as links.
  • Use Wikipedia for learner-readable background context and disambiguation, with citation and licensing discipline.
  • Use Google Knowledge Graph only as an optional entity-disambiguation helper, not as a bulk curriculum authority.
  • Add typed enrichment edges such as topic_has_external_alias, topic_has_external_reference, topic_has_candidate_prerequisite, topic_has_reviewed_prerequisite, and topic_has_cross_subject_bridge.
  • Prevent node-edge congestion by capping weak external fanout, keeping broad hubs navigational, and promoting only reviewed high-signal links.
  • Store rejected or uncertain external matches in a review queue rather than deleting the audit trail.
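Fanout capping and the review-queue rule can be combined in one pass over candidate edges. The edge types below are the ones named in this milestone; the source, target, reviewed, and confidence fields are assumptions. Reviewed edges are always kept, at most N unreviewed edges per topic survive (highest confidence first), and everything else goes to the review queue with a reason instead of being deleted.

```python
from collections import defaultdict

# Edge types named in the roadmap.
EDGE_TYPES = {
    "topic_has_external_alias",
    "topic_has_external_reference",
    "topic_has_candidate_prerequisite",
    "topic_has_reviewed_prerequisite",
    "topic_has_cross_subject_bridge",
}


def cap_fanout(edges: list[dict], max_weak_per_topic: int) -> tuple[list[dict], list[dict]]:
    """Split edges into (kept, queued), preserving every dropped edge for audit."""
    kept, queued = [], []
    weak = defaultdict(list)
    for edge in edges:
        if edge["type"] not in EDGE_TYPES:
            queued.append({**edge, "queue_reason": "unknown edge type"})
        elif edge.get("reviewed"):
            kept.append(edge)
        else:
            weak[edge["source"]].append(edge)
    for candidates in weak.values():
        ranked = sorted(candidates, key=lambda e: -e.get("confidence", 0.0))
        kept.extend(ranked[:max_weak_per_topic])
        queued.extend({**e, "queue_reason": "fanout cap"} for e in ranked[max_weak_per_topic:])
    return kept, queued
```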

Milestone 11: Retrieval And Local Models

Status: planned.

  • Build a local search/retrieval index over wiki pages and structured data.
  • Test local model workflows with source-grounded retrieval before any fine-tuning.
  • Add answer-generation rules that cite wiki pages and source files.
  • Use human feedback to improve retrieval quality.
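A local retrieval index does not need external services to start. The sketch below uses BM25 ranking, a standard lexical technique chosen here as an assumption (the roadmap does not name a method), over wiki pages keyed by path, so that every hit is a page path an answer can cite. The k1 and b defaults are common textbook values, not tuned for this corpus.

```python
import math
import re
from collections import Counter

TOKEN = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; \\w also matches Kiswahili letters."""
    return [t.casefold() for t in TOKEN.findall(text)]


class BM25Index:
    """Minimal in-memory BM25 ranking over pages keyed by path (for citations)."""

    def __init__(self, pages: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = {path: Counter(tokenize(text)) for path, text in pages.items()}
        self.lengths = {path: sum(c.values()) for path, c in self.docs.items()}
        # Fall back to 1.0 so empty corpora or empty pages never divide by zero.
        self.avg_len = (sum(self.lengths.values()) / len(self.docs) if self.docs else 0) or 1.0
        # Document frequency: iterating a Counter yields each distinct term once.
        self.df = Counter(term for counts in self.docs.values() for term in counts)

    def idf(self, term: str) -> float:
        n, df = len(self.docs), self.df.get(term, 0)
        return math.log(1 + (n - df + 0.5) / (df + 0.5))

    def search(self, query: str, top_k: int = 3) -> list[str]:
        """Return up to top_k page paths with a positive score, best first."""
        scores = {}
        for path, counts in self.docs.items():
            norm = self.k1 * (1 - self.b + self.b * self.lengths[path] / self.avg_len)
            score = 0.0
            for term in tokenize(query):
                tf = counts.get(term, 0)
                if tf:
                    score += self.idf(term) * tf * (self.k1 + 1) / (tf + norm)
            if score > 0:
                scores[path] = score
        return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]
```

Human feedback could later adjust ranking, for example by logging which returned paths reviewers accepted for a query.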

Later Milestones

  • Build collaborative review workflows.
  • Pilot with students and teachers.
  • Add correction history and reviewer metadata.
  • Explore public wiki publishing when governance is clear.