Roadmap
Milestone 1: Foundation
Status: complete for the current starter.
- Establish source-first operating rules.
- Define page templates and review statuses.
- Document ingestion, validation, and data-model expectations.
- Preserve `raw/` as immutable intake.
Milestone 2: Curriculum Spine
Status: complete for CSEE Mathematics 2023 syllabus topics.
- Register all 44 current syllabus topic entries with stable IDs and slugs.
- Create one leaf wiki topic page per syllabus topic.
- Keep hub pages as navigation and grouping pages.
- Connect topics to subject, form, source, competence, and hub records in graph data.
Milestone 3: Question Mapping
Status: first five-year Basic Mathematics pilot complete.
- Flattened 2021-2025 Basic Mathematics Paper 1 JSON extracts into 191 answerable leaf-question records.
- Mapped 147 records to 2023 syllabus topic IDs.
- Preserved 44 unmapped records for review.
- Added 125 conservative review-queue records for low-confidence, multi-topic, figure/table-dependent, missing-mark, or uncertain cases.
- Used the 2022 CSEE examination-format crosswalk to compare actual exam-question patterns against expected table-of-specification groups.
- Current pilot should be treated as unreviewed exam intelligence until checked against original papers.
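
The pilot counts above can be re-derived from the flattened records. The following is a minimal sketch, assuming each leaf-question record is a dictionary whose `topic_id` field is empty or missing when the record is unmapped; the field name and the function `summarize_mappings` are illustrative assumptions, not the project's actual schema.

```python
from collections import Counter


def summarize_mappings(records):
    """Count mapped and unmapped leaf-question records.

    Assumption: a record counts as mapped when its hypothetical
    `topic_id` field holds a non-empty value.
    """
    counts = Counter(
        "mapped" if record.get("topic_id") else "unmapped" for record in records
    )
    return {
        "total": sum(counts.values()),
        "mapped": counts["mapped"],
        "unmapped": counts["unmapped"],
    }
```

A check of this kind makes it easy to confirm that mapped and unmapped records still add up to the total after each review pass.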
Milestone 4: Versioned Multi-Subject Architecture
Status: recommended next before adding more subjects.
- Introduce `data/curricula/{level}/{subject_slug}/{year}.json` as the canonical home for official curriculum versions.
- Add `data/curricula/index.json` with current curriculum pointers, historical versions, canonical subject IDs, subject aliases, language notes, source IDs, and review status.
- Keep `data/curriculum_map.json` as a temporary Mathematics compatibility view until existing question mappings and page generators understand the versioned layout.
- Adopt append-only curriculum versioning: add future syllabuses, such as 2028, as new curriculum records instead of overwriting 2023.
- Namespace new form, competence, hub, and topic IDs by level, subject, and year to avoid collisions across subjects.
- Separate curriculum topics from durable concepts: `topic-csee-physics-2023-measurement` is an official syllabus placement, while `concept-measurement` is the reusable idea.
- Model legacy and multilingual names as aliases. For example, Civics aliases should support search and redirects without rewriting historical records into the current Historia ya Tanzania na Maadili identity.
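
The versioned layout, namespaced IDs, and append-only rule can be sketched together. This is an illustration only: the helper names (`curriculum_path`, `namespaced_id`, `concept_id`, `add_version`) are hypothetical, and the ID pattern is inferred from the single `topic-csee-physics-2023-measurement` example rather than from a published specification.

```python
from pathlib import PurePosixPath


def curriculum_path(level, subject_slug, year):
    """Build the canonical path for one official curriculum version."""
    return PurePosixPath("data/curricula") / level / subject_slug / f"{year}.json"


def namespaced_id(kind, level, subject_slug, year, slug):
    """Namespace an ID by level, subject, and year to avoid collisions."""
    return f"{kind}-{level}-{subject_slug}-{year}-{slug}"


def concept_id(slug):
    """Durable concepts carry no level, subject, or year."""
    return f"concept-{slug}"


def add_version(versions, year, record):
    """Append-only: return a new mapping, refusing to overwrite a year."""
    if year in versions:
        raise ValueError(f"curriculum version {year} already exists")
    return {**versions, year: record}
```

Under these assumptions, a future 2028 syllabus would be added with `add_version(versions, 2028, record)`, leaving the 2023 record untouched.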
Milestone 5: Provenance And Public Source Policy
Status: planned.
- Expand source records with checksum, file size, source URL where known, retrieval date, extraction date, extraction tool/version, rights note, public/private availability policy, and review status.
- Treat syllabus PDFs under `raw/` as private evidence artifacts and structured curriculum files as the operational rules.
- Add extraction audit records linking source IDs to extraction runs, extracted curriculum records, and review outcomes.
- Confirm a public build can ship wiki pages, structured curriculum data, graph projections, citations, and checksums without requiring `raw/**/*.pdf`.
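
Two of the provenance checks lend themselves to a short sketch: fingerprinting a source file and verifying that a public build lists no PDF under `raw/`. The function names, the output field names, and the choice of SHA-256 are assumptions made for illustration; the roadmap does not specify a checksum algorithm.

```python
import hashlib
from pathlib import PurePosixPath


def source_fingerprint(data: bytes) -> dict:
    """Return checksum and size fields for a source record.

    Assumption: SHA-256 is used; the field names are hypothetical.
    """
    return {
        "checksum_sha256": hashlib.sha256(data).hexdigest(),
        "file_size_bytes": len(data),
    }


def leaked_raw_pdfs(build_paths):
    """List build entries that fall under raw/ and end in .pdf."""
    leaked = []
    for raw_path in build_paths:
        path = PurePosixPath(raw_path)
        if path.parts and path.parts[0] == "raw" and path.suffix.lower() == ".pdf":
            leaked.append(raw_path)
    return leaked
```

A public build would pass this check when `leaked_raw_pdfs` returns an empty list for its file manifest.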
Milestone 6: Core Subject Spine Expansion
Status: planned after architecture and provenance milestones.
- Add versioned 2023 curriculum records for Physics, Chemistry, Biology, Geography, History, Historia ya Tanzania na Maadili, English Language, and Kiswahili.
- Create subject, source, form, hub, and topic stub pages from the versioned curriculum records.
- Keep the first pass to official curriculum spine stubs; do not add exam mappings or full learner chapters until the multi-subject graph validates cleanly.
- Flag duplicate sources, ambiguous topic boundaries, translation uncertainty, and syllabus-table extraction issues in review queues.
Milestone 7: Mapping Review And Topic Upgrade
Status: recommended next.
- Review `data/review_queue_question_mapping.jsonl`.
- Correct or confirm question-to-topic mappings.
- Reduce unmapped question count where the original PDF supports a clear mapping.
- Upgrade high-signal topic pages with learner-facing explanations, examples, common mistakes, and past-question signals.
- Start with exponents, logarithms, matrices, rates and variations, similarity and congruence, sets and Venn diagrams, ratios and proportions, and approximations.
Milestone 8: Sample Exam Design
Status: planned.
- Define a sample-exam specification using topic frequency, form coverage, hub coverage, and exam-format groups.
- Decide whether sample questions are selected, adapted, or newly generated.
- Add difficulty, marks, topic IDs, source policy, and review status to each sample question.
- Keep generated sample exams separate from official past papers.
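
The per-question metadata listed above could be captured in a small record type. The sketch below is one possible shape, not a settled schema: the class name, field names, and the two validation rules are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleQuestion:
    """Hypothetical record for one sample-exam question."""

    question_id: str
    topic_ids: tuple
    marks: int
    difficulty: str
    source_policy: str  # e.g. "selected", "adapted", or "generated"
    review_status: str


def validation_errors(question):
    """Return a list of problems; an empty list means the record passes."""
    errors = []
    if not question.topic_ids:
        errors.append("at least one topic ID is required")
    if question.marks <= 0:
        errors.append("marks must be positive")
    return errors
```

Keeping `source_policy` on every record is one way to keep generated questions distinguishable from official past-paper material.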
Milestone 9: Source Enrichment
Status: planned.
- Add official marking schemes where available and source-verified.
- Add textbook-backed explanations and examples.
- Add teacher notes and multimedia resources with citations.
- Keep all new content mapped to the stable topic registry.
- Treat NECTA and official Tanzania Institute of Education sources as the constitutional layer for topic names, topic boundaries, competences, form placement, and exam coverage.
- Use non-NECTA sources only as enrichment: aliases, background context, diagrams, prerequisite hints, examples, and cross-subject bridges.
- Keep external enrichment separate from official curriculum records until reviewed.
- Add external enrichment candidates with source, confidence, rationale, and review status before promoting them into the main graph.
Milestone 10: External Knowledge Graph Enrichment
Status: planned.
- Use Wikidata as the preferred structured external graph for canonical IDs, aliases, multilingual labels, and same-as links.
- Use Wikipedia for learner-readable background context and disambiguation, with citation and licensing discipline.
- Use Google Knowledge Graph only as an optional entity-disambiguation helper, not as a bulk curriculum authority.
- Add typed enrichment edges such as `topic_has_external_alias`, `topic_has_external_reference`, `topic_has_candidate_prerequisite`, `topic_has_reviewed_prerequisite`, and `topic_has_cross_subject_bridge`.
- Prevent node-edge congestion by capping weak external fanout, keeping broad hubs navigational, and promoting only reviewed high-signal links.
- Store rejected or uncertain external matches in a review queue rather than deleting the audit trail.
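
The fanout cap and review gate described above might work along these lines. This is a sketch under stated assumptions: the edge fields (`topic_id`, `confidence`, `review_status`), the thresholds, and the function `triage_edges` are all hypothetical.

```python
from collections import defaultdict


def triage_edges(candidates, min_confidence=0.8, max_per_topic=5):
    """Split candidate edges into promoted links and a review queue.

    An edge is promoted only if it is reviewed, meets the confidence
    threshold, and its topic has not reached the fanout cap. Everything
    else is queued rather than dropped, preserving the audit trail.
    """
    promoted, queued = [], []
    per_topic = defaultdict(int)
    # Consider the strongest candidates first so the cap keeps the best links.
    for edge in sorted(candidates, key=lambda e: e["confidence"], reverse=True):
        eligible = (
            edge["review_status"] == "reviewed"
            and edge["confidence"] >= min_confidence
        )
        if eligible and per_topic[edge["topic_id"]] < max_per_topic:
            promoted.append(edge)
            per_topic[edge["topic_id"]] += 1
        else:
            queued.append(edge)
    return promoted, queued
```

Because no candidate is discarded, the number of promoted and queued edges always equals the number of candidates.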
Milestone 11: Retrieval And Local Models
Status: planned.
- Build a local search/retrieval index over wiki pages and structured data.
- Test local model workflows with source-grounded retrieval before any fine-tuning.
- Add answer-generation rules that cite wiki pages and source files.
- Use human feedback to improve retrieval quality.
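
As a starting point for the local index, a plain inverted index over page text is enough to test source-grounded retrieval. The sketch below assumes pages are available as a mapping from page ID to text; the function names and the simple token-overlap ranking are illustrative choices, not a committed design.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each token to the set of page IDs that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for token in tokenize(text):
            index[token].add(page_id)
    return index


def search(index, query):
    """Rank pages by how many distinct query tokens they contain."""
    scores = defaultdict(int)
    for token in set(tokenize(query)):
        for page_id in index.get(token, ()):
            scores[page_id] += 1
    # Highest score first; ties broken by page ID for stable output.
    return sorted(scores, key=lambda pid: (-scores[pid], pid))
```

Returning page IDs rather than text keeps every answer traceable to a wiki page that can be cited.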
Later Milestones
- Build collaborative review workflows.
- Pilot with students and teachers.
- Add correction history and reviewer metadata.
- Explore public wiki publishing when governance is clear.