NECTA Wiki Architecture

Operating Model

NECTA Wiki is a source-backed curriculum knowledge graph with a Markdown wiki surface.

The working rule is:

raw source evidence -> structured curriculum records -> graph projection -> wiki pages

Each layer has a different job:

  • raw/: private evidence archive for PDFs, exam extracts, textbooks, and source snapshots.
  • data/source_catalog.jsonl: full source registry with authority, review status, checksums, and private/public policy.
  • data/source_catalog_public.jsonl: public-safe source metadata export that excludes private raw paths.
  • data/curricula/**: versioned curriculum truth for official syllabus structure.
  • data/nodes.jsonl and data/edges.jsonl: graph projection for subjects, sources, forms, curricula, competences, hubs, topics, exams, and concepts.
  • wiki/**: human-readable pages for learners, reviewers, and future publishing.
  • exports/**: future migration outputs that can be regenerated from data/.

Markdown pages are readable views. The structured data layer is the machine-readable authority. Raw source files are audit evidence, not public runtime content.
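
As a rough illustration of the step from structured curriculum records to the graph projection, the sketch below turns one curriculum record into node and edge rows serialized as JSONL. The record shape (`id`, `topics`, `label`), the `type` values, and the `has_topic` edge type are assumptions made for this example, not the repository's actual schema.

```python
import json

def project(curriculum: dict):
    """Project one curriculum record into (nodes, edges) lists.

    Field names here are hypothetical; adapt them to the real schema.
    """
    cur_id = curriculum["id"]
    nodes = [{"id": cur_id, "type": "curriculum"}]
    edges = []
    for topic in curriculum.get("topics", []):
        nodes.append({"id": topic["id"], "type": "topic", "label": topic["label"]})
        edges.append({"source": cur_id, "target": topic["id"], "type": "has_topic"})
    return nodes, edges

def to_jsonl(rows) -> str:
    """Serialize rows as one JSON object per line."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)

# Hypothetical input record, using ID patterns that appear in this document.
sample = {
    "id": "curriculum-csee-physics-2023",
    "topics": [{"id": "topic-csee-physics-2023-measurement", "label": "Measurement"}],
}
nodes, edges = project(sample)
print(to_jsonl(nodes))
print(to_jsonl(edges))
```

Because the projection is a pure function of the curriculum record, the graph files can be regenerated at any time from data/curricula/**.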

Current Tree

necta_wiki_main/
├── data/
│   ├── curricula/
│   │   ├── index.json
│   │   └── csee/
│   │       ├── biology/2023.json
│   │       ├── chemistry/2023.json
│   │       ├── english-language/2023.json
│   │       ├── geography/2023.json
│   │       ├── historia-ya-tanzania-na-maadili/2023.json
│   │       ├── history/2023.json
│   │       ├── kiswahili/2023.json
│   │       ├── mathematics/2023.json
│   │       └── physics/2023.json
│   ├── curriculum_map.json
│   ├── source_catalog.jsonl
│   ├── source_catalog_public.jsonl
│   ├── nodes.jsonl
│   └── edges.jsonl
├── scripts/
│   ├── source_provenance.py
│   └── validate_wiki.py
├── wiki/
│   ├── subjects/
│   ├── forms/
│   ├── sources/
│   ├── topics/
│   └── exams/
└── raw/
    └── syllabuses/csee/2023/

data/curriculum_map.json is kept as a Mathematics-only compatibility view because existing question-mapping artifacts still reference the original Mathematics topic IDs. New multi-subject work should use data/curricula/{level}/{subject_slug}/{year}.json.
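
The path template can be expressed as a small helper. This is a minimal sketch; the function name `curriculum_path` is hypothetical and does not refer to an existing script in the repository.

```python
from pathlib import PurePosixPath

CURRICULA_ROOT = PurePosixPath("data/curricula")

def curriculum_path(level: str, subject_slug: str, year: int) -> PurePosixPath:
    """Build data/curricula/{level}/{subject_slug}/{year}.json."""
    return CURRICULA_ROOT / level / subject_slug / f"{year}.json"

print(curriculum_path("csee", "physics", 2023))
# data/curricula/csee/physics/2023.json
```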

Versioning

Curriculum versions are append-only. A future syllabus is added as a new curriculum version instead of overwriting the old one.

Example:

curriculum-csee-physics-2023
curriculum-csee-physics-2028

data/curricula/index.json stores, for each subject, a pointer to its current curriculum version:

subject-physics -> curriculum-csee-physics-2023

When a new official syllabus becomes current, update the pointer but keep the old version intact. This preserves historical exam mapping, old wiki views, and future syllabus-diff work.
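
The pointer update can be sketched as follows. The index layout used here (a `current` map plus a `versions` history per subject) is an assumption for illustration; the real structure of data/curricula/index.json may differ.

```python
def promote(index: dict, subject_id: str, new_version_id: str) -> dict:
    """Return a new index whose pointer for subject_id is new_version_id.

    Older version IDs are kept in the history; nothing is overwritten.
    The input index is not mutated.
    """
    versions = dict(index.get("versions", {}))
    history = list(versions.get(subject_id, []))
    if new_version_id not in history:
        history.append(new_version_id)  # append-only
    versions[subject_id] = history

    current = dict(index.get("current", {}))
    current[subject_id] = new_version_id
    return {"current": current, "versions": versions}

# Hypothetical index content built from the IDs in the example above.
index = {
    "current": {"subject-physics": "curriculum-csee-physics-2023"},
    "versions": {"subject-physics": ["curriculum-csee-physics-2023"]},
}
updated = promote(index, "subject-physics", "curriculum-csee-physics-2028")
print(updated["current"]["subject-physics"])
print(updated["versions"]["subject-physics"])
```

Only the pointer moves; the 2023 version stays listed and its file stays in place, so exam mappings against it keep resolving.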

Topic And Concept Identity

Official syllabus topics and reusable learning ideas are different things.

  • topic-csee-physics-2023-measurement: a topic as placed by a specific syllabus version.
  • concept-measurement: the durable learning idea that can appear across years, subjects, and exams.

Topic records answer where the official curriculum places something. Concept records answer what the idea is. Exam questions may eventually map to both.
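
To make the distinction concrete, the sketch below splits a topic ID into its syllabus coordinates. The ID grammar (`topic-{level}-{subject_slug}-{year}-{topic_slug}` with a four-digit year) is inferred from the single example above and is an assumption; `parse_topic_id` is a hypothetical helper.

```python
import re
from typing import NamedTuple

# Assumed grammar: the level has no hyphens, and the four-digit year
# separates the subject slug from the topic slug.
TOPIC_ID = re.compile(
    r"^topic-(?P<level>[a-z0-9]+)-(?P<subject>[a-z0-9-]+?)"
    r"-(?P<year>\d{4})-(?P<slug>[a-z0-9-]+)$"
)

class TopicKey(NamedTuple):
    level: str
    subject: str
    year: int
    slug: str

def parse_topic_id(topic_id: str) -> TopicKey:
    """Split a topic ID into level, subject slug, year, and topic slug."""
    match = TOPIC_ID.match(topic_id)
    if match is None:
        raise ValueError(f"not a topic ID: {topic_id!r}")
    return TopicKey(
        match["level"], match["subject"], int(match["year"]), match["slug"]
    )

key = parse_topic_id("topic-csee-physics-2023-measurement")
print(key)
```

A concept ID such as concept-measurement carries no level, subject, or year, which is what lets it be shared across syllabus versions.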

Subject Aliases

Aliases support discovery without rewriting history.

The 2023 curriculum lists Historia ya Tanzania na Maadili rather than Civics. NECTA Wiki therefore treats Civics, Civic Education, and Uraia as legacy search aliases: they route learners to the current subject, while older records that may have originally used Civics are preserved as they are.
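
A minimal sketch of alias resolution, assuming aliases live in a simple lookup table. The table shape and the `resolve_subject` helper are hypothetical; the alias names and subject slugs are taken from this document.

```python
from typing import Optional

SUBJECT_SLUGS = {
    "biology", "chemistry", "english-language", "geography",
    "historia-ya-tanzania-na-maadili", "history", "kiswahili",
    "mathematics", "physics",
}

CURRENT_CIVICS_SUBJECT = "historia-ya-tanzania-na-maadili"

# Legacy search aliases: they route to the current subject and do not
# rename anything in older records.
ALIASES = {
    "civics": CURRENT_CIVICS_SUBJECT,
    "civic education": CURRENT_CIVICS_SUBJECT,
    "uraia": CURRENT_CIVICS_SUBJECT,
}

def resolve_subject(query: str) -> Optional[str]:
    """Map a search term to a current subject slug, or None if unknown."""
    key = " ".join(query.casefold().split())  # normalize case and spacing
    if key in ALIASES:
        return ALIASES[key]
    slug = key.replace(" ", "-")
    return slug if slug in SUBJECT_SLUGS else None

print(resolve_subject("Civics"))
print(resolve_subject("English Language"))
```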

Public And Private Source Policy

Official PDFs are private evidence artifacts unless redistribution permission is obtained. Public launch artifacts should include:

  • wiki pages
  • curriculum JSON
  • graph nodes and edges
  • source titles, publishers, years, languages, checksums, review statuses, and citations
  • original learner-facing content

Public launch artifacts should exclude:

  • raw PDFs under raw/
  • OCR dumps
  • full extracted page text
  • local private archive paths

Use scripts/source_provenance.py to produce public-safe source metadata and to verify that local source files still match their recorded checksums.
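
The two ideas behind that script, stripping private fields and re-checking checksums, can be sketched as below. This is not the script's actual code: the field names `raw_path` and `sha256`, the sample record, and the file name in it are all hypothetical.

```python
import hashlib

# Hypothetical names of fields that must never reach the public export.
PRIVATE_FIELDS = {"raw_path"}

def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()

def to_public(record: dict) -> dict:
    """Copy a catalog record without its private fields."""
    return {k: v for k, v in record.items() if k not in PRIVATE_FIELDS}

def matches_checksum(record: dict, data: bytes) -> bool:
    """True if the bytes still hash to the checksum stored in the record."""
    return sha256_hex(data) == record["sha256"]

# Placeholder bytes stand in for a private PDF so the example runs anywhere.
pdf_bytes = b"placeholder bytes standing in for a syllabus PDF"
record = {
    "id": "source-example",
    "title": "Example syllabus",
    "sha256": sha256_hex(pdf_bytes),
    "raw_path": "raw/syllabuses/csee/2023/example.pdf",
}

public = to_public(record)
print(sorted(public))
print(matches_checksum(record, pdf_bytes))
```

The checksum stays in the public record, so a reader can confirm which file a citation refers to without the file itself being published.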

Validation

Run:

python3 scripts/validate_wiki.py --warnings-as-errors
python3 scripts/source_provenance.py --only-source-type syllabus --require-local --summary-only

The validator checks JSON/JSONL parsing, source paths, duplicate graph IDs, topic page paths, and graph references. The provenance check verifies syllabus files without modifying raw sources.
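
Two of these checks, duplicate graph IDs and unresolved graph references, can be illustrated with a short sketch. It is not the validator's implementation, and the field names `id`, `source`, and `target` are assumptions about the JSONL rows.

```python
import json
from collections import Counter

def parse_jsonl(text: str) -> list:
    """Parse JSONL text, skipping blank lines; raises on malformed JSON."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def duplicate_ids(nodes: list) -> list:
    """Node IDs that occur more than once, sorted."""
    counts = Counter(node["id"] for node in nodes)
    return sorted(node_id for node_id, n in counts.items() if n > 1)

def dangling_edges(nodes: list, edges: list) -> list:
    """Edges whose source or target is not a known node ID."""
    known = {node["id"] for node in nodes}
    return [e for e in edges if e["source"] not in known or e["target"] not in known]

# Hypothetical sample data with one duplicate node and one broken edge.
nodes = parse_jsonl(
    '{"id": "subject-physics"}\n'
    '{"id": "curriculum-csee-physics-2023"}\n'
    '{"id": "subject-physics"}\n'
)
edges = parse_jsonl(
    '{"source": "subject-physics", "target": "curriculum-csee-physics-2023"}\n'
    '{"source": "subject-physics", "target": "topic-missing"}\n'
)
print(duplicate_ids(nodes))
print(dangling_edges(nodes, edges))
```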

Current Boundary

The core-subject expansion currently registers sources and creates curriculum shells. Detailed topic extraction is intentionally still pending for Physics, Chemistry, Biology, Geography, History, Historia ya Tanzania na Maadili, English Language, and Kiswahili.

The next implementation milestone is to extract official syllabus topic tables into versioned topic registries and create topic stub pages, with manual review for ambiguous table boundaries, duplicate Kiswahili source handling, and multilingual labels.

Curriculum Crosswalks

Curriculum versions are separate identities. Repeated topic labels across years should be connected with explicit crosswalk records rather than merged.

Use docs/curriculum-versioning-framework.md for the current relationship vocabulary and authority rules. The 2005 Basic Mathematics integration follows this model: curriculum-csee-basic-mathematics-2005 remains a legacy exam-facing curriculum, while curriculum-csee-mathematics-2023 remains the current Mathematics spine.
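
A crosswalk can be pictured as a separate record that links two topic IDs without changing either. In the sketch below, the `Crosswalk` shape, the relation label `same_label`, and both topic IDs are hypothetical placeholders; the actual relationship vocabulary is the one defined in docs/curriculum-versioning-framework.md.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Crosswalk:
    from_topic: str
    to_topic: str
    relation: str  # placeholder label, not the framework's real vocabulary

def counterparts(topic_id: str, crosswalks: list) -> list:
    """Topics linked to topic_id in either direction, with the relation."""
    linked = []
    for cw in crosswalks:
        if cw.from_topic == topic_id:
            linked.append((cw.to_topic, cw.relation))
        elif cw.to_topic == topic_id:
            linked.append((cw.from_topic, cw.relation))
    return linked

# Hypothetical topic IDs under the two curriculum versions named above.
crosswalks = [
    Crosswalk(
        from_topic="topic-csee-basic-mathematics-2005-algebra",
        to_topic="topic-csee-mathematics-2023-algebra",
        relation="same_label",
    ),
]
print(counterparts("topic-csee-mathematics-2023-algebra", crosswalks))
```

Both topic records keep their own IDs and curriculum versions; the link lives only in the crosswalk record, so it can be reviewed or removed without touching either syllabus.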