Validation
Required Checks
Run the focused validation harness from the project root:
python3 scripts/validate_wiki.py
The script currently checks:
- Every `data/**/*.json` file parses as JSON.
- Every `data/**/*.jsonl` line parses as a JSON object.
- Duplicate IDs in `data/nodes.jsonl`, `data/edges.jsonl` when edge IDs exist, `data/source_catalog.jsonl`, and curriculum topic registries.
- Duplicate graph edge triples in `data/edges.jsonl`.
- Graph edge `from` and `to` references resolve to known node IDs.
- Source catalog local paths stay inside the repo root. Use `--require-source-files` for private archive audits that require every local source file to exist.
- Topic page paths exist for `data/curriculum_map.json` and any future `data/curricula/**/*.json` files.
- Curriculum topic references to form, source, competence, and hub nodes resolve when those fields are present.
The command exits with status 1 if any errors are found, and with status 0 when the run is clean or produces only warnings.
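As a rough illustration of the checks listed above, the sketch below shows one way JSONL parsing, duplicate-ID detection, edge-reference resolution, and the exit-status rule could be expressed. It is not the code in scripts/validate_wiki.py. All function names are hypothetical, and the `id` field name is an assumption; only `from` and `to` are named in this document.

```python
import json
from collections import Counter


def parse_jsonl(text):
    """Parse JSONL text and return (records, errors).

    Every non-empty line must be a JSON object.
    """
    records, errors = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            value = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(value, dict):
            errors.append(f"line {lineno}: expected a JSON object")
            continue
        records.append(value)
    return records, errors


def duplicate_ids(records, key="id"):
    """Return the sorted IDs that occur more than once (key name is assumed)."""
    counts = Counter(r[key] for r in records if key in r)
    return sorted(i for i, n in counts.items() if n > 1)


def unresolved_edges(nodes, edges):
    """Return edges whose from/to endpoint is not a known node ID."""
    known = {n["id"] for n in nodes if "id" in n}
    return [e for e in edges
            if e.get("from") not in known or e.get("to") not in known]


def exit_status(errors, warnings):
    """Apply the documented rule: 1 if there are errors, otherwise 0."""
    return 1 if errors else 0
```

Collecting errors into a list instead of raising on the first failure lets a single run report every problem, which fits a harness that separates errors from warnings.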
The raw source archive is intentionally not version-tracked. Use `scripts/source_provenance.py --require-local` when you need to verify that the local private archive is present and hashable.
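The two source-file requirements described so far (local paths stay inside the repo root, and archive files are present and hashable) can be sketched as follows. This is a minimal illustration, not the implementation in scripts/source_provenance.py. The function names are hypothetical, and SHA-256 is an assumed choice because this document does not say which hash the script uses.

```python
import hashlib
from pathlib import Path


def is_inside_root(repo_root, local_path):
    """Return True if local_path, resolved against repo_root, stays inside it."""
    root = Path(repo_root).resolve()
    # Joining with an absolute path discards root, so absolute paths
    # outside the repo are rejected as well as ../ escapes.
    target = (root / local_path).resolve()
    return target == root or root in target.parents


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks.

    SHA-256 is an assumption made for this sketch.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Resolving both paths before comparing them means symlinks and `..` segments cannot slip a path outside the repo root unnoticed.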
Run these checks after Milestone 1 or 2 changes:
- `data/curriculum_map.json` parses as JSON.
- Every line in `data/nodes.jsonl` and `data/edges.jsonl` parses as JSON.
- Every topic registry entry has `id`, `slug`, `form`, `competence_id`, `source_id`, `review_status`, `page_path`, and `hub_category`.
- Every topic registry `page_path` exists.
- Every topic node appears in `data/nodes.jsonl`.
- Every topic has edges for subject, form, source, competence, and hub connections.
- No raw files under `raw/` are modified during wiki maintenance.
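The registry checks in this list could be automated along these lines. The sketch assumes the topic registry is a list of objects, which this document does not confirm, and `registry_problems` is a hypothetical helper, not part of the existing scripts. The required field names are taken from the list above.

```python
from pathlib import Path

# Field names copied from the checklist above.
REQUIRED_FIELDS = (
    "id", "slug", "form", "competence_id", "source_id",
    "review_status", "page_path", "hub_category",
)


def registry_problems(topic_registry, repo_root="."):
    """Return readable problems for a list of topic registry entries.

    Assumes each entry is a dict; the real registry shape may differ.
    """
    problems = []
    for index, entry in enumerate(topic_registry):
        label = entry.get("id", f"entry #{index}")
        missing = [name for name in REQUIRED_FIELDS if name not in entry]
        if missing:
            problems.append(f"{label}: missing {', '.join(missing)}")
        page_path = entry.get("page_path")
        if page_path and not (Path(repo_root) / page_path).is_file():
            problems.append(f"{label}: page_path does not exist: {page_path}")
    return problems
```

An empty return value means every entry has the required fields and points at an existing page.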
Run these checks after question-mapping changes:
- Every line in `data/question_map_2021_2025.jsonl` parses as JSON.
- Every line in `data/review_queue_question_mapping.jsonl` parses as JSON.
- `data/topic_frequency_2021_2025.json` parses as JSON.
- The expected source files and only the expected source files are used.
- Every `question_id` is unique.
- Every mapped `primary_topic_id` exists in `data/curriculum_map.json.topic_registry`.
- Every mapped `secondary_topic_ids` entry exists in `data/curriculum_map.json.topic_registry`.
- Every mapped `exam_format_group_id` exists in `data/exam_format_topic_crosswalk_2022.jsonl`.
- Frequency totals match the question-map record count.
- Low-confidence, figure-dependent, and table-dependent records appear in the review queue when present.
- Wiki links resolve after adding or renaming pages.
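Several of the question-mapping checks above reduce to set membership and counting. The sketch below is one hedged way to express three of them. `question_map_problems` is a hypothetical helper. It assumes unmapped records carry no `primary_topic_id` (absent or null), and it treats the frequency total as a single number computed elsewhere, since this document does not describe how the topic-frequency file is structured.

```python
from collections import Counter


def question_map_problems(records, topic_ids, frequency_total):
    """Check question_id uniqueness, topic references, and the frequency total.

    records: list of dicts parsed from the question-map JSONL file.
    topic_ids: set of IDs known to the topic registry.
    frequency_total: total taken from the frequency summary (assumed scalar).
    """
    problems = []
    counts = Counter(r.get("question_id") for r in records)
    for question_id, seen in counts.items():
        if seen > 1:
            problems.append(f"duplicate question_id: {question_id}")
    for r in records:
        qid = r.get("question_id")
        primary = r.get("primary_topic_id")
        # Assumption: an unmapped record has no primary_topic_id.
        if primary is not None and primary not in topic_ids:
            problems.append(f"{qid}: unknown primary_topic_id {primary}")
        for secondary in r.get("secondary_topic_ids") or []:
            if secondary not in topic_ids:
                problems.append(f"{qid}: unknown secondary_topic_id {secondary}")
    if frequency_total != len(records):
        problems.append(
            f"frequency total {frequency_total} != record count {len(records)}"
        )
    return problems
```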
Run these additional checks after 2018-2025 legacy Basic Mathematics mapping changes:
- Every mapped legacy `primary_topic_id` resolves to `data/curricula/csee/basic-mathematics/2005.json`.
- Every mapped legacy `secondary_topic_ids` entry resolves to `data/curricula/csee/basic-mathematics/2005.json`.
- Any 2023 Mathematics target is explicitly crosswalk-derived from `data/curricula/crosswalks/csee-basic-mathematics-2005-to-mathematics-2023.json`.
- Legacy-only, partial-overlap, missing-text, figure-dependent, table-dependent, and low-confidence records remain reviewable instead of being forced into the 2023 topic registry.
- Topic-frequency summaries state whether they are legacy-2005 counts, current-2023 counts, or crosswalk-derived counts.
- Learner-facing pages label these records as unreviewed assessment signals unless the original paper and marking guidance have been manually checked.
- No generated solution, answer, marking scheme, or official emphasis claim is inferred from the unreviewed mapping layer.
Run these checks after learner-facing topic expansion:
- Topic page follows `docs/rulebook.md` chapter shape.
- Learner-facing mathematics uses `$...$` for inline math and `$$...$$` for display math.
- Code formatting is reserved for IDs, paths, literal source strings, or extraction artifacts.
- Official syllabus content, unreviewed exam signals, open enrichment, and textbook notes remain clearly separated.
- Wikipedia and textbook wording is not copied into learner prose.
- Practice tasks progress from direct understanding to application or edge cases.
- Renderer limitations are noted when math may not display in a given Markdown surface.
Run these checks after form/class page changes:
- Form/class page links to every official topic listed for that form in `data/curriculum_map.json.form_topics`.
- Aliases are recorded without changing the official syllabus form name.
- Any readiness topic IDs resolve to existing `topic_registry` entries.
- Form/class page distinguishes navigation status from curriculum authority.
- Future tutor notes do not imply that a personalized tutor has already been built.
Current Baseline
- Curriculum topics: 44.
- Syllabus topic pages: 44.
- Topic hubs: 6.
- Graph nodes: 72.
- Graph edges: 284.
- Imported Basic Mathematics exam JSON files: 35.
- Five-year Basic Mathematics question-map records: 191.
- Five-year mapped records: 147.
- Five-year unmapped records: 44.
- Five-year question-mapping review records: 125.
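The baseline figures above can be sanity-checked with simple arithmetic. The sketch assumes mapped and unmapped records together make up the whole five-year question map, which the numbers support (147 + 44 = 191) but this document does not state outright. The variable names are illustrative only.

```python
# Counts copied from the baseline list above.
curriculum_topics = 44
syllabus_topic_pages = 44
question_map_records = 191
mapped_records = 147
unmapped_records = 44

# Assumption: every record is either mapped or unmapped, never both.
assert mapped_records + unmapped_records == question_map_records

# Each curriculum topic has exactly one syllabus topic page in this baseline.
assert curriculum_topics == syllabus_topic_pages
```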
Milestone Boundary
Milestones 1 and 2 stop at curriculum registry, page scaffolding, navigation, and graph connectivity. Milestone 3 adds unreviewed question-to-topic signals. Worked solutions and marking schemes remain outside the current milestone boundary.