NECTA Wiki Project Handbook
1. What This Project Is
NECTA Wiki is a source-grounded local knowledge base for Tanzanian students following the NECTA and public school curriculum.
The long-term vision is to create a Wikipedia-like learning system for NECTA students: curriculum-aligned, searchable, locally usable, source-cited, and eventually collaborative. The current implementation is the first working foundation for that vision.
The system has two layers:
- Human layer: Markdown wiki pages for students, teachers, reviewers, and project maintainers.
- Machine layer: JSON and JSONL data files for retrieval, graph building, local models, analytics, and future applications.
The current subject pilot is CSEE Basic Mathematics.
2. Why It Exists
The project exists because students need more than scattered PDF syllabuses, unstructured past papers, and disconnected notes.
The key idea is:
- The official syllabus gives the curriculum spine.
- Past papers show what the exam system repeatedly asks.
- Exam formats show how NECTA expects assessment to be organized.
- Textbooks, teacher notes, web resources, and multimedia can later enrich each topic.
- Local models can use the structured wiki as a grounded knowledge base instead of guessing from the open internet.
This means the wiki can eventually support:
- Better topic pages.
- Smarter revision guides.
- Evidence-informed sample exams.
- Teacher review workflows.
- Human feedback loops for correcting mistakes.
- Local/offline retrieval for students and schools.
- Future fine-tuning or retrieval-augmented generation.
3. What Has Been Built So Far
Milestone 1: Foundation
Status: complete for the current starter.
Created the clean project folder necta_wiki_main with:
- CLAUDE.md: operating instructions for AI agents working in the project.
- README.md: short project overview.
- docs/rulebook.md: authoring, source hierarchy, math notation, and chapter-depth standard for learner-facing pages.
- docs/: workflow, schema, validation, review, roadmap, and handbook docs.
- templates/: reusable page templates.
- raw/: immutable source intake area.
- wiki/: maintained human-readable pages.
- data/: structured machine-readable files.
Key rule established: files under raw/ are source artifacts and must not be modified in place.
Milestone 2: Mathematics Curriculum Spine
Status: complete for CSEE Mathematics 2023 syllabus topics.
Built the CSEE Mathematics syllabus spine from:
raw/syllabuses/csee/2023/csee_mathematics_syllabus_2023.pdf
Created:
- 44 stable syllabus topic registry entries in data/curriculum_map.json.
- 44 leaf topic pages under wiki/topics/.
- 6 topic hub pages: number-systems, algebra-and-matrices, coordinate-geometry, trigonometry, probability-and-statistics, and sets-sequences-and-series.
- Form pages for Mathematics Form I, II, III, and IV.
- Graph records in data/nodes.jsonl and data/edges.jsonl.
Each syllabus topic has a stable ID, slug, page path, form, competence, source, review status, and hub grouping.
Exam JSON Inventory
Status: intake complete; extraction artifacts remain unverified.
Imported 35 Basic Mathematics exam JSON files under:
raw/exams/csee/json/basic_math/
Created:
- data/question_inventory.jsonl
- data/review_queue.jsonl
- wiki/exams/basic-mathematics-exam-json-inventory.md
Important rule: exam JSON files are directional extraction artifacts. They are not official until reviewed against the original exam papers.
2022 Examination Format Layer
Status: assessment guidance integrated.
Added the 2022 CSEE examination-format booklet:
raw/syllabuses/csee/CSEE_FORMATS_2022.pdf
Created:
- data/exam_format_map.json
- data/basic_math_format_topics_2022.json
- data/exam_format_topic_crosswalk_2022.jsonl
- data/review_queue_exam_format_2022.jsonl
- wiki/sources/csee-examination-formats-2022.md
- wiki/exams/basic-mathematics-exam-format-2022.md
Important rule: the 2022 examination-format booklet is assessment guidance, not the curriculum authority. The 2023 Mathematics syllabus remains the curriculum authority.
Milestone 3: Five-Year Question Mapping Pilot
Status: first pilot complete for 2021-2025 CSEE Basic Mathematics Paper 1 JSON extracts.
Used exactly these source files:
- csee_basic_math_2021_paper1.json
- csee_basic_math_2022_paper1.json
- csee_basic_math_2023_paper1.json
- csee_basic_math_2024_paper1.json
- csee_basic_math_2025_paper1.json
Created:
- data/question_map_2021_2025.jsonl
- data/topic_frequency_2021_2025.json
- data/review_queue_question_mapping.jsonl
- wiki/exams/basic-mathematics-2021-2025-topic-signals.md
The pilot flattened 191 answerable question records, mapped 147 to syllabus topics, and left 44 unmapped. It placed 125 records in a review queue because they were low-confidence, multi-topic, figure-dependent, table-dependent, missing marks, or otherwise uncertain.
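The counts above can be cross-checked with a few lines of arithmetic. The sketch below reuses only the figures reported in this section; the variable and function names are illustrative and are not part of the project code.

```python
# Counts reported for the 2021-2025 pilot (taken from this handbook).
flattened = 191  # answerable leaf question records
mapped = 147     # records mapped to a syllabus topic
unmapped = 44    # records left without a topic
in_review = 125  # records placed in the review queue

# Mapped and unmapped records should partition the flattened set.
assert mapped + unmapped == flattened


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


mapped_share = share(mapped, flattened)
unmapped_share = share(unmapped, flattened)
review_share = share(in_review, flattened)
print(mapped_share, unmapped_share, review_share)
```

Because the review queue (125) is larger than the unmapped set (44), a mapped record is not necessarily a settled record. If every queued record is one of the 191 flattened records, at least 81 of the mapped records are also waiting for review.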
4. How The System Works
Source Intake
Sources are first placed in raw/. The project then creates a structured record for each source in data/source_catalog.jsonl and, where useful, a human-readable source summary page in wiki/sources/.
Raw sources are kept untouched so future reviewers can always return to the original file.
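As a rough illustration of the intake step, the sketch below builds one catalog record for a file under raw/ and serializes it as a single JSONL line. The field names (source_id, source_type, authority) and the ID scheme are assumptions made for this example; the actual schema of data/source_catalog.jsonl may differ.

```python
import json
from pathlib import PurePosixPath


def make_source_record(raw_path: str, source_type: str, authority: str) -> dict:
    """Build one catalog record for a file that already sits under raw/."""
    path = PurePosixPath(raw_path)
    if not path.parts or path.parts[0] != "raw":
        raise ValueError("catalog records should point at files under raw/")
    return {
        # Hypothetical ID scheme: file stem with underscores turned into hyphens.
        "source_id": f"source-{path.stem.replace('_', '-')}",
        "path": str(path),
        "source_type": source_type,  # assumed field name
        "authority": authority,      # assumed field name
    }


record = make_source_record(
    "raw/syllabuses/csee/2023/csee_mathematics_syllabus_2023.pdf",
    source_type="syllabus",
    authority="official",
)
line = json.dumps(record, ensure_ascii=False)  # one line of a JSONL catalog
print(line)
```

The function only reads the path string and never opens or rewrites the file, which keeps it consistent with the rule that raw/ is immutable.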
Curriculum Spine
The syllabus is converted into a topic registry. This gives every topic a stable identity that later data can point to.
Example topic ID pattern:
topic-logarithms
topic-exponents
topic-two-by-two-matrices-operations-determinant-inverse-and-transformations
These IDs are important because exam questions, future textbook passages, examples, diagrams, and student-facing explanations can all point to the same topic identity.
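A minimal sketch of how IDs in this pattern could be generated and checked is shown below. The slug rule (lowercase alphanumeric runs joined by hyphens) is inferred from the three example IDs above and is an assumption; the helper names are hypothetical.

```python
import re

# Pattern inferred from the example IDs: "topic-" plus a hyphenated lowercase slug.
TOPIC_ID_RE = re.compile(r"^topic-[a-z0-9]+(?:-[a-z0-9]+)*$")


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


def topic_id(title: str) -> str:
    """Derive a topic ID from a syllabus topic title (assumed rule)."""
    return f"topic-{slugify(title)}"


def is_valid_topic_id(value: str) -> bool:
    """Check that a string follows the assumed topic ID pattern."""
    return TOPIC_ID_RE.match(value) is not None


print(topic_id("Logarithms"))
print(is_valid_topic_id("topic-exponents"))
```

An ID should be derived once, when the topic enters the registry, and stored from then on. Recomputing it from an edited title would break every record that already points to it.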
Wiki Pages
Markdown pages make the system readable to humans. They use Obsidian-style links such as:
[[logarithms]]
[[basic-mathematics-2021-2025-topic-signals]]
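Links in this style can be pulled out of a page with a regular expression, for example to check that every link target has a page. The sketch below is one possible approach, not the project's validator. It handles plain links plus the Obsidian alias ([[page|label]]) and heading ([[page#heading]]) forms, and the function name is hypothetical.

```python
import re

# Capture the page target, ignoring an optional "#heading" and "|alias" suffix.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def extract_link_targets(markdown: str) -> list[str]:
    """Return the page targets of all Obsidian-style links, in order of appearance."""
    return [target.strip() for target in WIKILINK_RE.findall(markdown)]


sample = (
    "See [[logarithms]] and the "
    "[[basic-mathematics-2021-2025-topic-signals|topic signals]] page."
)
targets = extract_link_targets(sample)
print(targets)
```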
Each page should include:
- Summary.
- Sources.
- Last updated date.
- Review status.
- Main content.
- Related pages.
Learner-facing pages that go beyond placeholders should follow docs/rulebook.md: use chapter-level depth, separate source authority levels, and write math with portable $...$ and $$...$$ notation for Obsidian and future web rendering.
Graph Data
data/nodes.jsonl and data/edges.jsonl represent the knowledge graph.
The graph connects things like:
- Subject to level.
- Subject to forms.
- Forms to topics.
- Topics to competences.
- Topics to source documents.
- Exam-format groups to syllabus topics.
This prepares the project for graph search, local retrieval, analytics, and future applications.
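The sketch below shows one way edge records could be loaded into an adjacency map for simple graph queries. The field names (source, target, type), the edge type has_topic, and the node ID form-example are assumptions for illustration; the real records in data/edges.jsonl may use different keys.

```python
import json
from collections import defaultdict


def load_jsonl(lines):
    """Parse an iterable of JSONL lines, skipping blank ones."""
    return [json.loads(line) for line in lines if line.strip()]


def build_adjacency(edges):
    """Map each source node ID to a list of (edge type, target node ID) pairs."""
    adjacency = defaultdict(list)
    for edge in edges:
        adjacency[edge["source"]].append((edge["type"], edge["target"]))
    return adjacency


# Inline sample standing in for data/edges.jsonl; IDs and keys are made up.
sample_edges = [
    '{"source": "form-example", "target": "topic-logarithms", "type": "has_topic"}',
    '{"source": "form-example", "target": "topic-exponents", "type": "has_topic"}',
    "",
]
graph = build_adjacency(load_jsonl(sample_edges))
topics = [target for kind, target in graph["form-example"] if kind == "has_topic"]
print(topics)
```

In real use the lines would come from an open file handle rather than an inline list.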
Question Mapping
The Milestone 3 pilot flattens exam JSON into answerable leaf questions. Parent stems are used as context, but only answerable leaf parts become question records.
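The flattening idea can be sketched as a small recursive generator. The input shape used here (text, parts, label keys) and the placeholder question are assumptions; the actual exam JSON structure is not specified in this handbook.

```python
def flatten_question(node, path=(), stems=()):
    """Yield one record per answerable leaf part.

    Parent text is carried along as stem context but is never emitted
    as a question record of its own.
    """
    text = node.get("text", "")
    parts = node.get("parts") or []
    if not parts:
        yield {
            "part_path": list(path),
            "stem_text": " ".join(stems),
            "leaf_text": text,
            "full_text": " ".join([*stems, text]).strip(),
        }
        return
    child_stems = stems + ((text,) if text else ())
    for part in parts:
        yield from flatten_question(part, path + (part["label"],), child_stems)


# Placeholder question with made-up text; only the nesting matters here.
question = {
    "number": 14,
    "text": "Stem of question 14.",
    "parts": [
        {"label": "a", "text": "Part a."},
        {"label": "b", "text": "Stem of part b.", "parts": [
            {"label": "i", "text": "Part b(i)."},
            {"label": "ii", "text": "Part b(ii)."},
        ]},
    ],
}
records = list(flatten_question(question))
print([record["part_path"] for record in records])
```

The nested example produces three records (a, b-i, b-ii). Part b itself is treated as a stem because it has children.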
Each question record includes:
- Stable question ID.
- Year and paper.
- Section and question number.
- Part path.
- Marks, where available.
- Stem text, leaf text, and full text.
- Figure/table flags.
- Primary topic ID, where mapped.
- Secondary topic IDs, where helpful.
- Mapping confidence.
- Review status.
- Exam-format group, where mapped.
Question ID pattern:
csee_041_2024_p1_q14_b_ii
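A hedged sketch of building and parsing IDs in this pattern follows. The reading of the segments (exam level, a three-digit code assumed to be the subject code, year, paper, question number, then the part path) is inferred from the single example above, and the function names are hypothetical.

```python
import re

QUESTION_ID_RE = re.compile(
    r"^(?P<exam>[a-z]+)_(?P<code>\d{3})_(?P<year>\d{4})"
    r"_p(?P<paper>\d+)_q(?P<question>\d+)(?P<parts>(?:_[a-z]+)*)$"
)


def build_question_id(exam, code, year, paper, question, part_path=()):
    """Join the components into an ID such as csee_041_2024_p1_q14_b_ii."""
    base = f"{exam}_{code}_{year}_p{paper}_q{question}"
    return "_".join([base, *part_path])


def parse_question_id(question_id):
    """Split an ID back into its components; raise ValueError if it does not match."""
    match = QUESTION_ID_RE.match(question_id)
    if match is None:
        raise ValueError(f"not a question ID: {question_id!r}")
    fields = match.groupdict()
    parts = fields.pop("parts")
    fields["part_path"] = [part for part in parts.split("_") if part]
    for key in ("year", "paper", "question"):
        fields[key] = int(fields[key])
    return fields


qid = build_question_id("csee", "041", 2024, 1, 14, ("b", "ii"))
parsed = parse_question_id(qid)
print(qid, parsed["part_path"])
```

The code segment is kept as a string so that a leading zero, as in 041, survives a round trip.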
The goal is not to produce perfect classification on the first pass. The goal is to create a conservative, reviewable signal layer.
5. Why The Current Design Is Conservative
This project should not pretend to know more than its sources support.
That is why:
- Official syllabus entries can be marked official.
- Exam JSON extracts remain unreviewed.
- Question mappings are mapped_unreviewed unless reviewed.
- Ambiguous mappings go to review queues.
- Figure-dependent and table-dependent items are flagged.
- Unmapped records are preserved instead of forced into weak topics.
- Marking schemes and solutions are not generated until a separate reviewed workflow exists.
This protects the credibility of the wiki as it grows.
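The rules above can be expressed as a small, explicit check that records why an item needs review instead of silently accepting it. This is a sketch under assumptions: the three status values come from this handbook, but the record kinds, the field names, and the idea that confidence is stored as a label such as "low" are hypothetical.

```python
# Default status per record kind; the status values are from this handbook,
# the kind names are assumed.
DEFAULT_STATUS = {
    "syllabus_entry": "official",
    "exam_json_extract": "unreviewed",
    "question_mapping": "mapped_unreviewed",
}


def review_reasons(record: dict) -> list[str]:
    """Return every reason a question record should be queued for human review."""
    reasons = []
    if record.get("primary_topic_id") is None:
        reasons.append("unmapped")
    if record.get("mapping_confidence") == "low":
        reasons.append("low_confidence")
    if record.get("secondary_topic_ids"):
        reasons.append("multi_topic")
    if record.get("has_figure"):
        reasons.append("figure_dependent")
    if record.get("has_table"):
        reasons.append("table_dependent")
    if record.get("marks") is None:
        reasons.append("missing_marks")
    return reasons


example = {
    "primary_topic_id": "topic-logarithms",
    "mapping_confidence": "high",
    "secondary_topic_ids": [],
    "has_figure": True,
    "has_table": False,
    "marks": None,
}
reasons = review_reasons(example)
print(reasons)
```

Returning a list of reasons, rather than a single yes/no flag, lets reviewers filter the queue by the kind of uncertainty involved.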
6. Current Data Baseline
As of 2026-05-16:
- Curriculum topic registry entries: 44.
- Syllabus leaf topic pages: 44.
- Topic hub pages: 6.
- Graph nodes: 72.
- Graph edges: 284.
- Imported Basic Mathematics exam JSON files: 35.
- Five-year flattened question records: 191.
- Five-year mapped-to-topic records: 147.
- Five-year unmapped records: 44.
- Five-year question-mapping review queue records: 125.
The five-year topic signal currently identifies repeated emphasis around topics such as exponents, similarity and congruence, matrices, rates and variations, logarithms, sets, ratios and proportions, and approximations.
7. What The Topic Signals Can Be Used For
The topic signal page can guide:
- Evidence-informed revision plans.
- Sample exam construction.
- Topic priority ranking.
- Teacher review focus.
- Student practice sequencing.
- Future content production priorities.
It should not yet be used as final statistical truth because:
- The JSON extractions are unreviewed.
- Some questions are unmapped.
- Some questions depend on figures or tables.
- Only five recent papers are included in the pilot.
- Topic mapping has not yet received human validation.
The correct interpretation is: this is a strong first compass, not the final map.
8. Scaling Beyond Mathematics
The architecture is designed to grow beyond the current Basic Mathematics pilot into Physics, Civics, Biology, Chemistry, Geography, History, and other NECTA subjects.
For each new subject, start with the official syllabus as the curriculum spine. Create stable subject, form, competence, hub, and topic records from that source before adding exam signals, textbooks, teacher notes, web enrichment, or generated practice.
Keep one canonical wiki/topics/ page per syllabus learning node. Form pages should remain navigation maps, and cross-subject relationships should be represented as links or graph edges between canonical topic pages. Do not create duplicate topic pages just because a concept is useful in more than one subject; instead, preserve the source subject hierarchy and explain the relationship from the page that links out.
This lets Mathematics connect to Physics measurement, Biology data handling, Geography scale and graphs, Civics statistics, Chemistry formula work, and History timelines without weakening the official syllabus spine for any subject.
9. What Comes Next
Near-Term Priority 1: Review The Mapping Queue
Human review should inspect:
- Low-confidence mappings.
- Unmapped questions.
- Multi-topic questions.
- Figure-dependent questions.
- Table-dependent questions.
- Missing-mark questions.
Output should be corrected question mappings and a smaller review queue.
Near-Term Priority 2: Upgrade High-Signal Topic Pages
Start with topics that appear repeatedly in the five-year pilot:
- Exponents.
- Logarithms.
- Two-by-two matrices.
- Rates and variations.
- Similarity and congruence.
- Sets and Venn diagrams.
- Ratios and proportions.
- Approximations.
Each upgraded page should include:
- Syllabus alignment.
- Student-friendly explanation.
- Prerequisite topics.
- Worked examples.
- Common mistakes.
- Past-question signals.
- Practice tasks.
- Source citations.
Near-Term Priority 3: Build A Sample Exam Generator Plan
The project can use topic frequencies and exam-format groups to design evidence-informed sample exams.
Before generating actual sample exams, define:
- Number of questions and sections.
- Topic weighting method.
- Form coverage expectations.
- Difficulty tags.
- Review status rules.
- Whether generated questions are original, adapted, or selected from past papers.
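To make the "topic weighting method" item concrete, the sketch below shows one candidate: allocating a fixed number of question slots in proportion to topic frequency, using largest-remainder rounding so that the slots always add up. It is an illustration of one option, not a method the project has adopted, and the topic names and counts in the example are invented.

```python
def allocate_slots(frequencies: dict[str, int], total_slots: int) -> dict[str, int]:
    """Split total_slots across topics in proportion to their frequency counts."""
    total = sum(frequencies.values())
    if total <= 0:
        raise ValueError("at least one topic needs a positive frequency")
    exact = {topic: total_slots * count / total for topic, count in frequencies.items()}
    slots = {topic: int(value) for topic, value in exact.items()}
    leftover = total_slots - sum(slots.values())
    # Hand the remaining slots to the topics with the largest fractional parts,
    # breaking ties in favour of the more frequent topic.
    by_remainder = sorted(
        exact,
        key=lambda topic: (exact[topic] - slots[topic], frequencies[topic]),
        reverse=True,
    )
    for topic in by_remainder[:leftover]:
        slots[topic] += 1
    return slots


# Invented counts, for illustration only.
plan = allocate_slots({"topic-a": 5, "topic-b": 3, "topic-c": 2}, total_slots=4)
print(plan)
```

A purely proportional rule can starve low-frequency topics, so a real specification would probably combine it with the form coverage expectations listed above.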
Near-Term Priority 4: Add Source-Rich Content
Future sources can include:
- Official marking schemes.
- Approved textbooks.
- Teacher guides.
- Student notes.
- Multimedia explanations.
- Web resources.
Every new source should map back to the topic registry.
Medium-Term Priority: Collaborative Review
The project should eventually support reviewer feedback:
- Teacher corrections.
- Student confusion reports.
- Pilot validation.
- Confidence upgrades.
- Versioned corrections.
This is the path from local wiki to collaborative educational infrastructure.
Long-Term Priority: Retrieval And Local Models
Once the knowledge base is richer, it can support:
- Local search.
- Local model retrieval.
- Topic-aware tutoring.
- Question recommendation.
- Content generation with citations.
- Fine-tuning experiments where appropriate.
Fine-tuning should come later than retrieval. Retrieval is safer first because it keeps answers tied to sources.
10. When To Use Each Source Type
Use the 2023 Mathematics syllabus for:
- Curriculum authority.
- Topic registry.
- Competences.
- Form placement.
- Topic-page structure.
Use the 2022 exam-format booklet for:
- Exam structure.
- Assessment guidance.
- Table-of-specification groups.
- Expected question distribution.
Use exam JSON files for:
- Question inventory.
- Topic-signal analytics.
- Review workflows.
- Draft mappings.
Use original PDFs when:
- A question, figure, table, mark, or wording needs verification.
- A JSON extraction looks suspicious.
- A mapping affects learner-facing guidance.
Use textbooks and teacher guides later for:
- Explanations.
- Examples.
- Exercises.
- Misconceptions.
- Topic enrichment.
11. Operating Rules For Future Work
- Do not modify files under raw/.
- Keep the syllabus as the curriculum spine.
- Do not treat extraction artifacts as official truth.
- Preserve uncertainty in review queues.
- Prefer stable IDs over ad hoc labels.
- Update both human pages and machine data when adding important knowledge.
- Update wiki/index.md and wiki/log.md after wiki changes.
- Keep all JSON and JSONL parseable.
- Avoid generating marking schemes until a validation workflow exists.
- Retrieval should come before fine-tuning.
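The rule about keeping JSONL parseable is easy to check mechanically. The sketch below is one minimal way to do it, assuming one JSON value per non-blank line; the function name is hypothetical and this is not the project's validation script.

```python
import json


def jsonl_errors(text: str) -> list[tuple[int, str]]:
    """Return (line number, message) for every non-blank line that is not valid JSON."""
    errors = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((number, exc.msg))
    return errors


good = '{"id": "topic-logarithms"}\n{"id": "topic-exponents"}\n'
bad = '{"id": "topic-logarithms"}\n{"id": }\n'
good_errors = jsonl_errors(good)
bad_errors = jsonl_errors(bad)
print(good_errors, [number for number, _ in bad_errors])
```

In practice the same function could be applied to the text of each .jsonl file under data/ before a change is committed, with a plain json.loads call covering the .json files.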
12. Open Questions
- Which subject should be expanded after Basic Mathematics?
- Should 1994-2020 Mathematics papers be mapped before moving to another subject?
- What is the first human review workflow: teacher review, student pilot, or internal manual audit?
- What metadata should sample exams require: difficulty, topic, marks, time, calculator use, figure/table dependency?
- What interface should students eventually use: Obsidian, static website, local app, or web platform?
- How should collaborative edits be moderated if the project becomes public?
13. Recommended Next Build Order
- Review and correct the 2021-2025 Basic Mathematics mapping queue.
- Upgrade the top high-signal topic pages with learner-facing content.
- Create a sample exam design specification.
- Map more Basic Mathematics years.
- Add official marking schemes only where source validation is possible.
- Add textbook-backed explanations and examples.
- Build a retrieval index for local model usage.
- Pilot with students and teachers.
- Convert review feedback into status upgrades.
- Decide whether and how to publish as a collaborative wiki.
Curriculum Versioning Framework
Status: established for legacy Basic Mathematics ingestion.
The project now treats old, current, and future syllabuses as separate curriculum identities. The detailed rule is in docs/curriculum-versioning-framework.md.
The first legacy implementation is curriculum-csee-basic-mathematics-2005, based on the Mathematics syllabus and kept separate from curriculum-csee-mathematics-2023.
Repeated learning ideas should be connected through crosswalks now and concept nodes later. They should not be silently merged into a single syllabus topic.