+ learner first aid

First aid: read the overview, copy one worked example by hand, then try explaining the key rule without looking.

+ Math syllabus context

The current Mathematics path follows the active Basic Mathematics syllabus. The 2023 Mathematics syllabus is a transition path expected to take effect from January 2027; this wiki will update the lead path in late 2026.

Statistics (Form III)

Core Concepts

Statistics is the branch of mathematics dealing with the collection, presentation, analysis, and interpretation of numerical data. In the NECTA Basic Mathematics syllabus, mastering how to organize raw data into tables, graphically visualize it, and extract Measures of Central Tendency (Mean, Median, Mode) is critical.

1. Simple Data Representation

When data sets are relatively small or categorical, we use simple graphical tools. The fundamental principle is to visually represent the frequency (the number of times an observation occurs) of each category.

  • Pictograms: Data is represented using symbols or pictures. A key is provided to show what quantity each symbol represents. While intuitive, they can be visually imprecise when depicting fractions of a symbol.
  • Bar Charts: Frequencies are shown using rectangular bars. The height (or length) of each bar is directly proportional to its frequency. Bars are separated by equal gaps, emphasizing that the categories are distinct or discrete.
  • Line Graphs: Data points $(x, y)$ are plotted on a Cartesian plane and connected by straight lines. They are mostly used for time-series data to show trends over a continuous period.
  • Pie Charts: A circular chart divided into sectors. The entire circle represents the total frequency ($360^\circ$). The area (and thus the central angle) of each sector is proportional to the frequency of that category.

Derivation of the Sector Angle: If the total frequency $\sum f$ corresponds to $360^\circ$, then $1$ unit of frequency corresponds to $\frac{360^\circ}{\sum f}$. For a category with frequency $f_i$, the angle $\theta_i$ is: $$\theta_i = \frac{f_i}{\sum f} \times 360^\circ$$

2. Frequency Distribution Tables

For large sets of continuous data, we group raw data into class intervals to form a grouped frequency distribution table. This condenses the data, making it easier to analyze.

  • Class Limits vs. Class Boundaries: Class limits are the smallest and largest values that can be recorded in a class (e.g., $65 - 69$). However, this leaves a gap between consecutive classes (from $69$ to $70$). To create a continuous scale, we define class boundaries. For data recorded to the nearest whole number, we subtract $0.5$ from the lower limit to get the lower class boundary ($L$), and add $0.5$ to the upper limit to get the upper class boundary ($U$). For the $65-69$ class, the boundaries are $64.5 - 69.5$.
  • Class Mark / Midpoint ($x$): The center of the class interval, acting as a mathematical representative for all data points in that class. $$x = \frac{\text{Lower Limit} + \text{Upper Limit}}{2}$$

  • Class Width / Size ($i$ or $c$): The difference between the upper and lower boundaries (not limits). $$i = U - L$$
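The three definitions above can be checked with a short Python sketch. The function name `class_summary` and its `unit` parameter are illustrative assumptions, not part of the syllabus; `unit=1` assumes marks recorded to the nearest whole number.

```python
def class_summary(lower_limit, upper_limit, unit=1):
    """Return (lower boundary, upper boundary, class mark, class width).

    `unit` is the recording precision (1 for whole-number marks), so each
    boundary sits half a unit outside the corresponding limit.
    """
    half = unit / 2
    lower_boundary = lower_limit - half
    upper_boundary = upper_limit + half
    class_mark = (lower_limit + upper_limit) / 2
    width = upper_boundary - lower_boundary  # boundaries, not limits
    return lower_boundary, upper_boundary, class_mark, width


summary = class_summary(65, 69)
print(summary)  # (64.5, 69.5, 67.0, 5.0)
```

Note that the width comes out as $5$, not $69 - 65 = 4$, because it is measured between boundaries.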

Measures of Central Tendency

  • Mean ($\bar{x}$): The arithmetic average. For grouped data, we use the midpoints ($x$): $$\bar{x} = \frac{\sum fx}{\sum f}$$
    • Assumed Mean Method: To simplify large calculations, we can choose an Assumed Mean ($A$), usually the midpoint of the central class, and find deviations $d = x - A$. $$\bar{x} = A + \frac{\sum fd}{\sum f}$$

  • Median: The central value when data is sorted. For grouped data, the median class is the first class whose cumulative frequency reaches or exceeds $\frac{N}{2}$ (where $N = \sum f$). We then interpolate within that class to estimate the median: $$\text{Median} = L + \left( \frac{\frac{N}{2} - n_b}{n_w} \right) i$$ Here $L$ is the lower boundary of the median class, $n_b$ is the cumulative frequency before the median class, $n_w$ is the frequency of the median class, and $i$ is the class width.
    • Derivation Intuition: We start at the lower boundary of the median class ($L$). We need to "step into" the class by a certain number of data points to reach the middle position ($\frac{N}{2}$). We have already accumulated $n_b$ points in the previous classes, so we need $\frac{N}{2} - n_b$ more points. Since the $n_w$ points in the class are assumed to be evenly spread across the class width ($i$), the distance we must add to $L$ is $\frac{\frac{N}{2} - n_b}{n_w} \times i$.
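As a quick check that the direct formula and the assumed mean method for the grouped mean agree, here is a minimal Python sketch. The three-class table (classes $10-14$, $15-19$, $20-24$) is hypothetical and chosen only to keep the arithmetic small; the function names are illustrative.

```python
# Hypothetical grouped table (not taken from any exam paper).
midpoints = [12, 17, 22]      # class marks of 10-14, 15-19, 20-24
frequencies = [3, 5, 2]


def mean_direct(xs, fs):
    """Grouped mean: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def mean_assumed(xs, fs, assumed):
    """Grouped mean via deviations d = x - A from an assumed mean A."""
    return assumed + sum(f * (x - assumed) for x, f in zip(xs, fs)) / sum(fs)


print(mean_direct(midpoints, frequencies))       # 16.5
print(mean_assumed(midpoints, frequencies, 17))  # 16.5
```

Both methods give the same answer for any choice of $A$, because $\sum fd = \sum fx - A \sum f$. Choosing $A$ near the center simply keeps the deviations small.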

3. Graphical Representation of Grouped Data

  • Histograms: A chart consisting of adjacent rectangles. The horizontal axis shows Class Boundaries (ensuring no gaps) and the vertical axis shows Frequency. The area of each bar is proportional to the frequency, so when all class widths are equal the bar heights can be read directly as frequencies.
    • Estimating the Mode: In the highest bar (modal class), draw one diagonal from its top-left corner to the top-left corner of the bar on its right, and another from its top-right corner to the top-right corner of the bar on its left. Drop a perpendicular from their intersection to the x-axis to read the Mode.
  • Frequency Polygons: A line graph formed by plotting Frequencies against the Class Marks (Midpoints). It is closed by connecting the ends to the x-axis at the midpoints of imaginary classes with zero frequency on either side.
  • Cumulative Frequency Curves (Ogives): A smooth curve showing the running total of frequencies.
    • Plotted using Upper Class Boundaries on the x-axis and Cumulative Frequencies on the y-axis. (Intuition: by the time you reach the upper boundary of a class, you have accumulated all the data points up to that value).
    • Used to estimate the Median (at $\frac{N}{2}$ on the y-axis) and Quartiles (Lower $Q_1$ at $\frac{N}{4}$, Upper $Q_3$ at $\frac{3N}{4}$). The Interquartile Range is $IQR = Q_3 - Q_1$.
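Reading values off an ogive can be imitated numerically. The sketch below is an approximation under a stated assumption: it joins the plotted points with straight segments, whereas a hand-drawn ogive is a smooth curve, so graph readings may differ slightly. The table (classes $0-9$, $10-19$, $20-29$, $30-39$ with frequencies $4, 8, 6, 2$) is hypothetical, and `read_ogive` is an illustrative name.

```python
def read_ogive(boundaries, cumulative, target):
    """Return the x-value at which the cumulative frequency equals `target`.

    `boundaries[0]` is the lower boundary of the first class (cumulative
    frequency 0); the remaining entries are the upper class boundaries.
    """
    points = list(zip(boundaries, [0] + list(cumulative)))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 < target <= c1:
            # Linear interpolation along the segment that crosses `target`.
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target must be above 0 and at most the total frequency")


# Hypothetical data: upper boundaries paired with cumulative frequencies.
boundaries = [-0.5, 9.5, 19.5, 29.5, 39.5]
cumulative = [4, 12, 18, 20]
n = cumulative[-1]

q1 = read_ogive(boundaries, cumulative, n / 4)
median = read_ogive(boundaries, cumulative, n / 2)
q3 = read_ogive(boundaries, cumulative, 3 * n / 4)
print(q1, median, q3, q3 - q1)  # 10.75 17.0 24.5 13.75
```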

Worked Examples

Example 1: Calculating Mean of Ungrouped Data (Based on NECTA 2018 Paper 1) The scores of 45 pupils in a Civics test were recorded as follows:

30 65 50 62 40 35 64 32 28
59 60 82 24 35 63 68 46 48
73 92 54 46 63 75 58 43 71
72 27 28 61 71 36 64 80 61
64 76 64 35 76 73 70 64 46

Calculate the mean score.

Step-by-Step Solution: For raw data, the mean is the sum of all values divided by the total number of items ($N = 45$). $$\bar{x} = \frac{\sum x}{N}$$

Let us sum the values directly: Sum $= 30+65+50+62+40+35+64+32+28+59+60+82+24+35+63+68+46+48+73+92+54+46+63+75+58+43+71+72+27+28+61+71+36+64+80+61+64+76+64+35+76+73+70+64+46$

$\sum x = 2534$ $$\bar{x} = \frac{2534}{45} \approx 56.31$$

Answer: The mean score is $56.31$.
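A long hand addition like this is easy to get wrong, so it is worth checking. The short Python sketch below re-adds the 45 scores from the question; the variable names are illustrative.

```python
# The 45 Civics scores from Example 1, in the order given.
scores = [
    30, 65, 50, 62, 40, 35, 64, 32, 28,
    59, 60, 82, 24, 35, 63, 68, 46, 48,
    73, 92, 54, 46, 63, 75, 58, 43, 71,
    72, 27, 28, 61, 71, 36, 64, 80, 61,
    64, 76, 64, 35, 76, 73, 70, 64, 46,
]

total = sum(scores)
mean_score = total / len(scores)
print(len(scores), total, round(mean_score, 2))  # 45 2534 56.31
```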


Example 2: Mean and Median Difference (Based on NECTA 2021 Paper 1) The following are the marks obtained by 40 students in a mathematics examination:

48 47 57 56 71 62 46 45 50 76
58 66 48 32 89 60 42 47 54 67
64 49 37 64 67 44 45 45 42 34
47 44 73 44 58 43 54 35 54 52

Calculate the difference between the actual mean and the median of this distribution. Comment on the difference.

Step-by-Step Solution: Part 1: Finding the Mean. Add all $40$ marks. Grouping them by tens makes the addition faster (a product such as $44 \times 3$ means the mark $44$ occurs $3$ times):

  • 30s: $32+34+35+37 = 138$
  • 40s: $(42 \times 2)+43+(44 \times 3)+(45 \times 3)+46+(47 \times 3)+(48 \times 2)+49 = 726$
  • 50s: $50+52+(54 \times 3)+56+57+(58 \times 2) = 493$
  • 60s: $60+62+(64 \times 2)+66+(67 \times 2) = 450$
  • 70s: $71+73+76 = 220$
  • 80s: $89$

$\sum x = 138 + 726 + 493 + 450 + 220 + 89 = 2116$ $$\bar{x} = \frac{2116}{40} = 52.9$$

Part 2: Finding the Median Sort the 40 values in ascending order. The median will be the average of the $20^{th}$ and $21^{st}$ values. Sorted order (abbreviated): $32, 34, ..., 48, 48, \textbf{49}, \textbf{50}, 52, 54, ... 89$

  • The $20^{th}$ value is $49$.
  • The $21^{st}$ value is $50$.
  • $$\text{Median} = \frac{49 + 50}{2} = 49.5$$

Part 3: Difference and Comment $$\text{Difference} = \text{Mean} - \text{Median} = 52.9 - 49.5 = 3.4$$ Comment: The actual mean is greater than the median. This indicates that the distribution is positively skewed (skewed to the right). The few students who scored very high marks (like 89) pulled the arithmetic mean upward without affecting the central position (median).
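The mean, median, and their difference can be verified with Python's standard `statistics` module. This is only a check on the hand working, using the 40 marks exactly as listed in the question.

```python
from statistics import mean, median

# The 40 mathematics marks from Example 2, row by row.
marks = [
    48, 47, 57, 56, 71, 62, 46, 45, 50, 76,
    58, 66, 48, 32, 89, 60, 42, 47, 54, 67,
    64, 49, 37, 64, 67, 44, 45, 45, 42, 34,
    47, 44, 73, 44, 58, 43, 54, 35, 54, 52,
]

mean_mark = mean(marks)
median_mark = median(marks)          # average of the 20th and 21st sorted values
difference = mean_mark - median_mark

print(mean_mark, median_mark, round(difference, 1))  # 52.9 49.5 3.4
```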


Example 3: Median of a Grouped Distribution (Based on NECTA 2025 Paper 1) In a Biology examination done by 100 students, the marks were grouped as follows:

| Marks | Number of Students ($f$) |
| :---: | :---: |
| 65 - 69 | 10 |
| 70 - 74 | 12 |
| 75 - 79 | 55 |
| 80 - 84 | 10 |
| 85 - 89 | 5 |
| 90 - 94 | 8 |

Find the median of the distribution to the nearest whole number.

Step-by-Step Solution: Step 1: Create a Cumulative Frequency ($CF$) column.

| Class | Boundaries | $f$ | $CF$ |
| :---: | :---: | :---: | :---: |
| 65 - 69 | 64.5 - 69.5 | 10 | 10 |
| 70 - 74 | 69.5 - 74.5 | 12 | 22 |
| 75 - 79 | 74.5 - 79.5 | 55 | 77 |
| 80 - 84 | 79.5 - 84.5 | 10 | 87 |
| 85 - 89 | 84.5 - 89.5 | 5 | 92 |
| 90 - 94 | 89.5 - 94.5 | 8 | 100 |

Step 2: Identify Median Class and Variables. Position of Median $= \frac{N}{2} = \frac{100}{2} = 50$. The $50^{th}$ value falls in the cumulative frequency interval $22$ to $77$, which corresponds to the 75 - 79 class.

  • Lower Boundary of median class ($L$) = $74.5$
  • Cumulative frequency before median class ($n_b$) = $22$
  • Frequency of median class ($n_w$) = $55$
  • Class size ($i$) = $79.5 - 74.5 = 5$

Step 3: Apply the formula. $$\text{Median} = L + \left( \frac{\frac{N}{2} - n_b}{n_w} \right) i$$ $$\text{Median} = 74.5 + \left( \frac{50 - 22}{55} \right) \times 5$$ $$\text{Median} = 74.5 + \left( \frac{28}{55} \right) \times 5 \approx 74.5 + 2.545 = 77.045$$

Answer: The median is $77$ (to the nearest whole number).
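The three steps above can be sketched as one Python function that builds the running total, finds the median class, and interpolates. The name `grouped_median` and the `unit` parameter are illustrative assumptions; `unit=1` assumes whole-number marks, so boundaries lie $0.5$ outside the limits.

```python
def grouped_median(lower_limits, frequencies, width, unit=1):
    """Estimate the median of a grouped table with equal class widths."""
    half_n = sum(frequencies) / 2
    n_before = 0  # cumulative frequency before the current class (n_b)
    for lower_limit, f in zip(lower_limits, frequencies):
        if n_before + f >= half_n:
            # First class whose cumulative frequency reaches N/2.
            lower_boundary = lower_limit - unit / 2
            return lower_boundary + (half_n - n_before) / f * width
        n_before += f
    raise ValueError("table is empty")


# Biology marks from Example 3.
lower_limits = [65, 70, 75, 80, 85, 90]
frequencies = [10, 12, 55, 10, 5, 8]

biology_median = grouped_median(lower_limits, frequencies, width=5)
print(round(biology_median, 3))  # 77.045
```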


Example 4: Interpreting and Drawing a Pie Chart A farmer has 1800 animals on his farm: 600 cows, 800 goats, 300 sheep, and 100 donkeys. Calculate the sector angle for each animal and represent this data on a pie chart.

Step-by-Step Solution: Total frequency $\sum f = 1800$. Total degrees in a circle $= 360^\circ$. Formula: $\text{Angle} = \frac{f}{\sum f} \times 360^\circ$

  • Cows: $\frac{600}{1800} \times 360^\circ = 120^\circ$
  • Goats: $\frac{800}{1800} \times 360^\circ = 160^\circ$
  • Sheep: $\frac{300}{1800} \times 360^\circ = 60^\circ$
  • Donkeys: $\frac{100}{1800} \times 360^\circ = 20^\circ$

(Check: $120+160+60+20 = 360^\circ$) To construct: Draw a circle with a compass, draw a starting radius, and use a protractor to measure out consecutive angles, labeling each sector clearly with the animal name.
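The sector-angle calculation is the same for every category, so it can be written once and reused. The Python sketch below (with the illustrative name `sector_angles`) reproduces the four angles and the $360^\circ$ check for the farm data.

```python
def sector_angles(frequencies):
    """Map each category to its pie-chart angle in degrees."""
    total = sum(frequencies.values())
    # Multiply before dividing so these whole-number cases stay exact.
    return {name: f * 360 / total for name, f in frequencies.items()}


farm = {"Cows": 600, "Goats": 800, "Sheep": 300, "Donkeys": 100}
angles = sector_angles(farm)

print(angles)                # {'Cows': 120.0, 'Goats': 160.0, 'Sheep': 60.0, 'Donkeys': 20.0}
print(sum(angles.values()))  # 360.0
```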


Example 5: Working with Histograms and Mode Using the frequency distribution from Example 3, calculate the Mode algebraically.

Step-by-Step Solution: The Modal Class is the class with the highest frequency. Looking at the table, the highest frequency is $55$, which corresponds to the $75 - 79$ class.

  • Lower Boundary of modal class ($L$) $= 74.5$
  • Difference in frequency with preceding class ($d_1$) $= 55 - 12 = 43$
  • Difference in frequency with succeeding class ($d_2$) $= 55 - 10 = 45$
  • Class size ($c$) $= 5$

Formula for Mode: $$\text{Mode} = L + \left( \frac{d_1}{d_1 + d_2} \right) c$$ $$\text{Mode} = 74.5 + \left( \frac{43}{43 + 45} \right) \times 5$$ $$\text{Mode} = 74.5 + \left( \frac{43}{88} \right) \times 5 \approx 74.5 + 2.443 = 76.943$$

Answer: The mode is approximately $76.94$. (On a histogram, you would estimate this value by drawing the two diagonals inside the highest bar and reading the x-axis directly below their intersection.)
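The mode formula can be checked the same way. In this minimal Python sketch the function name and argument names are illustrative; the numbers are those of the modal class $75 - 79$ and its two neighbors.

```python
def grouped_mode(lower_boundary, f_modal, f_before, f_after, width):
    """Estimate the mode from the modal class and its two neighbors."""
    d1 = f_modal - f_before  # excess over the preceding class
    d2 = f_modal - f_after   # excess over the succeeding class
    return lower_boundary + d1 / (d1 + d2) * width


biology_mode = grouped_mode(74.5, f_modal=55, f_before=12, f_after=10, width=5)
print(round(biology_mode, 3))  # 76.943
```

Because $d_1$ and $d_2$ are nearly equal here, the estimate lands close to the middle of the modal class.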


Common Pitfalls & Misconceptions

  1. Plotting Histograms with Limits Instead of Boundaries: A widespread error is placing class limits (e.g., $65$ and $69$) on the x-axis of a histogram. This incorrectly results in gaps between bars. Always convert limits to class boundaries (e.g., $64.5$ and $69.5$) before drawing a histogram or ogive to reflect the continuous nature of the data.

  2. Finding the Median Class: Many students simply pick the middle row of the table to find the median class. The median class is strictly determined by the cumulative frequency reaching $\frac{N}{2}$. You must create the $CF$ column first.

  3. Misplotting the Ogive (Cumulative Frequency Curve): Students frequently plot the cumulative frequencies against the midpoint or lower boundary of the class. The rule is absolute: Cumulative frequency must be plotted against the Upper Class Boundary. (By definition, the cumulative total represents everything up to the end of that class).

  4. Incorrect Mean Calculation ($\sum x / n$ for Tables): When asked for the mean of grouped data, students sometimes add up the midpoints ($x$) and divide by the number of classes. This treats every class as having an equal weight of 1, ignoring the frequencies completely. You must multiply $f \times x$ and divide by $\sum f$.

  5. Rounding Prematurely: In formulas requiring division (like the median interpolation step), rounding values to zero decimal places midway through the calculation severely distorts the final answer. Keep at least 3 decimal places until the very final step.
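To see how much damage the rounding pitfall can do, the sketch below repeats the median interpolation from Example 3 twice: once keeping full precision, and once rounding the fraction $\frac{28}{55}$ to zero decimal places before multiplying. The variable names are illustrative.

```python
lower_boundary = 74.5   # L
half_n = 50             # N / 2
n_before = 22           # n_b
n_within = 55           # n_w
width = 5               # i

fraction = (half_n - n_before) / n_within   # 28/55, about 0.509

careful = lower_boundary + fraction * width       # rounding left to the end
hasty = lower_boundary + round(fraction) * width  # fraction rounded to 1 too early

print(round(careful, 3))  # 77.045
print(hasty)              # 79.5
```

The early rounding shifts the answer by more than two marks, all the way to the upper boundary of the class.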


NECTA Exam Focus

Based on the analysis of past CSEE papers from 2018 to 2025, the Statistics topic is highly predictable and carries significant weight.

  • Section B Supremacy (10 Marks): Almost every year, Statistics features as a mandatory 10-mark multi-part question in Section B. You are typically presented with raw data or an incomplete grouped distribution table.
  • The "Holy Trinity" of Graphs: You will inevitably be asked to draw one of three graphs: a Histogram, a Frequency Polygon, or a Cumulative Frequency Curve (Ogive).
  • Interpreting the Graph: NECTA strongly prefers testing your ability to extract values from your graphs. For instance, instead of merely asking you to compute the Mode, they will ask you to "Estimate the Mode from your Histogram." Similarly, they will ask you to find the Median and Interquartile Range directly from your Ogive.
  • Analytical Reasoning: Recent exams (like the 2021 Paper) show a trend of asking students to compute multiple measures (Mean and Median) and "comment on the difference." This tests your understanding of data skewness—knowing that extreme values drag the mean away from the median.
  • Raw Data Fundamentals: Section A usually contains a smaller, 3-to-4 mark question on calculating the mean of ungrouped data or sketching a basic pie chart.

Practice Problems

Basic Difficulty

  1. The mass of 10 students (in kg) is recorded as: $45, 50, 48, 52, 45, 60, 47, 45, 55, 53$. Find the mean, median, and mode of this data.
  2. A family’s monthly income of $360,000$ TZS is budgeted as follows: Food $120,000$ TZS, Rent $90,000$ TZS, Education $60,000$ TZS, Savings $40,000$ TZS, Miscellaneous $50,000$ TZS. Calculate the sector angles and draw a pie chart to represent this data.
  3. State the class boundaries, class mark, and class size for the interval $41 - 50$.

Intermediate Difficulty

  4. (Past Paper Style) The following are marks obtained by 40 students in a Mathematics examination:

     48 47 57 56 71 62 46 45 50 76
     58 66 48 32 89 60 42 47 54 67
     64 49 37 64 67 44 45 45 42 34
     47 44 73 44 58 43 54 35 54 52

     Prepare a grouped frequency distribution table with class intervals $30 - 39$, $40 - 49$, etc.

  5. Using the table prepared in question 4, draw a histogram and use it to estimate the mode of the students' marks.
  6. A frequency polygon is drawn using the midpoints $12, 17, 22, 27, 32$. If the class widths are equal, determine the exact class limits for all five classes.

Advanced Difficulty

  7. (NECTA 2025 Paper 1 Context) In a Biology examination done by 100 students, the marks were grouped into classes: $65-69$ ($f=10$), $70-74$ ($f=12$), $75-79$ ($f=55$), $80-84$ ($f=10$), $85-89$ ($f=5$), $90-94$ ($f=8$). (a) Calculate the Mean score using an assumed mean of $77$. (b) Calculate the Mode algebraically.

  8. Using the data from Question 7, construct a cumulative frequency distribution. Draw the Ogive (Cumulative Frequency Curve) on graph paper. From your curve, determine: (a) The Median mark. (b) The Lower and Upper Quartiles. (c) The number of students who scored strictly more than $82$ marks.

Subtopics

  • Pictograms
  • Bar charts
  • Line graphs
  • Pie chart
  • Frequency distribution tables
  • Frequency polygons
  • Histograms
  • Cumulative frequency curves

Crosswalk Notes

Cross-version relationships are drafted in data/curricula/crosswalks/csee-basic-mathematics-2005-to-mathematics-2023.json. Partial and 2005-only mappings remain reviewable.
