4.1 — Paper 1
Fundamentals of Programming
Data types · Control structures · Subroutines · Stack frames · Recursion · OOP · IDEs

What is a program?

A program is a set of instructions that a computer follows to perform a task. Every program is written in a programming language — a formal notation with precise rules. The computer doesn't understand English or intentions; it only executes exact instructions.

At A-level you'll work with an imperative high-level language (Python, Java, C#, or VB.Net). Imperative means you write instructions that tell the computer how to do something, step by step.

4.1.1.1 Data types

A data type tells the computer what kind of value a variable holds, which determines how much memory to allocate and what operations are valid. Using the wrong type causes errors — you can't multiply text, for example.

  • Integer: whole numbers (no decimal point), e.g. -7, 0, 42, 1000. Fast arithmetic; cannot store 3.5.
  • Real / float: numbers with a decimal part, e.g. 3.14, -0.5, 2.0. Stored approximately, so tiny rounding errors can occur.
  • Boolean: True or False only. Used in conditions and flags; logically needs just 1 bit.
  • Character: a single symbol, e.g. 'A', '7', '!'. Stored as a Unicode/ASCII code number.
  • String: a sequence of characters, e.g. "Hello", "CS". Immutable in most languages; has a length.
  • Date/time: dates and times, e.g. 2025-09-01. Often stored internally as a number (e.g. seconds since an epoch).
  • Pointer/reference: the memory address of another object, e.g. 0x7FFF... (A-level only). A variable of pointer type stores where an object lives in memory.
  • Record: a collection of fields of different types, e.g. Name + Age + Score. Like a row in a spreadsheet; user-defined.
  • Array: an ordered, fixed-size collection of elements of the same type, e.g. [1, 4, 9, 16]. Indexed; a 2D array is a grid/matrix.
User-defined data types
You can build your own types from the built-in ones. For example, define a Student record type with fields: Name (String), Age (Integer), Score (Real). This is user-defined because you chose the structure — the language didn't provide it built-in.
Pointer types — what they actually are
When you create an object in a program (e.g. new Dog() in Java), the object is placed in memory at some address. A pointer or reference variable doesn't hold the Dog — it holds the address of where the Dog lives. This lets you pass large objects around efficiently without copying them.
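Python hides addresses from you, but every variable holding an object is in fact a reference. A minimal sketch of what that means in practice:

```python
# Assigning a list copies the reference, not the object itself.
a = [1, 2, 3]
b = a          # b now refers to the SAME list in memory
b.append(4)    # mutating through b is visible through a

print(a)       # [1, 2, 3, 4]: one object, two names
print(a is b)  # True: both names refer to the same object
```

This is exactly why passing a large object to a subroutine is cheap: only the reference is copied, not the data.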

4.1.1.2 Programming concepts — the three building blocks

All imperative programs are built from three and only three structural concepts. Every program ever written in Python, Java, or C is just these three, combined:

  • Sequence: instructions run one after another, top to bottom. The default behaviour. x = 5 then y = x + 1: the first assignment happens before the second.
  • Selection: the program chooses which block to execute based on a condition (IF...THEN...ELSE, CASE/SWITCH). Without selection, a program does the same thing every time regardless of input.
  • Iteration: a block of code is repeated (FOR loops, WHILE loops, REPEAT-UNTIL). Without iteration, you would have to write the same instruction hundreds of times.
Definite vs indefinite iteration — explained

Definite iteration (FOR loop): you know exactly how many times the loop will run before it starts.

FOR i ← 1 TO 10
    OUTPUT i
ENDFOR

This always runs exactly 10 times. The count is fixed at the start.

Indefinite iteration: you don't know in advance — the loop keeps going until a condition changes. Two variants:

WHILE loop — condition checked before each iteration. If the condition is false from the start, the loop body never runs at all.

WHILE password ≠ "correct"
    password ← INPUT("Try again: ")
ENDWHILE

REPEAT-UNTIL loop — condition checked after each iteration. The body always runs at least once, even if the condition is already met.

REPEAT
    password ← INPUT("Enter password: ")
UNTIL password = "correct"
Exam trap — which loop to use?
If asked to guarantee the code runs at least once (e.g. showing a menu), use REPEAT-UNTIL. If the code might need to not run at all (e.g. skipping a process when a list is empty), use WHILE. Examiners specifically test this distinction.
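Python has no REPEAT-UNTIL, but the distinction can be sketched with a while-True-break loop. To keep this runnable, the "user input" here is a hard-coded list of attempts rather than real INPUT:

```python
def while_style(attempts):
    """WHILE: condition checked first, so the body may run zero times."""
    runs = 0
    i = 0
    while i < len(attempts) and attempts[i] != "correct":
        i += 1
        runs += 1
    return runs

def repeat_until_style(attempts):
    """REPEAT-UNTIL: the body always runs at least once."""
    runs = 0
    i = 0
    while True:
        guess = attempts[i]
        i += 1
        runs += 1
        if guess == "correct":   # UNTIL condition, checked AFTER the body
            break
    return runs

print(while_style(["correct"]))         # 0: body never ran
print(repeat_until_style(["correct"]))  # 1: body ran once regardless
```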

4.1.1.3–5 Operators

Arithmetic operators — all of them with examples
  • Addition (+): 5 + 3 = 8
  • Subtraction (-): 10 - 4 = 6
  • Multiplication (*): 6 * 7 = 42
  • Real division (/): 7 / 2 = 3.5
  • Integer division (DIV or //): 7 DIV 2 = 3 (whole part only)
  • Modulo / remainder (MOD or %): 7 MOD 2 = 1 (the leftover)
  • Exponentiation (^ or **): 2^10 = 1024
  • Rounding: ROUND(3.7) = 4 (ROUND(x, n) rounds x to n decimal places)
  • Truncation (INT(x) or TRUNC(x)): INT(3.9) = 3 (just chops off)
Why MOD is useful
Check if a number is even: n MOD 2 = 0. Find last digit: n MOD 10. Wrap around (circular queue): rear ← (rear + 1) MOD capacity.
Rounding vs truncation
ROUND(3.9) = 4 (rounds to nearest). INT(3.9) = 3 (just removes the decimal — always towards zero). INT(-3.9) = -3, not -4. Exams sometimes test this on negative numbers.
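Both boxes above can be checked directly in Python (math.trunc truncates towards zero, matching INT as described here):

```python
import math

# MOD in practice
print(4 % 2)            # 0: even numbers satisfy n MOD 2 = 0
print(1234 % 10)        # 4: the last digit
print((4 + 1) % 5)      # 0: wrap-around at capacity 5 (circular queue)

# Rounding vs truncation
print(round(3.9))         # 4: nearest whole number
print(math.trunc(3.9))    # 3: decimal part chopped off
print(math.trunc(-3.9))   # -3, NOT -4: truncation goes towards zero
```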
Relational and Boolean operators

Relational operators compare two values and return True or False:

  • = (equal to): 5 = 5
  • ≠ or != (not equal to): 5 ≠ 3
  • < (less than): 3 < 7
  • > (greater than): 9 > 2
  • ≤ (less than or equal): 5 ≤ 5
  • ≥ (greater than or equal): 7 ≥ 7

Boolean operators combine Boolean values:

  • NOT: True when the operand is False. NOT False = True.
  • AND: True when both operands are True. True AND True = True.
  • OR: True when at least one operand is True. True OR False = True.
  • XOR: True when exactly one operand is True (exclusive). True XOR False = True; True XOR True = False.

4.1.1.6 Constants and variables

A variable is a named location in memory. The program can read its value and change it any time. Think of it as a labelled box you can put different things in.

A constant is also a named location, but its value is fixed at the start and cannot change during execution.

Why use constants?

Suppose your program uses the value 3.14159 for π in twenty different calculations. If you hardcode 3.14159 everywhere and later want more precision, you have to change it in twenty places. If you declare CONST PI ← 3.14159265 and use PI everywhere, you change it in one place. Also, PI is meaningful; 3.14159 in the middle of code is a "magic number" with no obvious meaning.

4.1.1.7 String handling

  • Length: number of characters. LEN("Hello") = 5
  • Position (find): index of a character. POSITION("Hello", "l") = 3 (1-indexed)
  • Substring: extract part of a string. SUBSTRING("Hello", 2, 3) = "ell" (start 2, length 3, 1-indexed)
  • Concatenation: join two strings. "Hello" + " World" = "Hello World"
  • Character to char code: ASCII/Unicode number of a character. ASC("A") = 65
  • Char code to character: character from a code number. CHR(65) = "A"
  • String → Integer: convert "42" to the number 42. INT("42") = 42
  • Integer → String: convert 42 to the text "42". STR(42) = "42"
  • String → Float: convert "3.14" to 3.14. FLOAT("3.14") = 3.14
Why type conversion matters
If a user types their age, the input arrives as a string. You can't do maths on a string — you must convert it to an integer first with INT() or equivalent. Forgetting this is a common programming error.
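The same operations in Python (which is 0-indexed, unlike the pseudocode table above):

```python
s = "Hello"
print(len(s))            # 5: length
print(s.find("l"))       # 2: position of first "l" (0-indexed)
print(s[1:4])            # "ell": substring by slicing
print(s + " World")      # "Hello World": concatenation
print(ord("A"), chr(65)) # 65 A: char <-> code conversions

age = int("17") + 1      # convert BEFORE doing maths
print(age)               # 18
print(str(42) + "!")     # "42!": number back to text
```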

4.1.1.8 Random number generation

Programs can generate pseudo-random numbers using RANDOM(min, max) or equivalent. Used in simulations, games, shuffling, and security. They're called pseudo-random because they're deterministic — generated by a mathematical formula starting from a seed value — but appear random in practice.

4.1.1.9 Exception handling

When something goes wrong at runtime (dividing by zero, trying to open a file that doesn't exist, converting "abc" to an integer), the program normally crashes. Exception handling lets you catch these errors and respond gracefully instead.

TRY
    result ← 10 / userInput
    OUTPUT result
EXCEPT
    OUTPUT "Error: cannot divide by zero"
ENDTRY

The TRY block contains the code that might fail. If an error occurs, execution jumps to EXCEPT where you handle it. The program continues rather than crashing.
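In Python the same structure is try/except. A minimal sketch, wrapping the risky division in a function so the caller gets a graceful result instead of a crash:

```python
def safe_divide(a, b):
    """Return a / b, or None if the division is impossible."""
    try:
        return a / b
    except ZeroDivisionError:
        return None   # handled: the program carries on

print(safe_divide(10, 2))   # 5.0
print(safe_divide(10, 0))   # None, no crash
```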

4.1.1.10–14 Subroutines, parameters, and scope

Subroutines — what they are and why they exist

A subroutine is a named, reusable block of code. You define it once and call it by name whenever you need it. Without subroutines, you'd copy-paste code everywhere — if you found a bug, you'd have to fix it in every copy.

Benefits of subroutines:

  • Avoid repetition — write once, use many times
  • Easier to read: validateEmail(email) is clearer than 20 lines of logic inline
  • Easier to test — test each subroutine independently
  • Easier to maintain — fix a bug in one place, fixed everywhere
  • Enables teamwork — different people can write different subroutines

A procedure is a subroutine that performs an action but doesn't give back a value.

PROCEDURE greet(name)
    OUTPUT "Hello, " + name
ENDPROCEDURE

CALL greet("Alice")   → outputs: Hello, Alice

A function is a subroutine that computes and returns a value to where it was called.

FUNCTION square(n)
    RETURN n * n
ENDFUNCTION

result ← square(5)    → result = 25
Parameters — how data gets into subroutines

Parameters are variables listed in the subroutine definition. When you call the subroutine, you pass arguments (actual values) that fill those parameter slots.

FUNCTION add(a, b)      ← a and b are parameters
    RETURN a + b
ENDFUNCTION

result ← add(3, 7)      ← 3 and 7 are arguments

Parameters make subroutines flexible — the same add function works with any numbers you give it.

Local vs global variables — scope

Scope describes where in a program a variable is accessible.

A local variable is declared inside a subroutine. It:

  • Only exists while the subroutine is running
  • Is created fresh each time the subroutine is called
  • Cannot be accessed from outside the subroutine
  • Is destroyed when the subroutine returns

A global variable is declared outside all subroutines. It:

  • Exists for the entire program's lifetime
  • Is accessible from anywhere in the program
  • Can be accidentally modified by any part of the code
Why local variables are good practice
Using local variables prevents "side effects" — unexpected changes to a variable from a distant part of the code. They make code easier to reason about: you know a subroutine can only affect things within itself. Global variables shared across a large program are a major source of bugs.
Exam trap
If two subroutines each declare a local variable called count, these are completely separate variables that happen to share a name. Changing one does not affect the other.
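A sketch of that exam trap in Python: two subroutines each with a local called count, completely independent of each other:

```python
def count_a(text):
    count = 0              # local: created fresh on every call
    for ch in text:
        if ch == "a":
            count += 1
    return count

def count_b(text):
    count = 0              # a separate variable that happens to share the name
    for ch in text:
        if ch == "b":
            count += 1
    return count

print(count_a("banana"))   # 3
print(count_b("banana"))   # 1: neither call disturbed the other
```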

4.1.1.15 Stack frames — how subroutine calls actually work

The call stack and stack frames

Every time a subroutine is called, the computer needs to remember: where to return to when done, what the parameter values are, and what local variables it creates. This information is stored in a stack frame (also called an activation record) pushed onto the call stack.

A stack frame stores:

  • Return address — the memory address of the instruction to go back to after the subroutine finishes
  • Parameters — the values passed to the subroutine
  • Local variables — variables created inside the subroutine
What happens when you call a function
  1. Current position is saved as the return address
  2. Parameters and local variable space pushed onto the call stack as a new frame
  3. Subroutine executes using its frame's local variables
  4. When RETURN is reached, the frame is popped off the stack
  5. Execution resumes at the saved return address
Stack overflow
If a recursive function has no base case (or the base case is never reached), it keeps pushing frames onto the stack without ever popping them off. Eventually the stack runs out of memory — this is a stack overflow error.

4.1.1.16 Recursion

Recursion — definition, examples, and mechanics

A recursive subroutine is one that calls itself. Every recursive solution must have:

  • A base case — the condition that stops the recursion and returns a value directly (no further self-call)
  • A recursive case — the part that calls itself with a simpler/smaller version of the problem

Example: factorial

FUNCTION factorial(n)
    IF n = 0 THEN
        RETURN 1             ← base case
    ELSE
        RETURN n * factorial(n - 1)   ← recursive case
    ENDIF
ENDFUNCTION

Tracing factorial(4):

  • factorial(4) = 4 × factorial(3)
  • factorial(3) = 3 × factorial(2)
  • factorial(2) = 2 × factorial(1)
  • factorial(1) = 1 × factorial(0)
  • factorial(0) = 1 ← base case hit! Unwind:
  • = 1 × 1 = 1, then 2 × 1 = 2, then 3 × 2 = 6, then 4 × 6 = 24
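The factorial function traced above, written in Python:

```python
def factorial(n):
    if n == 0:
        return 1                    # base case: stops the recursion
    return n * factorial(n - 1)     # recursive case: a smaller problem

print(factorial(0))   # 1
print(factorial(4))   # 24, matching the trace: 4 * 3 * 2 * 1
```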
Recursion advantages
  • Elegant, readable code for naturally recursive problems
  • Matches mathematical definitions directly
  • Some problems (tree traversal, backtracking) are much simpler recursively
Recursion disadvantages
  • Each call uses stack memory — risk of stack overflow
  • Slower than iteration due to function call overhead
  • Can be hard to trace and debug

4.1.2 Programming paradigms

Procedural programming

Code is organised as a sequence of procedure calls. The program has a clear top-down structure. You use hierarchy charts to plan modules. Well-structured procedural code is easy to read, test, and maintain.

Hierarchy charts show the decomposition of a program into modules. The top box is the whole program; boxes below it are sub-modules it calls; and so on. They don't show logic or order — just the structure of what calls what.

4.1.2.3 Object-oriented programming (OOP)

Classes, objects, and instantiation

OOP models the world as a collection of objects that interact. Each object has its own data (state) and behaviour (methods). This mirrors how we think about real-world things.

A class is the blueprint or template. It defines what attributes (data fields) and methods (functions) objects of that type will have. The class is not an object itself — it's the design.

An object is an instance — a concrete thing created from the class at runtime. Creating an object is called instantiation.

CLASS Dog
    PRIVATE name : String
    PRIVATE breed : String
    
    PUBLIC PROCEDURE new(n, b)   ← constructor
        name ← n
        breed ← b
    ENDPROCEDURE
    
    PUBLIC FUNCTION getName()
        RETURN name
    ENDFUNCTION
    
    PUBLIC PROCEDURE bark()
        OUTPUT name + " says: Woof!"
    ENDPROCEDURE
ENDCLASS

myDog ← new Dog("Rex", "Labrador")   ← instantiation
myDog.bark()   → Rex says: Woof!

The constructor is a special method called automatically when an object is created. It initialises the object's attributes.
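The Dog class in Python. Note that Python enforces privacy only by convention: a leading underscore signals "treat this as private":

```python
class Dog:
    def __init__(self, name, breed):   # constructor, runs on instantiation
        self._name = name              # _ marks the attribute as private by convention
        self._breed = breed

    def get_name(self):                # getter (accessor)
        return self._name

    def bark(self):
        return self._name + " says: Woof!"

my_dog = Dog("Rex", "Labrador")        # instantiation
print(my_dog.get_name())               # Rex
print(my_dog.bark())                   # Rex says: Woof!
```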

Encapsulation

Encapsulation means bundling the data (attributes) and the methods that operate on it together inside the class, and hiding internal implementation details from the outside world.

Access modifiers control visibility:

  • Private (−) — only accessible inside the class itself. Attributes are usually private.
  • Public (+) — accessible from anywhere. Methods that other code needs to call are public.
  • Protected (#) — accessible within the class and subclasses, but not from outside.
Why private attributes?
If age is public, any code anywhere could set myPerson.age = -500. If it's private and only changeable through a setAge(n) method, you can validate: IF n > 0 THEN age ← n. The class controls its own data integrity.

A getter (accessor) method returns the value of a private attribute. A setter (mutator) method allows controlled modification. This is the principle of information hiding.

Inheritance

Inheritance allows one class (the subclass or derived class) to inherit all the attributes and methods of another class (the superclass or base class). The subclass gets everything from the superclass for free and can add its own extra attributes/methods or override inherited ones.

CLASS Animal
    PRIVATE name : String
    PUBLIC PROCEDURE makeSound()
        OUTPUT "..."
    ENDPROCEDURE
ENDCLASS

CLASS Cat INHERITS Animal
    PUBLIC PROCEDURE makeSound()   ← overrides Animal's version
        OUTPUT name + " says: Meow!"
    ENDPROCEDURE
    PUBLIC PROCEDURE purr()        ← new method only Cat has
        OUTPUT "Purrrr"
    ENDPROCEDURE
ENDCLASS
The "is-a" relationship
Inheritance should model an "is-a" relationship. A Cat IS AN Animal — inheritance is correct. A Car HAS AN Engine — that's not inheritance, that's composition (an attribute). Getting this wrong is a major OOP design error.
Polymorphism, aggregation, and composition

Polymorphism means "many forms". When different classes share a method name but implement it differently, a call to that method on different objects produces different behaviour.

Example: both Cat and Dog inherit from Animal and both have makeSound(). Calling makeSound() on a Dog produces "Woof", on a Cat produces "Meow". The caller doesn't need to know which type it's dealing with — it just calls makeSound().

Overriding is when a subclass provides its own version of a method that exists in the superclass, replacing the superclass version for that class.
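Inheritance, overriding, and polymorphism together in one Python sketch: the caller loops over mixed objects and calls the same method name on each:

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def make_sound(self):
        return "..."

class Dog(Animal):
    def make_sound(self):              # overrides Animal's version
        return self.name + " says: Woof!"

class Cat(Animal):
    def make_sound(self):              # a different override
        return self.name + " says: Meow!"

    def purr(self):                    # new method only Cat has
        return "Purrrr"

pets = [Dog("Rex"), Cat("Tom")]
sounds = [p.make_sound() for p in pets]   # same call, different behaviour
print(sounds)   # ['Rex says: Woof!', 'Tom says: Meow!']
```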

Aggregation — an object contains references to other objects, but those objects can exist independently. In UML: white diamond. Example: a Department has many Employees, but Employees exist even if the Department is deleted.

Composition — an object contains other objects that cannot exist without it. In UML: black diamond. Example: a House has Rooms. If the House is destroyed, the Rooms cease to exist too.

4.1.5 Integrated Development Environments (IDEs)

An IDE is a software application that combines all the tools you need to write, run, and debug code:

  • Editor: where you type code. Has syntax highlighting (colours keywords), auto-indentation, and auto-completion (suggests code as you type).
  • Run / execute: runs the program without leaving the IDE. Shows output in an integrated console.
  • Debugger: lets you pause execution at a breakpoint, then step through code line by line. You can inspect variable values at each step, which is essential for finding logic errors.
  • Error diagnostics: highlights syntax errors as you type (or when you run). Shows which line caused a runtime error and what went wrong.
4.2 — Paper 1
Fundamentals of Data Structures
Arrays · Queues · Stacks · Linked lists · Trees · Hash tables · Graphs · Dictionaries · Vectors

What is a data structure?

A data structure is a way of organising and storing data in memory so that it can be accessed and modified efficiently. Different structures suit different problems — a list is good for ordered data, a hash table for fast lookup, a graph for networks and relationships.

An Abstract Data Type (ADT) defines a data structure by what operations it supports and what those operations do — not how they are implemented in code. Think of it as a contract: "this structure lets you push, pop, and peek, with these behaviours." The implementation details are hidden.

Static vs dynamic data structures

A static data structure has a fixed size set at creation time. It cannot grow or shrink. Arrays are static in most languages.

A dynamic data structure can grow and shrink during execution by allocating and freeing memory as needed. Linked lists, trees, and graphs are dynamic.

  • Size: static is fixed at creation; dynamic changes at runtime.
  • Memory: static is allocated once and may waste space; dynamic only uses what it needs.
  • Speed: static is generally faster (no allocation overhead); dynamic is slightly slower (allocation/deallocation).
  • Examples: array (static); linked list, tree (dynamic).

4.2.1.2 Arrays

An array is a static, fixed-size, ordered collection of elements of the same type, accessed by an integer index.

1D and 2D arrays in detail

A 1D array is a list. Imagine a row of numbered boxes:

scores ← [85, 72, 90, 68, 95]
         index: 0   1   2   3   4

scores[2] = 90   ← access element at index 2
scores[0] = 85   ← first element is index 0

A 2D array is a grid (rows and columns), accessed by two indices [row][column]:

grid ← [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

grid[1][2] = 6   ← row 1, column 2

2D arrays represent matrices, game boards, images (pixels), and spreadsheets. The spec says a 2D array is a way to represent a matrix.

Zero-indexing vs one-indexing
Most languages (Python, Java, C) start indices at 0. Some pseudocode conventions start at 1. Always check what convention is being used in an exam question.

4.2.2 Queues

A queue is a FIFO (First In, First Out) data structure. Like a real queue at a shop — the first person to join is the first person served. New items join at the back (rear); items leave from the front.

Queue operations
  • Enqueue: add an item to the back of the queue
  • Dequeue: remove and return the item at the front
  • Peek / Front: look at the front item without removing it
  • isEmpty: returns True if the queue has no items
  • isFull: returns True if the queue is at capacity (for fixed-size implementations)
Linear queue vs circular queue

Linear queue implemented with an array: maintain two pointers — front (index of first item) and rear (index of last item). When you dequeue, front moves right. Problem: even when items are removed, the space at the front of the array can never be reused — you run out of space even when there's conceptually room.

Circular queue solves this: the array "wraps around" — when rear reaches the end, it goes back to index 0 if that space has been freed. Use modulo arithmetic:

rear ← (rear + 1) MOD capacity

This reuses space that was freed by dequeuing from the front. The queue is full when (rear + 1) MOD capacity = front.
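A circular queue sketch in Python. One design choice here: instead of sacrificing a slot and testing (rear + 1) MOD capacity = front, this version keeps an explicit size counter, which makes full and empty checks direct:

```python
class CircularQueue:
    def __init__(self, capacity):
        self.data = [None] * capacity
        self.capacity = capacity
        self.front = 0      # index of the first item
        self.size = 0       # how many items are stored

    def is_empty(self):
        return self.size == 0

    def is_full(self):
        return self.size == self.capacity

    def enqueue(self, item):
        if self.is_full():
            raise OverflowError("queue full")
        rear = (self.front + self.size) % self.capacity   # wrap around
        self.data[rear] = item
        self.size += 1

    def dequeue(self):
        if self.is_empty():
            raise IndexError("queue empty")
        item = self.data[self.front]
        self.front = (self.front + 1) % self.capacity     # wrap around
        self.size -= 1
        return item

q = CircularQueue(3)
q.enqueue(1); q.enqueue(2); q.enqueue(3)
print(q.dequeue())   # 1: FIFO
q.enqueue(4)         # reuses the slot freed at the front
print(q.dequeue(), q.dequeue(), q.dequeue())   # 2 3 4
```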

Priority queue

Items are assigned a priority when added. The item dequeued is not necessarily the one at the "front" — it's the one with the highest priority. Equal-priority items are dequeued in FIFO order.

Uses
Operating system task scheduling (urgent processes get CPU time first), hospital emergency triage, Dijkstra's algorithm (process lowest-distance node first).

4.2.3 Stacks

A stack is a LIFO (Last In, First Out) data structure. Like a stack of plates — you add to the top and always remove from the top. You can only access the topmost item.

Stack operations and implementation
  • Push: add an item to the top of the stack; increment the top pointer
  • Pop: remove and return the item at the top; decrement the top pointer
  • Peek / Top: return the top item's value without removing it
  • isEmpty: returns True if the top pointer = -1 (or 0 if 1-indexed)
  • isFull: returns True if the top pointer = array capacity - 1

Implementation: an array plus an integer topPointer starting at -1 (empty). Push increments it and stores the value; Pop reads the value and decrements it.

stack ← [_, _, _, _, _]   (capacity 5, all empty)
topPointer ← -1

Push(10): stack[0] ← 10, topPointer ← 0
Push(20): stack[1] ← 20, topPointer ← 1
Push(30): stack[2] ← 30, topPointer ← 2
Pop():    returns stack[2] = 30, topPointer ← 1
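The array-plus-topPointer implementation above, written out in Python:

```python
class ArrayStack:
    def __init__(self, capacity):
        self.data = [None] * capacity
        self.top = -1                       # -1 means empty

    def is_empty(self):
        return self.top == -1

    def is_full(self):
        return self.top == len(self.data) - 1

    def push(self, item):
        if self.is_full():
            raise OverflowError("stack full")
        self.top += 1                       # increment, then store
        self.data[self.top] = item

    def pop(self):
        if self.is_empty():
            raise IndexError("stack empty")
        item = self.data[self.top]          # read, then decrement
        self.top -= 1
        return item

s = ArrayStack(5)
s.push(10); s.push(20); s.push(30)
print(s.pop())   # 30: LIFO, last in is first out
print(s.top)     # 1: pointer back on the 20
```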
Crucial uses of stacks
  • Call stack — subroutine calls and returns (as explained in Topic 1)
  • Undo functionality — each action pushed; undo pops the last one
  • Expression evaluation — evaluating Reverse Polish Notation
  • Depth-first search — maintaining which nodes to visit next
  • Backtracking algorithms — e.g. maze solving

4.2.4 Graphs

A graph is a collection of vertices (nodes) connected by edges (arcs). Graphs model anything with pairwise relationships: road networks, social connections, the internet, dependencies.

Graph types and terminology
  • Vertex / node: a point in the graph; represents an entity (city, person, router).
  • Edge / arc: a connection between two vertices; represents a relationship or path.
  • Undirected graph: edges have no direction; if A connects to B, then B connects to A (bidirectional).
  • Directed graph (digraph): edges have direction; A→B does NOT imply B→A. Like one-way streets.
  • Weighted graph: each edge has a numerical value (weight/cost) such as distance, time, or bandwidth.
Adjacency matrix vs adjacency list

Adjacency matrix: a 2D array where row i, column j = 1 (or the weight) if there's an edge from vertex i to vertex j, otherwise 0.

     A  B  C  D
A  [ 0, 1, 0, 1 ]
B  [ 1, 0, 1, 0 ]
C  [ 0, 1, 0, 1 ]
D  [ 1, 0, 1, 0 ]

Adjacency list: each vertex has a list of its neighbours.

A: [B, D]
B: [A, C]
C: [B, D]
D: [A, C]
  • Space (n vertices, e edges): matrix O(n²) always; list O(n + e), efficient for sparse graphs.
  • Check if an edge exists: matrix O(1), a direct lookup; list O(degree), scan the list.
  • Best for: matrix suits dense graphs (many edges); list suits sparse graphs (few edges).
  • Adding a vertex: expensive for a matrix (resize it); easy for a list (add a new entry).
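The same 4-vertex graph from above in both representations, as plain Python data (a nested list for the matrix, a dictionary for the list):

```python
vertices = ["A", "B", "C", "D"]
matrix = [
    [0, 1, 0, 1],   # A
    [1, 0, 1, 0],   # B
    [0, 1, 0, 1],   # C
    [1, 0, 1, 0],   # D
]
adj_list = {"A": ["B", "D"], "B": ["A", "C"],
            "C": ["B", "D"], "D": ["A", "C"]}

# Edge check: one direct lookup in the matrix, a scan in the list.
print(matrix[0][1] == 1)        # True: A-B exists
print("B" in adj_list["A"])     # True: same check via the list
print(matrix[0][2] == 1)        # False: no A-C edge
```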

4.2.5 Trees

A tree is a connected, undirected graph with no cycles. Key point: there is exactly one path between any two nodes. Trees don't have to have a root — but in computing we usually work with rooted trees.

Rooted trees and binary trees

A rooted tree has one designated root node (the top). Every other node has exactly one parent and zero or more children. Nodes with no children are leaves.

A binary tree is a rooted tree where each node has at most two children, called the left child and the right child.

A binary search tree (BST) is a binary tree where for every node: all values in the left subtree are less than the node's value, and all values in the right subtree are greater. This ordering enables efficient search, insertion, and deletion.

        50
       /  \
      30    70
     / \   / \
    20  40 60  80

To search for 60: start at 50 → 60 > 50, go right to 70 → 60 < 70, go left to 60 → found. Only 3 comparisons for a 7-node tree.
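That search strategy is a short loop: compare, then go left or right. A sketch built on the BST from the diagram:

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def bst_search(node, target):
    """Walk down the tree, discarding one subtree at every step."""
    while node is not None:
        if target == node.value:
            return True
        node = node.left if target < node.value else node.right
    return False

# The BST from the diagram above.
root = Node(50,
            Node(30, Node(20), Node(40)),
            Node(70, Node(60), Node(80)))

print(bst_search(root, 60))   # True: 50 -> 70 -> 60
print(bst_search(root, 55))   # False: runs off the tree
```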

Uses of trees
File systems (folders contain files/subfolders), HTML/XML DOM, expression parsing, Huffman compression, decision trees in AI, game trees.

4.2.6 Hash tables

Hash tables — the fastest data structure

A hash table maps keys to values using a hash function. The hash function takes a key and computes an index into an array. This gives O(1) average lookup — you jump directly to where the value should be.

Example hash function: index = key MOD tableSize

Table size = 7
Key "Alice" → hash → 3
Key "Bob"   → hash → 5
Key "Carol" → hash → 1

table[3] = "Alice's data"
table[5] = "Bob's data"
table[1] = "Carol's data"

Collision: when two different keys hash to the same index. This will happen. Two resolution strategies:

1. Chaining: each slot in the table is a linked list. Colliding entries are added to the list at that slot. Lookup: hash the key, go to slot, search the (hopefully short) list.

2. Open addressing / rehashing: if a slot is occupied, probe the next slot (or next+1, or next+n²) until an empty slot is found. Linear probing: try slot, slot+1, slot+2, etc.

What makes a good hash function?
It should distribute keys evenly across all slots (minimising collisions), be fast to compute, and be deterministic (same input always gives same output).
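A minimal chaining hash table in Python, using the built-in hash() MOD table size as the hash function (collisions simply extend the slot's list):

```python
class ChainedHashTable:
    def __init__(self, table_size=7):
        self.slots = [[] for _ in range(table_size)]   # one chain per slot

    def _index(self, key):
        return hash(key) % len(self.slots)   # hash function: key -> slot index

    def put(self, key, value):
        chain = self.slots[self._index(key)]
        for pair in chain:
            if pair[0] == key:               # key already present: update it
                pair[1] = value
                return
        chain.append([key, value])           # collision or new key: extend chain

    def get(self, key):
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        return None                          # key not present

table = ChainedHashTable()
table.put("Alice", "Alice's data")
table.put("Bob", "Bob's data")
print(table.get("Alice"))   # Alice's data
print(table.get("Zoe"))     # None
```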

4.2.7 Dictionaries

A dictionary (also called an associative array or map) is a collection of key-value pairs where each key is unique. You look up values by key, not by position.

# Python example
student = {"name": "Alice", "age": 17, "grade": "A"}
student["age"]   → 17
student["grade"] → "A"

Dictionaries are typically implemented using hash tables. The spec mentions a specific use: information retrieval — counting word frequencies in a document by using the word as a key and count as the value.
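The word-frequency use the spec mentions is a few lines with a dictionary:

```python
def word_frequencies(text):
    counts = {}                      # word -> number of occurrences
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

freqs = word_frequencies("the cat sat on the mat")
print(freqs["the"])   # 2
print(freqs["cat"])   # 1
```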

4.2.8 Vectors

Vectors — mathematical and computational

A vector is an ordered list of numbers, all drawn from the same field (usually ℝ, the real numbers). A vector with n components is called an n-vector over ℝ, written ℝⁿ.

Example: the 4-vector [2.0, 3.14, -1.0, 2.72]

Representations:

  • As a list: [2.0, 3.14, -1.0, 2.72]
  • As a 1D array: Dim v(3) As Single in VB.Net
  • As a dictionary (function interpretation): {0: 2.0, 1: 3.14, 2: -1.0, 3: 2.72} — maps index to value
  • Geometrically: as an arrow from the origin to the point (x, y) in 2D space

Vector operations:

  • Addition: [a, b] + [c, d] = [a+c, b+d]. Effect: translation (shift in space).
  • Scalar multiplication: k × [a, b] = [ka, kb]. Effect: scaling (stretch/shrink).
  • Dot (scalar) product: u·v = u₁v₁ + u₂v₂ + … + uₙvₙ. Returns a single number; used to find the angle between vectors.

Convex combination of vectors u and v: αu + βv where α ≥ 0, β ≥ 0, and α + β = 1. This describes all points on the line segment between u and v.

Dot product application
The angle θ between two vectors satisfies: cos θ = (u·v) / (|u| × |v|). If the dot product is 0, the vectors are perpendicular. Used in graphics, physics simulations, and machine learning.
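The dot product and angle formula as a short Python sketch:

```python
import math

def dot(u, v):
    """u1*v1 + u2*v2 + ... + un*vn"""
    return sum(a * b for a, b in zip(u, v))

def angle_between(u, v):
    """Angle in degrees, from cos(theta) = (u.v) / (|u| * |v|)."""
    magnitude = lambda w: math.sqrt(dot(w, w))
    return math.degrees(math.acos(dot(u, v) / (magnitude(u) * magnitude(v))))

print(dot([2, 3], [4, 1]))             # 11: 2*4 + 3*1
print(dot([1, 0], [0, 1]))             # 0: perpendicular vectors
print(angle_between([1, 0], [0, 1]))   # 90.0
```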
4.3 — Paper 1
Fundamentals of Algorithms
Graph traversal · Tree traversal · Reverse Polish · Searching · Sorting · Dijkstra · Complexity

What is an algorithm?

An algorithm is a finite sequence of precise, unambiguous steps that solves a problem and always terminates. Note: it must always terminate — an infinite loop is not an algorithm. Algorithms are the core of computer science; a program is just an algorithm implemented in code.

4.3.1 Graph traversal

Graph traversal means visiting every vertex in a graph systematically. The order matters. Two fundamental strategies:

Breadth-first search (BFS) — full trace

BFS explores all neighbours of the current vertex before moving deeper. It processes vertices level by level. Uses a queue.

Algorithm:

  1. Create an empty queue and a "visited" set
  2. Add the start vertex to the queue and mark it visited
  3. While queue is not empty:
      a. Dequeue the front vertex, process it
      b. For each unvisited neighbour: mark visited, enqueue it
Trace on a graph: A-B, A-C, B-D, B-E, C-F

Start at A. Queue: [A]. Visited: {A}

Dequeue A → process. Enqueue B, C. Queue: [B, C]. Visited: {A,B,C}

Dequeue B → process. Enqueue D, E. Queue: [C, D, E]. Visited: {A,B,C,D,E}

Dequeue C → process. Enqueue F. Queue: [D, E, F]. Visited: {A,B,C,D,E,F}

Dequeue D, E, F → process each. Queue empty. Done.

Order visited: A, B, C, D, E, F (level by level)

Use: shortest path in unweighted graphs
BFS finds the shortest path (fewest edges) between two vertices. Since it expands outward level by level, the first time it reaches the destination is guaranteed to be via the shortest route. Used in GPS navigation on simple maps, social network degree of separation.
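The BFS algorithm and trace above in Python, using collections.deque as the queue, on the same graph (A-B, A-C, B-D, B-E, C-F):

```python
from collections import deque

def bfs(graph, start):
    """Visit every vertex level by level; return the visit order."""
    visited = {start}
    queue = deque([start])
    order = []
    while queue:
        vertex = queue.popleft()           # dequeue the front vertex
        order.append(vertex)
        for neighbour in graph[vertex]:
            if neighbour not in visited:   # each unvisited neighbour:
                visited.add(neighbour)     # mark visited, enqueue it
                queue.append(neighbour)
    return order

graph = {"A": ["B", "C"], "B": ["A", "D", "E"],
         "C": ["A", "F"], "D": ["B"], "E": ["B"], "F": ["C"]}
print(bfs(graph, "A"))   # ['A', 'B', 'C', 'D', 'E', 'F'], level by level
```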
Depth-first search (DFS) — full trace

DFS explores as far as possible along one branch before backtracking to try other branches. Uses a stack (or recursion).

Algorithm:

  1. Create an empty stack and a "visited" set
  2. Push the start vertex. Mark visited.
  3. While stack is not empty:
      a. Pop the top vertex, process it
      b. For each unvisited neighbour: mark visited, push it
Trace on same graph

Start at A. Stack: [A]. Visited: {A}

Pop A → process. Push C, B (order matters). Stack: [C, B]. Visited: {A,B,C}

Pop B → process. Push E, D. Stack: [C, E, D]. Visited: {A,B,C,D,E}

Pop D → process. No unvisited neighbours. Stack: [C, E]

Pop E → process. Stack: [C]

Pop C → process. Push F. Stack: [F]

Pop F → process. Stack empty. Done.

Order: A, B, D, E, C, F (goes deep before backtracking)

Uses of DFS
Maze solving (explore one path fully; backtrack if stuck), topological sorting, detecting cycles in a graph, finding connected components, and it's the basis for many backtracking algorithms.
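The stack-based DFS above, in Python. Neighbours are pushed in reverse so that, as in the trace, C is pushed before B and B is popped first:

```python
def dfs(graph, start):
    """Go as deep as possible before backtracking; return the visit order."""
    visited = {start}
    stack = [start]
    order = []
    while stack:
        vertex = stack.pop()                        # pop the top vertex
        order.append(vertex)
        for neighbour in reversed(graph[vertex]):   # reversed to match the trace
            if neighbour not in visited:
                visited.add(neighbour)
                stack.append(neighbour)
    return order

graph = {"A": ["B", "C"], "B": ["A", "D", "E"],
         "C": ["A", "F"], "D": ["B"], "E": ["B"], "F": ["C"]}
print(dfs(graph, "A"))   # ['A', 'B', 'D', 'E', 'C', 'F'], deep before wide
```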

4.3.2 Tree traversal

Tree traversal visits every node in a tree. For a binary tree, three standard orderings exist, all defined recursively:

  • Pre-order: Root → Left subtree → Right subtree. Use: copying a tree; prefix expression evaluation.
  • In-order: Left subtree → Root → Right subtree. Use: outputs BST contents in sorted ascending order.
  • Post-order: Left subtree → Right subtree → Root. Use: deleting a tree; converting infix to Reverse Polish; expression trees.
Trace on this BST
        50
       /  \
      30    70
     / \   / \
    20  40 60  80

In-order: 20, 30, 40, 50, 60, 70, 80 — sorted order, confirming it's a valid BST.

Pre-order: 50, 30, 20, 40, 70, 60, 80 — root first, then left, then right.

Post-order: 20, 40, 30, 60, 80, 70, 50 — both subtrees before root.
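The three recursive traversals on that BST, in Python:

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def in_order(node):                  # Left -> Root -> Right
    if node is None:
        return []
    return in_order(node.left) + [node.value] + in_order(node.right)

def pre_order(node):                 # Root -> Left -> Right
    if node is None:
        return []
    return [node.value] + pre_order(node.left) + pre_order(node.right)

def post_order(node):                # Left -> Right -> Root
    if node is None:
        return []
    return post_order(node.left) + post_order(node.right) + [node.value]

root = Node(50, Node(30, Node(20), Node(40)),
                Node(70, Node(60), Node(80)))
print(in_order(root))     # [20, 30, 40, 50, 60, 70, 80]: sorted
print(pre_order(root))    # [50, 30, 20, 40, 70, 60, 80]
print(post_order(root))   # [20, 40, 30, 60, 80, 70, 50]
```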

4.3.3 Reverse Polish Notation (RPN)

What RPN is and why it exists

Standard mathematical notation (infix) puts the operator between operands: 3 + 4. This requires brackets to show order: (3 + 4) × 5 vs 3 + (4 × 5).

Reverse Polish Notation (RPN) / postfix puts the operator after its operands. Crucially, brackets are never needed — the order of operations is unambiguous from the notation itself.

Infix             | RPN (postfix)
3 + 4             | 3 4 +
(3 + 4) × 5       | 3 4 + 5 ×
3 + (4 × 5)       | 3 4 5 × +
(A + B) × (C − D) | A B + C D − ×

To convert infix to RPN (Shunting-yard algorithm): use a stack for operators. Send operands straight to the output. For each operator, first pop operators of higher or equal precedence from the stack to the output, then push the new one. Push "(" onto the stack; on ")", pop operators to the output until the matching "(" is discarded. At the end, pop any remaining operators to the output.

To evaluate RPN using a stack:

  1. Read tokens left to right
  2. If a number: push to stack
  3. If an operator: pop two values, apply the operator, push the result
  4. The final stack value is the answer
Evaluating 3 4 + 5 ×

Read 3 → push. Stack: [3]

Read 4 → push. Stack: [3, 4]

Read + → pop 4 and 3, compute 3+4=7, push 7. Stack: [7]

Read 5 → push. Stack: [7, 5]

Read × → pop 5 and 7, compute 7×5=35, push 35. Stack: [35]

End. Result = 35 ✓
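The evaluation steps above translate directly into Python. Note the pop order: the first value popped is the right-hand operand, which matters for − and ÷ (the function name is illustrative):

```python
def evaluate_rpn(tokens):
    """Evaluate a Reverse Polish expression given as a list of tokens."""
    stack = []
    operators = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    for token in tokens:
        if token in operators:
            right = stack.pop()       # first pop = right-hand operand
            left = stack.pop()
            stack.append(operators[token](left, right))
        else:
            stack.append(float(token))
    return stack.pop()                # final value left on the stack

print(evaluate_rpn("3 4 + 5 *".split()))  # 35.0
```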

Where RPN is actually used
Compilers and interpreters use RPN (or a similar intermediate form) internally because it's simple to evaluate with a stack. PostScript, the page-description language that PDF descends from, is stack-based. Older HP calculators used RPN. JVM bytecode is evaluated on an operand stack.

4.3.4 Searching algorithms

Linear search — O(n)

Check each element in turn from the start until the target is found or the list ends.

FOR i ← 0 TO length(list) - 1
    IF list[i] = target THEN
        RETURN i
    ENDIF
ENDFOR
RETURN -1   ← not found

Works on: any list, sorted or unsorted. Time complexity: O(n) — in the worst case you check every element. Simple but slow for large lists.

Binary search — O(log n)

Requires a sorted list. Repeatedly halve the search space by comparing the target to the middle element.

low ← 0
high ← length(list) - 1
WHILE low ≤ high
    mid ← (low + high) DIV 2
    IF list[mid] = target THEN
        RETURN mid
    ELSE IF list[mid] < target THEN
        low ← mid + 1    ← target must be in upper half
    ELSE
        high ← mid - 1   ← target must be in lower half
    ENDIF
ENDWHILE
RETURN -1
Trace: searching for 60 in [10, 20, 30, 40, 50, 60, 70, 80]

low=0, high=7. mid=3. list[3]=40 < 60. low ← 4.

low=4, high=7. mid=5. list[5]=60 = target. Return 5. Found in 2 steps!

Time complexity: O(log n). Each step halves the remaining items. For 1 billion items, worst case = only 30 comparisons (log₂(10⁹) ≈ 30).
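The pseudocode above, as runnable Python:

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            low = mid + 1    # target must be in the upper half
        else:
            high = mid - 1   # target must be in the lower half
    return -1

print(binary_search([10, 20, 30, 40, 50, 60, 70, 80], 60))  # 5
```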

Binary tree search
Searching a BST follows the same logic: if target < current node, go left; if target > current node, go right. This is O(log n) for a balanced tree, but degrades to O(n) for a degenerate (stick-shaped) tree.

4.3.5 Sorting algorithms

Bubble sort — O(n²)

Repeatedly step through the list, comparing adjacent pairs and swapping them if out of order. After each pass, the largest unsorted element "bubbles" to its correct position.

FOR passNum ← 1 TO length - 1
    swapped ← False
    FOR i ← 0 TO length - 1 - passNum
        IF list[i] > list[i+1] THEN
            SWAP list[i], list[i+1]
            swapped ← True
        ENDIF
    ENDFOR
    IF NOT swapped THEN RETURN   ← early exit: already sorted
ENDFOR
Trace: sorting [5, 3, 8, 1, 4]

Pass 1: [3,5,8,1,4] → [3,5,1,8,4] → [3,5,1,4,8]

Pass 2: [3,1,5,4,8] → [3,1,4,5,8]

Pass 3: [1,3,4,5,8] (one swap: 3 and 1)

Pass 4: no swaps → early exit. Sorted.

Time complexity: O(n²) worst/average case. O(n) best case (already sorted, with early exit flag). Included in the spec as an example of an inefficient sort. Fine for small lists or nearly-sorted data.
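The same algorithm in Python, including the early-exit flag:

```python
def bubble_sort(items):
    """In-place bubble sort with the early-exit flag."""
    n = len(items)
    for pass_num in range(1, n):
        swapped = False
        for i in range(n - pass_num):      # last pass_num items already in place
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
        if not swapped:                    # no swaps: list is sorted
            return items
    return items

print(bubble_sort([5, 3, 8, 1, 4]))  # [1, 3, 4, 5, 8]
```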

Merge sort — O(n log n)

Divide-and-conquer approach. Recursively split the list in half until you have single elements (trivially sorted), then merge pairs of sorted sublists back together.

FUNCTION mergeSort(list)
    IF length(list) ≤ 1 THEN
        RETURN list   ← base case: single element is sorted
    ENDIF
    mid ← length(list) DIV 2
    left ← mergeSort(list[0..mid-1])   ← sort left half
    right ← mergeSort(list[mid..end])  ← sort right half
    RETURN merge(left, right)           ← merge sorted halves
ENDFUNCTION

The merge step: compare the front elements of both sorted halves, take the smaller one, advance that pointer. Repeat until both halves are exhausted.

Trace: [5, 3, 8, 1]

Split: [5,3] and [8,1]

Split: [5] and [3] → merge: [3,5] | [8] and [1] → merge: [1,8]

Merge [3,5] and [1,8]: compare 3 vs 1 → take 1. Compare 3 vs 8 → take 3. Compare 5 vs 8 → take 5. Remainder: 8. Result: [1,3,5,8]

Time complexity: O(n log n) always — even for the worst case. Space: O(n) — needs temporary space for merging. Better than bubble sort for large datasets.
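A Python version with the merge step written out inline:

```python
def merge_sort(items):
    """Recursive merge sort: split, sort halves, merge."""
    if len(items) <= 1:
        return items                      # base case: already sorted
    mid = len(items) // 2
    left = merge_sort(items[:mid])        # sort left half
    right = merge_sort(items[mid:])       # sort right half
    # merge: repeatedly take the smaller front element
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])               # one of these slices is empty
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 3, 8, 1]))  # [1, 3, 5, 8]
```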

4.3.6 Dijkstra's shortest path algorithm

Dijkstra's algorithm — full trace

Finds the shortest path from a source vertex to all other vertices in a weighted, directed or undirected graph with non-negative weights.

Algorithm:

  1. Set distance to source = 0, all other distances = ∞
  2. Create a set of unvisited nodes (all nodes)
  3. While unvisited nodes remain:
      a. Pick the unvisited node with smallest current distance (call it current)
      b. For each neighbour of current that is still unvisited:
          calculate tentative_distance = dist[current] + edge_weight
          if tentative_distance < dist[neighbour]: update dist[neighbour] and record current as the predecessor
      c. Mark current as visited (remove from unvisited)
Worked example
Graph (edges: A−B = 4, A−C = 2, C−D = 1, B−D = 2):
A --(4)-- B
|         |
(2)      (2)
|         |
C --(1)-- D

From source A:
Initial: dist = {A:0, B:∞, C:∞, D:∞}

Visit A (dist=0): 
  Neighbour B: 0+4=4 < ∞ → dist[B]=4, pred[B]=A
  Neighbour C: 0+2=2 < ∞ → dist[C]=2, pred[C]=A

Visit C (dist=2, smallest unvisited):
  Neighbour D: 2+1=3 < ∞ → dist[D]=3, pred[D]=C

Visit D (dist=3):
  Neighbour B: 3+2=5 > 4 → no update (B already has a better path)

Visit B (dist=4): done.

Shortest paths from A: A=0, C=2, D=3, B=4
Path to D: A → C → D
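A direct Python sketch of the algorithm as given (a linear scan for the minimum rather than a priority queue; the adjacency dict encodes the worked example's graph):

```python
def dijkstra(graph, source):
    """Shortest distances from source in a weighted graph.
    graph: dict mapping vertex -> list of (neighbour, weight)."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    pred = {}
    unvisited = set(graph)
    while unvisited:
        # pick the unvisited vertex with the smallest known distance
        current = min(unvisited, key=lambda v: dist[v])
        unvisited.remove(current)
        for neighbour, weight in graph[current]:
            if neighbour in unvisited and dist[current] + weight < dist[neighbour]:
                dist[neighbour] = dist[current] + weight   # better path found
                pred[neighbour] = current                  # record predecessor
    return dist, pred

# The worked example's graph: A-B=4, A-C=2, C-D=1, B-D=2
graph = {
    "A": [("B", 4), ("C", 2)],
    "B": [("A", 4), ("D", 2)],
    "C": [("A", 2), ("D", 1)],
    "D": [("C", 1), ("B", 2)],
}
dist, pred = dijkstra(graph, "A")
print(dist)  # {'A': 0, 'B': 4, 'C': 2, 'D': 3}
```

Following `pred` backwards from D gives the shortest path: pred[D] = C, pred[C] = A, so A → C → D.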
The spec says
You won't be required to recall Dijkstra's steps from memory — you'll be given the algorithm in the exam. But you must be able to trace it on a given graph and answer questions about it. Know the applications: GPS navigation, network routing, any shortest-path problem.

4.3.7 Complexity analysis

Big-O notation — a full explanation

Big-O notation describes how an algorithm's resource requirements (time or space) grow as the input size n increases. It gives the worst-case growth rate, ignoring constant factors and lower-order terms.

Why ignore constants? We care about how fast things grow. An algorithm doing 3n steps and one doing 1000n steps are both O(n) — as n gets huge, both grow linearly. What matters is the shape of the growth, not the coefficient.

Big-O      | Name         | Meaning                       | Example (n=1000)          | Algorithm
O(1)       | Constant     | Same time regardless of n     | 1 operation               | Array index access, hash table lookup
O(log n)   | Logarithmic  | Halves search space each step | ~10 operations            | Binary search, BST search
O(n)       | Linear       | Steps proportional to n       | 1,000 operations          | Linear search, reading a list
O(n log n) | Linearithmic | Slightly worse than linear    | ~10,000 operations        | Merge sort, quick sort (avg)
O(n²)      | Quadratic    | n steps for each of n items   | 1,000,000 operations      | Bubble sort, selection sort, nested loops
O(2ⁿ)      | Exponential  | Doubles with each extra item  | 2¹⁰⁰⁰ ≈ impossible        | Brute-force subset problems
O(n!)      | Factorial    | All permutations              | 1000! ≈ unimaginably huge | Travelling salesman (brute force)
Tractable vs intractable
Tractable problems have a polynomial (O(nᵏ)) or better solution — they can be solved feasibly for large inputs. Intractable problems have no known polynomial solution — they become impossible for large n. Heuristic methods (approximate solutions) are often used for intractable problems. The spec mentions this as a limit of computation.
4.4 — Paper 1
Theory of Computation
Abstraction · FSMs · Sets · Regular expressions · BNF · Turing machines · Halting problem

4.4.1 Abstraction and automation

Computer science is fundamentally about building clean abstract models of messy real-world problems, then putting those models into action through algorithms. Abstraction is the key tool that makes complexity manageable.

All types of abstraction — explained clearly
  • Representational abstraction: remove irrelevant details from a model; keep only what's needed to solve the problem. Example: a map of the London Underground ignores actual geography and only shows connections; the irrelevant detail (exact distance, curves) is removed.
  • Generalisation / categorisation: group things with common characteristics into a hierarchy of "is-a-kind-of" relationships. Example: Labrador is-a-kind-of Dog; Dog is-a-kind-of Animal. You can generalise behaviour to the parent class.
  • Information hiding: hide implementation details; expose only what other code needs to know. Example: you call stack.push(x) without knowing whether it's backed by an array or a linked list. The internal detail is hidden.
  • Procedural abstraction: a procedure captures a computational method. By naming it, you hide the details and abstract the "what" from the "how". Example: sortList(data) — you don't need to know it uses merge sort internally.
  • Functional abstraction: a further abstraction beyond procedural — only the input/output relationship matters, not the method. Example: a mathematical function f(x) = x² — you only care that it squares the input.
  • Data abstraction: hide how data is stored; interact via a defined interface. New types can be built from old ones. Example: a stack ADT (push/pop/peek) without knowing it's implemented as an array.
  • Problem abstraction / reduction: strip away detail until the problem becomes equivalent to a known, solvable problem. Example: a route-finding problem reduces to a shortest-path problem on a graph — which Dijkstra solves.
  • Decomposition: break a problem into smaller sub-problems, each independently solvable. Example: build a game by decomposing into user input, game logic, rendering, sound — each solved separately.
  • Composition: combine solved sub-problems or data objects to build more complex things. Example: combine separate validateInput() and processData() procedures into a handleUserAction() procedure.
  • Automation: put models into action — create algorithms, implement them in code and data structures, execute. The full pipeline from problem to running program.

4.4.2 Regular languages

4.4.2.1 Finite State Machines (FSMs)

FSMs — what they are and how to draw/trace them

A Finite State Machine is a model of computation with:

  • A finite set of states (drawn as circles)
  • One designated start state (shown with an arrow pointing into it)
  • One or more accept/final states (drawn as double circles)
  • Transitions between states triggered by input symbols (drawn as labelled arrows)

An FSM accepts a string if, starting in the start state and reading the string left to right, you end in an accept state. Otherwise it rejects it.

FSM that accepts strings ending in "01" (over alphabet {0,1})
States: S0 (start), S1, S2 (accept)
Transitions:
  S0 --0--> S1
  S0 --1--> S0
  S1 --0--> S1
  S1 --1--> S2
  S2 --0--> S1
  S2 --1--> S0

Test "1001", symbol by symbol:
  '1': S0→S0
  '0': S0→S1
  '0': S1→S1
  '1': S1→S2  ← ends in accept state → ACCEPTED

Test "10": S0→S0→S1 → not accept → REJECTED

State transition table: a grid where rows = current states, columns = inputs, cells = next state. Same information as the diagram, in tabular form.
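A state transition table maps (current state, input) to next state, which makes an FSM simulator a few lines of Python (state names as in the example above):

```python
def fsm_accepts(transitions, start, accept_states, string):
    """Run a deterministic FSM over a string.
    Accept iff we finish in an accept state."""
    state = start
    for symbol in string:
        state = transitions[(state, symbol)]   # follow the labelled arrow
    return state in accept_states

# The "ends in 01" machine from above, as a state transition table
transitions = {
    ("S0", "0"): "S1", ("S0", "1"): "S0",
    ("S1", "0"): "S1", ("S1", "1"): "S2",
    ("S2", "0"): "S1", ("S2", "1"): "S0",
}
print(fsm_accepts(transitions, "S0", {"S2"}, "1001"))  # True
print(fsm_accepts(transitions, "S0", {"S2"}, "10"))    # False
```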

Mealy machines (A-level addition): FSMs that produce output on each transition, not just accept/reject. The output is written on the transition arrow alongside the input: input/output.

4.4.2.2 Sets — mathematical foundation

Sets, notation, and operations

A set is an unordered collection of distinct values (no duplicates). Notation: A = {1, 2, 3, 4, 5}

Set comprehension: A = {x | x ∈ ℕ ∧ x ≥ 1} — read as "the set of all x such that x is a natural number AND x ≥ 1".

The symbol | means "such that", ∈ means "is a member of", and ∧ means AND.

Compact set notation: {0ⁿ1ⁿ | n ≥ 1} = the set of all strings with n zeros followed by n ones: {01, 0011, 000111, ...}

Concept           | Symbol  | Meaning
Empty set         | {} or Ø | A set with no elements
Cardinality       | |A|     | Number of elements in set A
Membership        | x ∈ A   | x is in set A
Subset            | A ⊆ B   | Every element of A is also in B (includes A = B)
Proper subset     | A ⊂ B   | A ⊆ B but A ≠ B (B has at least one extra element)
Union             | A ∪ B   | Elements in A or B or both
Intersection      | A ∩ B   | Elements in both A and B
Difference        | A \ B   | Elements in A but NOT in B
Cartesian product | A × B   | All ordered pairs (a, b) where a ∈ A and b ∈ B

Finite vs infinite sets: The set {1,2,3} is finite. ℕ (natural numbers) is infinite. Countably infinite: can be put in one-to-one correspondence with ℕ (e.g. all integers). ℝ is not countable — there are more real numbers than natural numbers.

4.4.2.3 Regular expressions

Regular expressions — complete guide

A regular expression (regex) is a compact notation for describing a set of strings. It defines a regular language — the set of all strings it matches.

Metacharacter | Meaning                  | Example | Matches
a             | Literal character a      | cat     | Only "cat"
a|b           | a OR b (alternation)     | cat|dog | "cat" or "dog"
a*            | Zero or more a's         | ab*     | "a", "ab", "abb", "abbb", ...
a+            | One or more a's          | ab+     | "ab", "abb", "abbb", ... (not "a")
a?            | Zero or one a (optional) | colou?r | "colour" or "color"
(ab)          | Group (treat as unit)    | (ab)*   | "", "ab", "abab", "ababab", ...
Worked examples

a(a|b)* — starts with 'a', followed by zero or more of (a or b). Matches: a, aa, ab, aab, aba, abba, ...

[0-9]+ — one or more digits. Matches any unsigned integer (e.g. 0, 7, 2025).

(0|1)*1 — any binary string ending in 1.
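These languages can be checked with Python's re module. `re.fullmatch` requires the whole string to match, which is the right notion of "the string is in the language":

```python
import re

# fullmatch tests whether the WHOLE string is in the language
assert re.fullmatch(r"a(a|b)*", "abba")    # starts with a, then a/b's
assert not re.fullmatch(r"a(a|b)*", "ba")  # must start with a
assert re.fullmatch(r"[0-9]+", "2025")     # one or more digits
assert re.fullmatch(r"(0|1)*1", "0101")    # binary string ending in 1
assert not re.fullmatch(r"(0|1)*1", "10")  # ends in 0: rejected
assert re.fullmatch(r"colou?r", "color")   # the u is optional
print("all patterns behave as described")
```

Note that `re.match` only anchors at the start of the string, so it would wrongly accept "10x" for patterns like these.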

Regular expressions = FSMs
This is a key theoretical result: regular expressions and FSMs are exactly equivalent in power. Every regular expression can be converted to an FSM that accepts the same language, and every FSM can be converted to a regular expression. Both describe exactly the class of regular languages.

4.4.3 Context-free languages — Backus-Naur Form (BNF)

BNF and syntax diagrams

Some languages cannot be described by regular expressions. For example, the language of balanced brackets {(), (()), ((())), ...} requires keeping count — which an FSM (finite memory) cannot do. These require a more powerful description: context-free grammars, written in BNF.

BNF production rules:

  • <non-terminal> — a symbol that can be further expanded (in angle brackets)
  • terminal — a literal character that cannot be further expanded
  • ::= means "is defined as"
  • | means "or"
<digit>   ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<integer> ::= <digit> | <integer><digit>
<letter>  ::= a | b | c | ... | z
<identifier> ::= <letter> | <identifier><letter> | <identifier><digit>

The second rule says: an integer is either a single digit, OR an integer followed by a digit. This recursive definition captures any length of integer.

Syntax diagrams (also called railroad diagrams) express the same rules graphically. Follow the tracks through boxes (non-terminals) and text (terminals) to build valid strings.

Why BNF over regex?
BNF can express context-free languages like nested structures (brackets, if-else blocks, function calls within function calls) that regular expressions cannot. Programming language syntax is described in BNF in language specifications.

4.4.4 Complexity and limits of computation

See Topic 4.3 for Big-O complexity. Additional concept: some problems are uncomputable — there is no algorithm that can solve them for all inputs, regardless of computing power.

4.4.5 Turing machines

The Turing machine — structure and significance

A Turing machine is a theoretical model of computation proposed by Alan Turing in 1936. It is not a physical device — it's a mathematical tool for understanding what computers can and cannot do in principle.

Components:

  • An infinitely long tape divided into cells. Each cell contains a symbol (or is blank).
  • A read/write head that scans one cell at a time and can move left or right.
  • A state register storing the machine's current state.
  • A transition function (the "program"): given (current state, symbol under head) → (new symbol to write, direction to move, new state).
Simple Turing machine: add 1 to a binary number
Tape: 1 0 1 _ _ _   (binary 5, head starts at the rightmost 1)
State: q0 = add 1, carrying leftwards

Transitions:
(q0, 1) → (0, LEFT, q0)   ← flip 1 to 0, move left to carry
(q0, 0) → (1, HALT)       ← flip 0 to 1, done
(q0, _) → (1, HALT)       ← ran off the left end: write 1, done

Result: tape reads 1 1 0 (binary 6) ✓
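A minimal simulator for this machine (the rules dict is the transition function; "_" is the blank symbol, and a "HALT" move stops the machine; names and layout are illustrative):

```python
def run_turing_machine(tape, head, state, rules):
    """Simulate a simple Turing machine. rules maps (state, symbol)
    to (write, move, new_state); move is -1 (left), +1 (right) or 'HALT'."""
    tape = list(tape)
    while True:
        write, move, new_state = rules[(state, tape[head])]
        tape[head] = write                 # write the new symbol
        if move == "HALT":
            return "".join(tape)
        head += move                       # move the head
        state = new_state                  # change state

# Binary increment: flip trailing 1s to 0, then the first 0 (or blank) to 1
rules = {
    ("q0", "1"): ("0", -1, "q0"),      # flip 1 -> 0, carry left
    ("q0", "0"): ("1", "HALT", "q0"),  # flip 0 -> 1, done
    ("q0", "_"): ("1", "HALT", "q0"),  # tape end: write 1, done
}
print(run_turing_machine("_101", head=3, state="q0", rules=rules))  # _110
```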
Why Turing machines matter
The Church-Turing thesis (widely accepted but unprovable) states: any function that can be computed by any physical computer can also be computed by a Turing machine. This means: if something is computable at all, a Turing machine can compute it. Modern computers are more powerful Turing machines in practice, but no more powerful in theory.

4.4.6 The halting problem

The halting problem — proof and significance

The halting problem asks: is there an algorithm that, given any program P and any input I, can determine whether P will eventually halt (stop) or run forever?

Turing proved in 1936 that no such algorithm can exist. This is not a practical limitation — it's a mathematical proof.

Proof by contradiction (simplified):

  1. Assume HALTS(P, I) exists — a program that returns True if P halts on I, False if it runs forever.
  2. Construct program D:
    • If HALTS(D, D) = True: loop forever
    • If HALTS(D, D) = False: halt immediately
  3. Now ask: what does HALTS(D, D) return?
    If True (D halts on D): D's code says loop forever — contradiction.
    If False (D doesn't halt on D): D's code says halt — contradiction.
  4. Contradiction either way → HALTS cannot exist.
What this means
The halting problem proves that some problems are uncomputable — no algorithm can solve them for all possible inputs. This sets a fundamental limit on computation: computers are not all-powerful. Antivirus software cannot perfectly detect all malware (equivalent to the halting problem). No perfect type checker can exist for all programs. These are theoretical limits, not engineering failures.
4.5 — Paper 2
Fundamentals of Data Representation
Number systems · Binary · Hex · Two's complement · Characters · Images · Sound · Compression · Encryption

4.5.1 Number systems

Mathematics defines different sets of numbers for different purposes. In computer science you need to know which operations are valid on which types.

Set                | Symbol | Members                                         | Used for
Natural numbers    | ℕ      | {0, 1, 2, 3, …}                                 | Counting — always non-negative whole numbers
Integers           | ℤ      | {…, -2, -1, 0, 1, 2, …}                         | All whole numbers including negatives
Rational numbers   | ℚ      | Any number expressible as p/q (p, q ∈ ℤ, q ≠ 0) | Fractions. All integers are rational (e.g. 7 = 7/1).
Irrational numbers | (none) | Numbers that cannot be expressed as a fraction  | √2, π, e — they have infinite non-repeating decimals
Real numbers       | ℝ      | All of the above                                | All measurable quantities. Used for measurement.
Ordinal numbers    | (none) | 1st, 2nd, 3rd, …                                | Position in an ordered sequence

4.5.2 Number bases

Decimal, binary, and hexadecimal — conversions

A number base defines how many distinct digit symbols exist and what position values are used.

  • Base 10 (decimal): digits 0–9. Position values: 1, 10, 100, 1000, …
  • Base 2 (binary): digits 0–1. Position values: 1, 2, 4, 8, 16, 32, 64, 128, …
  • Base 16 (hexadecimal): digits 0–9 then A=10, B=11, C=12, D=13, E=14, F=15. Position values: 1, 16, 256, 4096, …

Decimal → Binary: repeatedly divide by 2, collect remainders from bottom up.

42 ÷ 2 = 21 r 0
21 ÷ 2 = 10 r 1
10 ÷ 2 = 5  r 0
 5 ÷ 2 = 2  r 1
 2 ÷ 2 = 1  r 0
 1 ÷ 2 = 0  r 1
42₁₀ = 101010₂  (read remainders bottom to top)

Binary → Decimal: multiply each bit by its place value and sum.

101010₂ = 1×32 + 0×16 + 1×8 + 0×4 + 1×2 + 0×1
        = 32 + 8 + 2 = 42

Binary → Hex: group into nibbles (4 bits) from the right. Each nibble → one hex digit.

10101110₂ = 1010 1110 = A E = AE₁₆

Hex → Binary: expand each hex digit to its 4-bit binary equivalent.

3F₁₆ = 0011 1111₂
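The divide-by-2 and place-value methods translate directly into Python (function names are illustrative; the built-ins `bin`, `hex` and `int(s, base)` do the same jobs):

```python
def to_binary(n):
    """Decimal -> binary by repeated division, collecting remainders."""
    if n == 0:
        return "0"
    bits = ""
    while n > 0:
        bits = str(n % 2) + bits   # remainder becomes the next bit (bottom-up)
        n = n // 2
    return bits

def from_binary(bits):
    """Binary -> decimal by summing place values."""
    total = 0
    for bit in bits:
        total = total * 2 + int(bit)
    return total

print(to_binary(42))          # 101010
print(from_binary("101010"))  # 42
print(hex(0b10101110))        # 0xae -- built-in check of the nibble grouping
```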
Why hexadecimal?
Writing large binary numbers is error-prone: 10110110110101 is hard to read. Hex is compact: each hex digit represents exactly 4 bits. Memory addresses, colour codes (#FF5733), MAC addresses (A4:C3:F0:85:AC:2D) are all written in hex for readability.

4.5.3 Units of information

Binary prefix | Symbol | Power | Bytes
Kibibyte      | KiB    | 2¹⁰   | 1,024
Mebibyte      | MiB    | 2²⁰   | 1,048,576
Gibibyte      | GiB    | 2³⁰   | 1,073,741,824
Tebibyte      | TiB    | 2⁴⁰   | 1,099,511,627,776

Manufacturers often use decimal prefixes (kB = 1000 B, MB = 1,000,000 B) because they make storage look bigger. Operating systems use binary prefixes. This discrepancy is why a "500 GB" hard drive shows as ~465 GiB in Windows.

n bits = 2ⁿ possible values
1 bit → 2 values (0 or 1). 2 bits → 4. 8 bits → 256. This is fundamental. With n bits, you can represent integers 0 to 2ⁿ−1 (unsigned).

4.5.4 Binary arithmetic

Unsigned binary and binary addition

Unsigned binary: all n bits represent magnitude. Range: 0 to 2ⁿ−1.

Binary addition rules: 0+0=0, 0+1=1, 1+0=1, 1+1=10 (write 0, carry 1), 1+1+1=11 (write 1, carry 1).

  01101101  (109)
+ 00110101  ( 53)
----------
  10100010  (162)

Overflow occurs when the result exceeds the number of bits available. The extra bit is lost, giving a wrong answer. This is a critical error to be aware of.

Two's complement — signed integers

Two's complement is the standard way to represent negative integers in binary. The most significant bit (MSB) has a negative place value: −2ⁿ⁻¹.

For 8-bit two's complement:

  • Range: −128 to +127
  • MSB = 0 → positive; MSB = 1 → negative

Converting decimal to two's complement:

  1. If positive: just convert normally
  2. If negative: convert the absolute value to binary, then flip all bits, then add 1
Represent -45 in 8-bit two's complement

45 in binary: 0010 1101

Flip all bits: 1101 0010

Add 1: 1101 0011 ← this is -45 in two's complement

Converting two's complement to decimal: if MSB=0, normal conversion. If MSB=1, the value = -(flip all bits and add 1).

Decode 1101 0011

MSB=1, so negative. Flip: 0010 1100. Add 1: 0010 1101 = 45. So original = -45.

Subtraction using two's complement: A − B = A + (−B). Convert B to its two's complement, then add to A.

12 − 7 using 8-bit two's complement
12 = 0000 1100
-7: 7 = 0000 0111, flip = 1111 1000, add 1 = 1111 1001

  0000 1100
+ 1111 1001
----------
  0000 0101 = 5 ✓  (the carry out of the top bit is discarded)
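The flip-and-add-1 procedure can be checked in Python. Masking with `(1 << bits) - 1` keeps only the low n bits, which is exactly "discard the carry" (function names are illustrative):

```python
def to_twos_complement(value, bits=8):
    """Encode a signed integer as an n-bit two's complement bit string."""
    if not -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1:
        raise ValueError("out of range for this many bits")
    return format(value & ((1 << bits) - 1), f"0{bits}b")

def from_twos_complement(bit_string):
    """Decode an n-bit two's complement bit string to a signed integer."""
    bits = len(bit_string)
    value = int(bit_string, 2)
    if bit_string[0] == "1":       # MSB set: negative
        value -= 1 << bits         # subtract 2^n
    return value

print(to_twos_complement(-45))           # 11010011
print(from_twos_complement("11010011"))  # -45

# 12 - 7 as addition, keeping only 8 bits (carry out discarded)
total = (int(to_twos_complement(12), 2) + int(to_twos_complement(-7), 2)) & 0xFF
print(from_twos_complement(format(total, "08b")))  # 5
```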
Fixed-point binary fractions

A binary point is inserted at a fixed position. Bits to the right have fractional values: ½, ¼, ⅛, ¹⁄₁₆, …

Format: 4 bits integer, 4 bits fraction
Binary: 0101.1100

Integer part:  0×8 + 1×4 + 0×2 + 1×1 = 5
Fraction part: 1×½ + 1×¼ + 0×⅛ + 0×¹⁄₁₆ = 0.75
Total: 5.75

Decimal to fixed-point binary: convert integer part normally. For the fraction: repeatedly multiply by 2, take the integer part as the next bit.

Convert 0.625 to binary

0.625 × 2 = 1.25 → bit = 1, remainder 0.25

0.25 × 2 = 0.5 → bit = 0, remainder 0.5

0.5 × 2 = 1.0 → bit = 1, remainder 0

0.625₁₀ = 0.101₂
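The multiply-by-2 method, as a Python sketch (`max_bits` caps the output for fractions that never terminate in binary):

```python
def fraction_to_binary(fraction, max_bits=8):
    """Convert a decimal fraction (0 <= f < 1) by repeated doubling:
    the integer part of each doubling is the next bit."""
    bits = ""
    while fraction > 0 and len(bits) < max_bits:
        fraction *= 2
        if fraction >= 1:
            bits += "1"
            fraction -= 1
        else:
            bits += "0"
    return bits

print(fraction_to_binary(0.625))  # 101  (i.e. 0.101 in binary)
print(fraction_to_binary(0.1))    # truncated: 0.1 never terminates in binary
```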

Not all decimals can be represented exactly
0.1 in decimal has no exact binary representation — it repeats forever in binary (like 1/3 in decimal). This causes the famous floating-point rounding errors in programming.

4.5.5 Character coding

ASCII, Unicode, and error checking

ASCII (American Standard Code for Information Interchange): uses 7 bits to encode 128 characters — uppercase and lowercase letters, digits 0-9, punctuation, and 32 control characters (newline, tab, etc.). Examples: 'A'=65, 'a'=97, '0'=48.

Extended ASCII extends to 8 bits (256 characters) but has many incompatible variants for different scripts.

Unicode was introduced to handle all world writing systems — over 1 million possible code points, covering Arabic, Chinese, emoji, and everything else. UTF-8 is the most common encoding: backwards-compatible with ASCII (ASCII characters use 1 byte, others use 2-4 bytes). UTF-16 uses 2 or 4 bytes. UTF-32 uses exactly 4 bytes.

Character code ≠ pure binary number
The character '5' (the text symbol) has ASCII code 53. The pure binary representation of the number 5 is 00000101. These are completely different things. Exams test this distinction — an examiner might show you '5' as ASCII and ask what its integer value is.

Parity bit: an extra bit added to make the total number of 1s even (even parity) or odd (odd parity). Detects single-bit errors in transmission.

Data: 1001011 (4 ones)
Even parity bit: 0 (total ones = 4 → already even)
Transmitted: 10010110

If received as: 10010100 (one bit flipped)
Count ones: 3 (odd) → ERROR DETECTED
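Even parity as a Python sketch (data given as bit strings; function names are illustrative):

```python
def add_even_parity(data_bits):
    """Append a parity bit making the total number of 1s even."""
    parity = "0" if data_bits.count("1") % 2 == 0 else "1"
    return data_bits + parity

def check_even_parity(received):
    """Return True if the received bits pass the even-parity check."""
    return received.count("1") % 2 == 0

sent = add_even_parity("1001011")    # 4 ones -> parity bit 0
print(sent)                          # 10010110
print(check_even_parity(sent))       # True
print(check_even_parity("10010100")) # False: one bit flipped in transit
```

Note that two flipped bits cancel out and pass the check, which is why parity only detects single-bit errors.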

Majority voting: each bit sent 3 times. The receiver takes whichever value appears in at least 2 of the 3 copies. Can correct single-bit errors.

Sent: 111 000 111 000 → 1 0 1 0
Received: 101 000 110 001
Majority: 1   0   1   0   ← correct!

Check digit: a digit calculated from the other digits and appended to data (ISBN, barcodes, bank account numbers). The receiver recalculates and compares. Detects transcription errors.

4.5.6 Representing images, sound, and other data

Analogue vs digital — and conversion

Analogue data is continuous — it can take any value within a range. Sound waves, temperature, voltage are analogue. The real world is analogue.

Digital data is discrete — it can only take specific values from a defined set. Computers process digital data: sequences of 0s and 1s.

ADC (Analogue-to-Digital Converter): samples an analogue signal at regular time intervals (the sample rate), measures the amplitude at each sample point, and rounds to the nearest digital value (quantisation). Used when recording audio, measuring sensors.

DAC (Digital-to-Analogue Converter): takes a sequence of digital values and reconstructs an analogue signal. Used when playing back audio through speakers, controlling motors.

Bitmapped images — resolution, colour depth, file size

A bitmap stores an image as a grid of pixels (picture elements). Each pixel's colour is stored as a binary number.

Term             | Definition                                                        | Effect
Resolution       | Number of pixels per inch (PPI/DPI)                               | Higher = sharper image, larger file
Image dimensions | Width × Height in pixels (e.g. 1920×1080)                         | Total pixels = width × height
Colour depth     | Number of bits stored per pixel                                   | 1-bit = B&W (2 colours), 8-bit = 256 colours, 24-bit = 16.7M colours
Metadata         | Data about the image (width, height, colour depth, creation date) | Stored in file header; not pixel data

File size calculation (ignoring metadata):

File size (bits) = Width (px) × Height (px) × Colour depth (bits)
File size (bytes) = File size (bits) ÷ 8

Example: 800 × 600 pixels, 24-bit colour
= 800 × 600 × 24 = 11,520,000 bits
= 11,520,000 ÷ 8 = 1,440,000 bytes = 1.44 MB
Digital sound — sampling and MIDI

To store sound digitally, an ADC measures the sound wave's amplitude at regular intervals:

Term                          | Definition                    | Effect of increasing
Sample rate                   | Samples taken per second (Hz) | Higher = more accurate reproduction, larger file. CD = 44,100 Hz.
Sample resolution (bit depth) | Bits used per sample          | Higher = more precise amplitude values, larger file. CD = 16-bit.

Nyquist theorem: to accurately reproduce a sound containing frequencies up to f Hz, the sample rate must be at least 2f Hz. CDs sample at 44.1 kHz to capture up to 22.05 kHz — slightly above human hearing (~20 kHz).

File size calculation:

File size (bits) = Sample rate × Bit depth × Duration (s) × Channels
Example: 44100 Hz, 16-bit, stereo (2 channels), 3 minutes
= 44100 × 16 × 180 × 2 = 254,016,000 bits ÷ 8 = ~31.75 MB

MIDI (Musical Instrument Digital Interface): does NOT store audio. Stores event messages: "note on", "note off", "pitch", "velocity", "which instrument". A synthesiser interprets these to produce sound. MIDI files are tiny (kilobytes vs megabytes for audio). Easy to edit — change tempo, instrument, transpose key.

4.5.6.7 Data compression

Lossless vs lossy — and techniques
Lossless compression
  • Original data perfectly reconstructed
  • No information lost
  • Used for: text, executable code, PNG, ZIP
  • Must use lossless for programs (corrupted code crashes)
Lossy compression
  • Some data permanently discarded
  • Cannot reconstruct exactly
  • Used for: JPEG photos, MP3, MP4 video
  • Human perception can't detect small losses

Run-length encoding (RLE) — lossless. Replace a run of repeated values with (count, value).

Original: AAAABBBBAACCCCCC (16 chars)
Encoded:  4A4B2A6C          (8 chars) ← 50% smaller
Decoded back: AAAABBBBAACCCCCC ← perfect reconstruction

Works well for simple images with large areas of uniform colour (e.g. icons, cartoons). Bad for photographs (few repeated pixels).
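A minimal RLE encoder/decoder, assuming no run is longer than 9 so each count is a single digit (real implementations use a proper length encoding):

```python
def rle_encode(text):
    """Run-length encode: each run of a repeated character becomes count+char."""
    encoded = ""
    i = 0
    while i < len(text):
        run = 1
        while i + run < len(text) and text[i + run] == text[i]:
            run += 1                       # extend the current run
        encoded += str(run) + text[i]
        i += run
    return encoded

def rle_decode(encoded):
    """Reverse the encoding (assumes single-digit counts)."""
    decoded = ""
    for count, char in zip(encoded[::2], encoded[1::2]):
        decoded += char * int(count)
    return decoded

print(rle_encode("AAAABBBBAACCCCCC"))  # 4A4B2A6C
print(rle_decode("4A4B2A6C"))          # AAAABBBBAACCCCCC
```

Try it on text with no runs ("ABCD" becomes "1A1B1C1D") to see why RLE can make unsuitable data bigger, not smaller.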

Dictionary-based (LZW, LZ77) — lossless. Build a dictionary of repeated sequences encountered so far; replace later occurrences with short codes referencing the dictionary. LZW is used in GIF; the LZ77-based DEFLATE algorithm is used in ZIP and PNG.

4.5.6.8 Encryption

Caesar cipher, Vernam cipher, computational security

Encryption: transforming plaintext (readable) into ciphertext (unreadable) using a cipher and a key. The ciphertext can only be understood by someone who knows how to decrypt it.

Caesar cipher: shift each letter in the plaintext by a fixed number of positions in the alphabet.

Key = 3
A→D, B→E, C→F, ..., X→A, Y→B, Z→C

Plaintext:  HELLO
Ciphertext: KHOOR

Why easily cracked? Only 25 possible keys. An attacker can try all 25 in seconds. Also, letter frequency analysis works: if 'E' is the most common letter in English and 'H' appears most in the ciphertext, the shift is probably 3.
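A Caesar cipher sketch; decryption is just encryption with the negative shift:

```python
def caesar(text, shift):
    """Shift each letter by `shift` places, wrapping past Z back to A."""
    result = ""
    for char in text:
        if char.isalpha():
            base = ord("A") if char.isupper() else ord("a")
            result += chr((ord(char) - base + shift) % 26 + base)
        else:
            result += char          # leave spaces and punctuation alone
    return result

print(caesar("HELLO", 3))   # KHOOR
print(caesar("KHOOR", -3))  # HELLO -- decryption is shifting back
```

A brute-force attack is one line: try `caesar(ciphertext, -k)` for every k from 1 to 25 and look for readable English.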

Vernam cipher (one-time pad): XOR each character of the plaintext with the corresponding character of a truly random key of equal length. The key must be used exactly once.

Plaintext:  01001000  ('H' in binary)
Key:        10110101  (random)
XOR (⊕):   11111101  (ciphertext)

To decrypt: 11111101 ⊕ 10110101 = 01001000 → 'H'
Perfect security — mathematical proof
Because the key is truly random, the ciphertext gives zero information about the plaintext. Every possible plaintext is equally likely given the ciphertext. The Vernam cipher is the only cipher ever proven mathematically to be completely secure.
All other ciphers (AES, RSA, etc.) rely on computational security: they're secure because breaking them would take longer than the age of the universe with current computers. But mathematically, given infinite time, they could be broken. Vernam cannot — ever.
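The XOR step in Python. Because XOR is self-inverse (x ⊕ k ⊕ k = x), one function both encrypts and decrypts; the one-byte key here is the example's value, not a realistic key:

```python
def vernam(data, key):
    """XOR each byte of data with the matching key byte.
    The same call encrypts and decrypts, since x ^ k ^ k == x."""
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))

key = bytes([0b10110101])            # in practice: truly random, used once
ciphertext = vernam(b"H", key)
print(format(ciphertext[0], "08b"))  # 11111101
print(vernam(ciphertext, key))       # b'H'
```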

Comparison:

              | Caesar cipher      | Vernam cipher                | Modern ciphers (AES etc.)
Security type | None (25 keys)     | Perfect (mathematical proof) | Computational (hard, not impossible)
Key length    | Single shift value | As long as the message       | Fixed (128, 256 bits)
Key reuse     | N/A                | Never — one-time only        | OK (different ciphertext each time)
Practical?    | No                 | Key management is hard       | Yes — used everywhere
4.6 — Paper 2
Fundamentals of Computer Systems
Hardware/software · OS · Language types · Translators · Logic gates · Boolean algebra

4.6.1 Hardware and software

Hardware: the physical components of a computer system — anything you can physically touch. CPU, RAM, hard disk, keyboard, monitor, motherboard.

Software: programs — sequences of instructions that run on hardware. You cannot touch software; it exists as data stored on hardware. Without software, hardware is useless.

Software is divided into two categories:

Type                 | Purpose                                                                                    | Examples
System software      | Manages and maintains the computer hardware and provides services for application software | Operating systems, utility programs, libraries, translators
Application software | Designed to help users perform specific tasks                                              | Word processors, games, web browsers, spreadsheets

System software categories

Operating systems — what they actually do

An operating system (OS) is the most important piece of system software. It sits between the hardware and user programs. Its two key jobs are:

1. Hiding hardware complexity: Application programs don't need to know whether the computer has a Samsung or Western Digital hard drive, or whether it's running on an Intel or AMD processor. The OS provides a consistent interface (API) that works regardless of the underlying hardware.

2. Resource management: Multiple processes run simultaneously — your browser, music player, antivirus, and system services are all running at once. The OS manages the allocation of:

  • Processor time — scheduling which process runs when, for how long (process scheduling)
  • Memory — deciding which parts of RAM each process can use
  • I/O devices — managing access to keyboard, disk, network card, etc.
  • File system — organising files on storage, managing permissions

Examples: Windows, macOS, Linux, Android, iOS.

Utility programs, libraries, and translators

Utility programs: perform specific system maintenance tasks. Examples: disk defragmenter, backup software, antivirus, file compression tools, disk formatter.

Libraries: collections of pre-written, pre-tested code that programs can call. Instead of writing a square root function from scratch, your program calls math.sqrt() from the maths library. Libraries are shared — many programs use the same library, saving memory and development time.
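In Python, for instance, calling a library routine looks like this — a minimal sketch using the standard maths library:

```python
import math  # standard maths library — pre-written, pre-tested code

# Call the library's square root routine instead of writing our own
x = math.sqrt(16)
print(x)   # 4.0
```

Every program that imports `math` shares the same tested implementation, which is exactly the memory and development-time saving described above.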

Translators: programs that convert source code in one language to another form. Three types: compiler, interpreter, assembler — covered in section 4.6.3.

4.6.2 Programming language classification

Low-level vs high-level languages
Level | Type | Characteristics | Advantages | Disadvantages
Low-level | Machine code | Pure binary (0s and 1s). Directly executed by CPU. Processor-specific — code for an Intel CPU won't run on an ARM CPU. | Fastest possible execution. No translation overhead. | Almost impossible for humans to read or write. Not portable. A single mistake in one bit crashes everything.
Low-level | Assembly language | Mnemonic (human-readable) form of machine code. e.g. ADD, MOV, LDR. One-to-one with machine code instructions. Still processor-specific. | More readable than machine code. Fast execution after assembly. Direct hardware control. | Still hard to write for complex programs. Not portable. Different CPUs have different instruction sets.
High-level | Imperative (Python, Java, C#) | Human-readable. Uses abstractions far removed from hardware. One line often compiles to many machine code instructions. Portable — same code runs on different hardware. | Easy to read and write. Portable. Large standard libraries. Faster development. | Requires translation before execution. Slightly less efficient than hand-written assembly.

An imperative high-level language is one where you write explicit step-by-step instructions telling the computer how to do something. This distinguishes it from declarative languages (like SQL or Haskell) where you describe what you want without specifying how.

4.6.3 Translators

Compiler, interpreter, and assembler — compared in detail

Source code (what the programmer writes) must be translated into a form the computer can execute. Three translation strategies:

 | Compiler | Interpreter | Assembler
What it does | Translates the entire source program to machine code before any execution | Translates and executes one line/statement at a time | Translates assembly language to machine code (one-to-one)
Output | Executable file (object code) — can be run without the compiler | No separate output — runs directly | Machine code binary
Speed | Fast execution (already translated) | Slower execution (translates each time it runs) | Fast execution
Error reporting | All errors found at compile time, before running | Stops at the first error encountered at runtime | All errors at assembly time
Debugging | Harder — error messages can be cryptic | Easier — you can test line by line | Difficult
Examples | C, C++, Swift (compiled) | Python (often interpreted) | MASM, NASM (x86 assemblers)
Bytecode — the middle ground
Some compilers (Java, C#) produce bytecode — an intermediate form that is not native machine code but not source code either. Bytecode runs on a virtual machine (JVM for Java, CLR for C#). This achieves portability: the same bytecode runs on Windows, macOS, and Linux — the VM handles the hardware-specific part. Python also compiles to bytecode (.pyc files) which its interpreter then runs.

Source code: the human-readable program the programmer writes. Object (executable) code: the compiled machine code that the CPU runs directly.
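Python's compilation to bytecode can be observed directly with the standard `dis` module — a minimal sketch:

```python
import dis

def add(x, y):
    return x + y

# The source has already been compiled to bytecode, stored on the code object
print(add.__code__.co_code)   # the raw bytecode as bytes
dis.dis(add)                  # human-readable disassembly of those bytes
```

The exact instruction names printed vary between Python versions, but in every case you are looking at the intermediate form the interpreter's virtual machine actually runs.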

4.6.4 Logic gates

All six gates with truth tables

Logic gates are electronic circuits that perform Boolean operations on binary inputs (0=false, 1=true). They are the fundamental building blocks of all digital computers.

A | B | NOT A | A AND B | A OR B | A XOR B | A NAND B | A NOR B
0 | 0 |   1   |    0    |   0    |    0    |    1     |    1
0 | 1 |   1   |    0    |   1    |    1    |    1     |    0
1 | 0 |   0   |    0    |   1    |    1    |    1     |    0
1 | 1 |   0   |    1    |   1    |    0    |    0     |    0

Memory tricks:

  • AND: both inputs must be 1 → output is 1
  • OR: at least one input is 1 → output is 1
  • XOR (exclusive OR): exactly one input is 1 → output is 1 (different → 1)
  • NAND: NOT AND — its output is the inverse of AND's output
  • NOR: NOT OR — its output is the inverse of OR's output
NAND and NOR are universal gates
Any logic circuit can be built using only NAND gates (or only NOR gates). You can create NOT, AND, and OR from NAND alone. This is why they're important in hardware design — manufacturing only one type of gate reduces cost and complexity.
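This construction can be checked directly — a minimal Python sketch that builds NOT, AND, and OR from NAND alone and verifies them against the built-in operators:

```python
def nand(a, b):
    # NAND: output is 0 only when both inputs are 1
    return 0 if (a and b) else 1

def not_(a):
    # NOT A = A NAND A
    return nand(a, a)

def and_(a, b):
    # A AND B = NOT(A NAND B)
    return nand(nand(a, b), nand(a, b))

def or_(a, b):
    # A OR B = (NOT A) NAND (NOT B) — a De Morgan rearrangement
    return nand(nand(a, a), nand(b, b))

# Verify all three derived gates for every input combination
for a in (0, 1):
    for b in (0, 1):
        assert not_(a) == 1 - a
        assert and_(a, b) == (a & b)
        assert or_(a, b) == (a | b)
print("NOT, AND and OR built from NAND alone — all truth tables match")
```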

In circuit diagrams: gates are connected with wires. You must be able to: trace inputs through a circuit to produce a truth table, write a Boolean expression for a circuit, and draw a circuit from a Boolean expression.

4.6.5 Boolean algebra

Boolean identities and De Morgan's laws

Boolean algebra lets you simplify logic expressions algebraically — the same way you simplify 2x + 3x = 5x in regular algebra. This reduces the number of gates needed in a circuit, saving cost and power.

Key identities:

Identity | Name
A AND 0 = 0 | Null law (AND)
A AND 1 = A | Identity law (AND)
A OR 0 = A | Identity law (OR)
A OR 1 = 1 | Null law (OR)
A AND A = A | Idempotent law (AND)
A OR A = A | Idempotent law (OR)
A AND NOT A = 0 | Complement law (AND)
A OR NOT A = 1 | Complement law (OR)
NOT(NOT A) = A | Double negation
A AND B = B AND A | Commutative
A AND (B AND C) = (A AND B) AND C | Associative
A AND (B OR C) = (A AND B) OR (A AND C) | Distributive

De Morgan's laws — essential for converting between NAND/NOR and AND/OR/NOT:

  • NOT(A AND B) = (NOT A) OR (NOT B)
  • NOT(A OR B) = (NOT A) AND (NOT B)
Simplification example

Simplify: (A AND B) OR (A AND NOT B)

= A AND (B OR NOT B) [distributive law]

= A AND 1 [complement law: B OR NOT B = 1]

= A [identity law: A AND 1 = A]

The complex expression simplifies to just A — one wire, no gates needed!
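Boolean simplifications like this can be verified exhaustively, because there are only four input combinations to check — a quick Python sketch:

```python
# Check the worked example and De Morgan's laws for every input combination
for A in (False, True):
    for B in (False, True):
        # (A AND B) OR (A AND NOT B) simplifies to A
        assert ((A and B) or (A and not B)) == A
        # De Morgan's laws
        assert (not (A and B)) == ((not A) or (not B))
        assert (not (A or B)) == ((not A) and (not B))
print("identities hold for all inputs")
```

An exhaustive check like this is exactly what a truth table does — it proves the simplification is valid for every possible input.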

4.7 — Paper 2
Fundamentals of Computer Organisation and Architecture
CPU internals · Buses · Fetch-execute cycle · Registers · Assembly language · Storage devices

4.7.1 Internal hardware components

CPU, memory, and buses — the full picture

All modern computers have the same fundamental components communicating via buses (collections of parallel wires):

Component | Role
Processor (CPU) | Fetches instructions from memory, decodes them, and executes them. The "brain". Contains the ALU, control unit, registers, and cache.
Main memory (RAM) | Stores the currently running program and its data. Volatile — contents lost when power off. Fast access. Much smaller than secondary storage.
Address bus | Carries memory addresses from the CPU to RAM (or I/O devices). Unidirectional (CPU → memory). Width determines how many unique addresses exist: 32-bit → 2³² = 4 GB addressable memory.
Data bus | Carries data between CPU, RAM, and I/O. Bidirectional. Width = word length — 64-bit bus moves 8 bytes per transfer.
Control bus | Carries control signals: read/write signal (tells memory whether to send or receive data), clock signal, interrupt requests, bus request/grant. Bidirectional.
I/O controllers | Interface chips between the CPU and I/O devices. Translate between the CPU's protocols and each device's requirements. Manage timing and buffering.

Von Neumann vs Harvard architecture

 | Von Neumann | Harvard
Memory model | Single shared memory for both instructions and data | Separate memories: one for instructions, one for data
Bus system | Single bus for both — instructions and data compete for access | Separate buses — can fetch instruction and access data simultaneously
Speed | Von Neumann bottleneck: the shared bus limits throughput | Faster — no bottleneck between instruction fetch and data access
Used in | General-purpose computers (your laptop, desktop) | Embedded systems, digital signal processors (DSPs), microcontrollers

4.7.2 The stored program concept

The stored program concept (Von Neumann's key insight): machine code instructions are stored in main memory as data, and the CPU fetches and executes them serially (one at a time, in sequence, unless a branch occurs). The same memory holds both the program and its data — both are just binary numbers in RAM.

This is profound: before this idea, computers were physically rewired to change their program. The stored program concept made software possible — you change what a computer does by loading different data into RAM, not by rewiring it.

4.7.3 Processor components and registers

Every register — what it holds and why
Component | Role
ALU (Arithmetic Logic Unit) | Performs all arithmetic (add, subtract, multiply, divide) and logical (AND, OR, NOT, XOR, shifts) operations. The computing engine.
Control Unit (CU) | Decodes instructions and coordinates the entire fetch-execute cycle. Sends control signals to other components telling them what to do and when.
Clock | Generates regular electrical pulses (the clock signal) that synchronise all CPU operations. Each pulse = one clock cycle. Clock speed (GHz) = cycles per second.
General-purpose registers | Very fast temporary storage inside the CPU. Used to hold operands and results during calculations. Far faster than RAM.
Program Counter (PC) | Holds the memory address of the next instruction to be fetched. Automatically incremented after each fetch. Modified by branch instructions (jumps).
Current Instruction Register (CIR) | Holds the instruction currently being decoded and executed. The control unit reads the opcode from here.
Memory Address Register (MAR) | Holds the memory address the CPU wants to access (read from or write to). Connected directly to the address bus.
Memory Buffer Register (MBR) | Acts as a buffer between the CPU and main memory. Holds data that has just been read from memory (waiting to go into the CPU) or data about to be written to memory.
Status Register | Contains flag bits set by the ALU after operations: zero flag (result = 0), carry flag (arithmetic overflow/carry), negative flag (result < 0), overflow flag. Used by conditional branches.

The Fetch-Execute cycle — step by step

Full detailed trace of the fetch-execute cycle

This cycle repeats billions of times per second in a modern CPU. Each instruction goes through three stages:

FETCH

  1. MAR ← PC: The address held in the Program Counter is copied into the MAR. We're going to fetch from this address.
  2. PC ← PC + 1: The PC is incremented immediately so it points to the next instruction. (This happens before the current instruction is even decoded.)
  3. MBR ← Memory[MAR]: The data at the address in MAR is fetched from RAM and placed in the MBR (also called the MDR). This travels along the data bus.
  4. CIR ← MBR: The instruction is copied from MBR into the CIR, where it stays while being decoded and executed.

DECODE

  1. The Control Unit reads the instruction in the CIR. It splits it into the opcode (what operation to do) and the operand (the value or address to operate on). It then activates the appropriate components.

EXECUTE

  1. The decoded instruction is carried out — this might involve the ALU (for arithmetic), main memory (for load/store), or I/O. The result may update a register, memory, or the PC (for a branch).
Exam trap — when does PC increment?
The PC is incremented during the fetch stage, before the instruction is decoded or executed. After fetching instruction at address 100, PC = 101 — even while instruction 100 is still being decoded. This matters for understanding how branch instructions work: a branch overwrites the PC with a new address, overriding the auto-increment.

4.7.3.3 Instruction format

A machine code instruction consists of:

  • Opcode: the operation code — which instruction to perform (ADD, LOAD, STORE, JUMP, etc.)
  • Operand: the data to operate on — this could be an actual value, a memory address, or a register number

The number of bits for the opcode limits the instruction set size: 4-bit opcode → 2⁴ = 16 possible instructions. The number of bits for the operand limits the addressable memory range.
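Splitting an instruction word into opcode and operand is just bit manipulation. A sketch assuming a hypothetical 12-bit format (4-bit opcode, 8-bit operand — the sizes are an illustration, not a real architecture):

```python
OPERAND_BITS = 8   # assumed operand width for this illustration

def decode(instruction):
    """Split a 12-bit instruction word into (opcode, operand)."""
    opcode = instruction >> OPERAND_BITS               # top 4 bits
    operand = instruction & ((1 << OPERAND_BITS) - 1)  # bottom 8 bits
    return opcode, operand

# Encode opcode 3 (say, ADD) with operand 42, then decode it again
instr = (3 << OPERAND_BITS) | 42
print(decode(instr))   # (3, 42)
```

With a 4-bit opcode there are 2⁴ = 16 possible operations, exactly as stated above.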

4.7.3.4 Addressing modes

Immediate | The operand is the actual data value. The instruction carries the value itself. e.g. LDA #5 — load the number 5 into the accumulator (not whatever is stored at address 5).
Direct | The operand is the memory address where the data is stored. The CPU uses the operand as an address and fetches the data from that location. e.g. LDA 200 — load the value stored at memory address 200.

4.7.3.5 Assembly language operations

Mnemonic | Operation | Example | Effect
LDA | Load (immediate) | LDA #5 | Load 5 into accumulator (immediate)
LDA | Load (direct) | LDA 200 | Load value at address 200 into accumulator
STO | Store | STO 300 | Store accumulator value at address 300
ADD | Add | ADD #3 | Add 3 to accumulator
SUB | Subtract | SUB 100 | Subtract value at address 100 from accumulator
JMP | Unconditional branch | JMP 50 | PC ← 50. Always jump to instruction at address 50.
BRZ | Branch if zero | BRZ 50 | If zero flag set: PC ← 50. Otherwise continue.
CMP | Compare | CMP #0 | Compare accumulator with 0. Set flags. Don't change accumulator.
LSL | Logical shift left | LSL #1 | Shift bits left by 1. Equivalent to ×2.
LSR | Logical shift right | LSR #1 | Shift bits right by 1. Equivalent to ÷2.
AND / OR / NOT / XOR | Bitwise logical | AND #15 | Perform bitwise operation on accumulator. Used for masking bits.
HLT | Halt | HLT | Stop execution.
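To see the fetch-decode-execute cycle and a few of these instructions working together, here is a minimal Python interpreter for a made-up accumulator machine — the mnemonics follow the table, but the tuple encoding and the instruction subset are an illustration only:

```python
def run(program, memory):
    """Interpret a list of (mnemonic, operand) tuples on a toy accumulator machine."""
    acc = 0
    pc = 0                       # Program Counter
    while True:
        op, arg = program[pc]    # FETCH the instruction at the address in PC
        pc += 1                  # PC incremented immediately, before execution
        # DECODE + EXECUTE
        if op == "LDA#":  acc = arg          # load immediate value
        elif op == "LDA": acc = memory[arg]  # load from memory address (direct)
        elif op == "STO": memory[arg] = acc  # store accumulator to memory
        elif op == "ADD#": acc += arg        # add immediate value
        elif op == "JMP": pc = arg           # a branch overwrites the PC
        elif op == "HLT": return acc         # stop execution

mem = {200: 10}
# Load memory[200], add 5, store the result at address 300, halt
prog = [("LDA", 200), ("ADD#", 5), ("STO", 300), ("HLT", 0)]
result = run(prog, mem)
print(result, mem[300])   # 15 15
```

Note how the PC is bumped as part of the fetch, so a JMP works simply by overwriting it — exactly the exam point made earlier.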

4.7.3.6 Factors affecting processor performance

All factors — explained with why each matters
Factor | Effect | Why
Clock speed (GHz) | Higher = more cycles per second | Each instruction takes a fixed number of clock cycles. Faster clock = more instructions per second. Limit: heat generated.
Multiple cores | True parallelism | Each core is an independent processor. Quad-core can run 4 threads simultaneously. Benefit only if software is multi-threaded.
Cache memory | Reduces memory access time | RAM is ~100× slower than the CPU. Cache (very fast SRAM on the CPU chip) stores recently/frequently used data. L1 (smallest, fastest) → L2 → L3 (largest, slowest of the three).
Word length | More data per operation | A 64-bit CPU processes 64 bits in one operation. A 32-bit CPU needs 2 operations for the same 64-bit value. Wider word = more data per cycle.
Address bus width | More addressable memory | 32-bit address bus → 2³² = 4 GiB addressable RAM. 64-bit → 2⁶⁴ = theoretically 18 exabytes. Limits how much RAM the CPU can use.
Data bus width | More data per memory access | 64-bit data bus transfers 8 bytes per cycle vs 4 bytes for 32-bit. Reduces the number of memory accesses needed for large data.

4.7.4 External hardware devices

Input/output devices — principles of operation

Barcode reader: a light source (LED or laser) illuminates the barcode. Dark bars absorb light; white spaces reflect it. A photodetector converts reflections to a digital signal (0s and 1s representing the bars). This is decoded to the product code.

Digital camera: a lens focuses light onto a CCD (charge-coupled device) or CMOS sensor. The sensor contains millions of light-sensitive cells (pixels). Each cell measures light intensity and converts it to an electrical charge. This is sampled by an ADC and stored as binary data.

Laser printer: 1. A laser beam sweeps across a photosensitive drum, discharging areas where ink should not appear. 2. Charged toner particles stick to the remaining charged areas. 3. Paper rolls past the drum and toner transfers to paper. 4. The fuser unit melts the toner onto the paper permanently.

RFID (Radio Frequency Identification): a reader emits a radio signal that powers a passive tag (no battery needed). The tag responds by transmitting its stored ID. Used in contactless payment, access cards, stock tracking. Active tags have their own power source and longer range.

Secondary storage — HDD, optical, SSD compared

Secondary storage is needed because RAM is volatile (data lost when power off) and expensive. Secondary storage is non-volatile, persistent, and cheaper per gigabyte.

 | Hard disk (HDD) | Optical disk (CD/DVD/Blu-ray) | Solid-state drive (SSD)
Technology | Magnetic platters spinning at 5400–7200 RPM. Read/write head moves across surface. | Laser reads/writes pits and lands on a reflective disc surface. Pits = 0, lands = 1 (approximately). | NAND flash memory. Floating gate transistors trap/release charge. No moving parts.
Speed | Moderate. Limited by physical movement of head. Seek time ~5-10ms. | Slow. Disc must spin to correct position. Sequential access. | Fast. No mechanical movement. Seek time <0.1ms.
Capacity | High (1TB – 20TB) | Low (700MB CD, 25GB Blu-ray) | High (256GB – 4TB)
Durability | Fragile — head crash if dropped. Moving parts wear out. | Fragile — scratches ruin data. Degrades over decades. | Robust — no moving parts. Survives shocks well.
Cost per GB | Cheapest | Cheap per disc | More expensive than HDD
Best for | Large, cheap storage (NAS, archives) | Distribution, backups, archiving | Operating system, applications, fast working storage
SSD detail the spec specifically mentions
NAND flash uses floating gate transistors. Each cell can trap electrons (charge = 1, no charge = 0). Data is written in pages (typically 4KB). Pages are grouped into blocks (typically 256KB). You can write to individual pages, but erasing can only happen at the block level — the entire block must be erased before its pages can be rewritten. This is why SSDs use wear levelling (spreading writes across the drive) and write-combining to extend lifespan.
4.8 — Paper 2
Consequences of Uses of Computing
Moral · Social · Ethical · Legal · Cultural issues and legislation
How this topic is assessed
Unlike other topics, this doesn't require specific memorised facts. Examiners want you to reason, argue, and analyse. Questions ask you to "discuss", "evaluate", or "argue for/against". Structure answers with a clear position, specific examples, multiple perspectives, and a conclusion. Essay-style marks reward depth and balance.

The core argument

Computing has given individual programmers and technology companies enormous power — the ability to monitor billions of people, process vast amounts of personal data, and influence what information people see. With that power comes responsibility. The spec asks you to think carefully about the moral, social, legal, and cultural dimensions of that power.

Individual (moral) issues

Developer responsibility and algorithm ethics

When a software engineer writes a recommendation algorithm for social media, they make decisions about what values the system encodes: engagement? truthfulness? user wellbeing? These decisions affect billions of people. The engineer has moral responsibility for those choices.

Key considerations:

  • Algorithmic bias: if training data reflects historical discrimination (e.g. fewer women in senior roles), an AI hiring tool trained on it will perpetuate that bias. The engineer who didn't correct for this has made a moral choice, intentionally or not.
  • Intended vs actual use: a tool built for communication can be used for coordination of harm. What responsibility does the builder have?
  • Scale: a bug affecting 0.1% of 1 billion users affects 1 million people. The same negligence that would be minor in a small business becomes catastrophic at scale.
  • Intellectual property: software can be copied infinitely at zero cost. This creates tensions between creators' right to profit and others' desire to access.
  • Privacy: collecting data about users without meaningful consent. The right to be forgotten. Data minimisation.

Social (ethical) issues

How computing transforms society

Computing enables:

  • Mass surveillance: CCTV with facial recognition, mobile phone tracking, GCHQ mass data collection (revealed by Snowden). Who watches the watchers?
  • Data harvesting: social media companies collect detailed personal profiles used for targeted advertising, political micro-targeting (Cambridge Analytica), and sold to third parties.
  • Information dissemination: anyone can publish globally. Misinformation spreads as fast as truth. "Fake news" can influence elections.
  • Automation and employment: jobs disappearing to automation (manufacturing, admin, potentially professional roles). Who benefits from the productivity gains?
  • Digital divide: unequal access to technology creates unequal opportunity. The elderly, the poor, and those in developing countries are disadvantaged.

Legal issues

UK legislation and challenges for lawmakers

Relevant UK legislation (know the purpose, not specific sections):

Act | Purpose | Key provisions
Computer Misuse Act 1990 | Criminalise unauthorised computer access | Three offences: unauthorised access; access with intent to commit further offence; unauthorised modification of data. Covers hacking and malware distribution.
Data Protection Act 2018 / GDPR | Protect personal data | Data must be processed lawfully, fairly, transparently. Purpose limitation. Data minimisation. Accuracy. Storage limitation. Right of access, correction, erasure. Must have a lawful basis (e.g. consent).
Copyright, Designs and Patents Act 1988 | Protect intellectual property | Software is protected by copyright from the moment of creation. Illegal to copy, distribute, or modify without licence. Open source licensing is a legal framework for sharing.
Regulation of Investigatory Powers Act (RIPA) 2000 | Govern surveillance powers | Controls how government agencies can intercept communications and conduct surveillance. Controversial — balance between security and privacy.

Why legislation struggles:

  • Technology changes faster than law: by the time a law is passed and implemented, the technology has moved on.
  • Jurisdiction: a company in the USA serving UK users — which country's law applies? Enforcing laws across borders is extremely difficult.
  • Anonymity: it's hard to identify and prosecute those who break laws online.
  • Technical complexity: legislators often don't fully understand the technology they're trying to regulate.

Cultural issues

Technology's impact on culture
  • Cultural imperialism: dominant tech companies (mostly American) shape global culture through the platforms and defaults they choose. The English language is favoured. Western values are embedded in algorithms.
  • Filter bubbles: recommendation algorithms show you content similar to what you've already engaged with. Over time, people see only views they already agree with, deepening polarisation.
  • Impact on human behaviour: social media designed for engagement exploits psychological vulnerabilities (variable reward schedules, social comparison). Linked to anxiety, depression, reduced attention spans.
  • Cultural heritage: the internet has preserved vast amounts of human knowledge and culture, but digital formats become obsolete. Who maintains digital archives?
Exam technique for extended answer questions

Structure your answer: Point → Explanation → Example → Counterpoint. Never just list issues. For each one: explain why it's a problem, give a real-world example, and acknowledge the opposing view. Marks are awarded for critical thinking, not just recall.

4.9 — Paper 2
Fundamentals of Communication and Networking
Transmission · Protocols · TCP/IP · DNS · Security · Encryption · CRUD · REST
AQA spec corrections
Only physical star and logical bus topologies are required. Ring, mesh, MAN, SAN are not on the A-level spec. IMAP is not required — only POP3. CSMA/CD (wired) is not required — only CSMA/CA (wireless).

4.9.1 Communication methods

Serial vs parallel transmission — detailed explanation

Data must travel from one place to another as electrical signals. Two fundamental approaches:

Parallel transmission: multiple bits sent simultaneously, each on its own wire. A byte (8 bits) sent over 8 wires at once. Sounds faster — but:

  • Skew: each wire has slightly different electrical properties. Bits that were sent simultaneously arrive at slightly different times. At high speeds or over long distances, this makes it impossible to tell which bits belong together.
  • Crosstalk: electrical signals in adjacent wires interfere with each other (electromagnetic induction). Causes errors.
  • Requires more wires — more expensive cable, more connectors.

Serial transmission: bits sent one at a time, in sequence, over a single channel. Slower bit-by-bit but avoids skew and crosstalk completely. Modern serial interfaces (USB 3.2, PCIe 4.0, SATA) achieve far higher speeds than older parallel standards (IDE parallel ATA) despite sending fewer bits simultaneously, because they can clock much faster.

This is a key exam point
The question "why is serial preferred over parallel for long-distance data transmission?" has a specific answer: serial avoids skew (timing differences between bits on parallel wires) and crosstalk (interference between adjacent wires). Always mention both.
Synchronous vs asynchronous — how timing works

For the receiver to interpret bits correctly, it must know when each bit starts and ends. Two approaches:

Synchronous transmission: sender and receiver share a common clock signal. Data flows as a continuous stream, with each bit occupying exactly one clock period. The receiver samples the line on each clock tick. Efficient for bulk data transfer (HDMI, Ethernet). Requires clock synchronisation between devices.

Asynchronous transmission: no shared clock. Data is sent in short bursts at any time. Each character (or byte) is wrapped in:

  • Start bit (always 0): signals "a character is coming — get ready". The receiver detects the change from idle (1) to 0 and starts its internal timer.
  • The data bits (usually 7 or 8)
  • Stop bit(s) (always 1, sometimes 2): signals "character complete, line returning to idle".
Asynchronous frame for 'A' (65 = 0100 0001, data bits sent least-significant bit first)
Idle: 1 1 1 1 | START: 0 | DATA: 1 0 0 0 0 0 1 0 | STOP: 1 | Idle: 1 1

Start/stop bits add overhead (10 bits to transmit 8 bits of data) but allow devices to communicate without synchronising clocks first. Used in older serial ports, UART.
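The framing above can be reproduced in a few lines of Python — a sketch assuming 8 data bits sent least-significant bit first, one start bit and one stop bit:

```python
def frame(byte):
    """Wrap one byte in an asynchronous serial frame: start bit, 8 data bits (LSB first), stop bit."""
    data = [(byte >> i) & 1 for i in range(8)]   # extract bits, LSB first
    return [0] + data + [1]                      # start bit = 0, stop bit = 1

bits = frame(ord("A"))   # 'A' = 65 = 0100 0001
print(bits)              # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
print(len(bits))         # 10 bits on the wire for 8 bits of data — the overhead
```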

4.9.1.2 Communication definitions

Baud rate | The number of signal changes (symbols) per second. Measured in baud. If each signal change carries exactly one bit, baud rate = bit rate. If each signal change carries multiple bits (e.g. QAM modulation uses multiple amplitude/phase combinations), bit rate > baud rate.
Bit rate | The number of bits transmitted per second (bps). Bit rate = baud rate × bits per symbol. This is what determines data throughput.
Bandwidth | The maximum possible bit rate of a channel, measured in bps. A higher-bandwidth channel can carry more data per second. Bit rate is directly proportional to bandwidth.
Latency | The time delay from when data is sent to when it is received. Independent of bandwidth. A satellite link can have high bandwidth but high latency (~600ms round trip) due to the distance to orbit.
Protocol | An agreed set of rules governing how data is transmitted and received between devices. Protocols define: message format, error detection, flow control, handshaking procedures. Without agreed protocols, devices from different manufacturers cannot communicate.

4.9.2 Networking

Physical star vs logical bus — explained in full

Physical star topology: every device connects to a central switch via its own dedicated cable. The switch is intelligent — it maintains a table of (MAC address → port number) and forwards each frame only to the correct destination port.

Star advantages
  • Single cable failure → only one device affected
  • Easy to add new devices
  • Switch prevents unnecessary traffic reaching other devices (better security, better performance)
  • Fault isolation — easier to diagnose problems
Star disadvantages
  • Switch failure → entire network down (single point of failure)
  • Requires more cable than bus
  • Switch costs money

Logical bus topology: data from any device is broadcast to all devices. Each device checks whether the data is addressed to it; if not, it discards it. This describes how early Ethernet worked and how a hub works today (a hub is essentially a repeater — it broadcasts all received data to all ports).

Crucially: a network can be physically wired as a star (all cables go to a central device) but behave logically as a bus if that central device is a hub rather than a switch. The physical layout describes the cables; the logical layout describes how data flows.

Hub vs switch — critical difference
A hub broadcasts every received frame to every port — it's dumb. A switch learns which MAC address is on which port and forwards frames only to the correct port — it's intelligent. Never say a switch broadcasts.
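The switch's learning behaviour can be sketched in Python — the MAC addresses and port numbers here are made up for illustration:

```python
class LearningSwitch:
    def __init__(self, num_ports):
        self.table = {}            # MAC address -> port number
        self.num_ports = num_ports

    def receive(self, src_mac, dst_mac, in_port):
        """Handle one frame: learn the source, then forward to the right port(s)."""
        self.table[src_mac] = in_port          # the source is reachable via this port
        if dst_mac in self.table:
            return [self.table[dst_mac]]       # known destination: forward to one port
        # Unknown destination: flood to every port except the one it came in on
        return [p for p in range(self.num_ports) if p != in_port]

sw = LearningSwitch(4)
print(sw.receive("AA", "BB", 0))   # BB unknown -> flood: [1, 2, 3]
print(sw.receive("BB", "AA", 2))   # AA was learned on port 0 -> [0]
```

A hub, by contrast, would be the flooding branch alone — every frame to every other port, with no table at all.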
Client-server vs peer-to-peer
 | Client-server | Peer-to-peer (P2P)
Structure | One or more powerful dedicated servers. Many client devices that request services. | All devices ("peers") have equal status — each can act as both client and server.
Control | Centralised — server controls access, data, and security policies. | Decentralised — no single point of control.
Security | Easier to manage: access controls, central authentication, centralised updates. | Harder — no central authority. Each peer must manage its own security.
Cost | High upfront — dedicated server hardware, server software licences. | Low — no dedicated hardware needed.
Scalability | Server can become a bottleneck under heavy load. | Performance often improves with more peers (more sources).
Fault tolerance | Server failure = entire service down. | No single point of failure — if one peer fails, others continue.
Use cases | Email, web, banking, corporate networks. | BitTorrent file sharing, blockchain, some VoIP.
Wireless networking — Wi-Fi, SSID, CSMA/CA in full detail

Wi-Fi (IEEE 802.11 family): a wireless LAN standard. Devices need a wireless network adapter (NIC with antenna) and connect to a wireless access point (WAP), which connects to the wired network infrastructure.

The SSID (Service Set Identifier) is the network's human-readable name (e.g. "Home WiFi"). The WAP broadcasts this name so devices can find and connect to it. Disabling SSID broadcast hides the name from automatic scans — devices must know the name to connect. This is a minor security measure; the network is still discoverable by scanning with the right tools.

Why wireless needs Collision Avoidance rather than Detection:

Wired Ethernet uses CSMA/CD (Collision Detection) — if a device detects a collision while transmitting, it stops and retries. Wireless devices cannot detect collisions while transmitting: their own outgoing signal is so much stronger at their antenna than any incoming signal that a collision is undetectable to them. So wireless uses CSMA/CA (Collision Avoidance) — try to prevent collisions rather than detect them.

CSMA/CA without RTS/CTS:

  1. Before transmitting, listen (carrier sense) — is the channel in use?
  2. If idle: wait a random backoff period (reduces chance two devices start simultaneously)
  3. Transmit
  4. Wait for an acknowledgement (ACK) from the receiver
  5. No ACK within timeout → assume collision occurred → wait a new random backoff → retry
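The five steps above can be sketched as a simulation loop — the channel and ACK checks are stand-in callables for illustration, not real radio APIs:

```python
import random

def csma_ca_send(channel_busy, ack_received, max_retries=5):
    """Simulate CSMA/CA: sense, back off, transmit, wait for ACK, retry on failure.
    channel_busy and ack_received are callables standing in for the radio hardware."""
    for attempt in range(max_retries):
        while channel_busy():                      # 1. carrier sense: wait until idle
            pass
        backoff = random.randint(0, 2 ** attempt)  # 2. random backoff, growing each retry
        # (real hardware would wait `backoff` slot times here before step 3)
        # 3. transmit, then 4. wait for the acknowledgement
        if ack_received():
            return attempt + 1                     # success: attempts used
        # 5. no ACK -> assume a collision, loop with a larger backoff window
    return None                                    # gave up after max_retries

# Toy run: channel always idle and the ACK always arrives -> succeeds first time
print(csma_ca_send(lambda: False, lambda: True))   # 1
```

The doubling backoff window is why repeated collisions spread transmissions further apart over time.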

CSMA/CA with RTS/CTS (Request to Send / Clear to Send):

  1. Sender sends a short RTS frame to the access point
  2. Access point broadcasts a CTS frame that all devices in range can hear
  3. All devices that hear the CTS go silent for the specified duration
  4. The sender transmits without interference risk
  5. Receiver sends ACK

Why RTS/CTS? The hidden node problem: Device A and Device B are both connected to AP, but can't hear each other (they're on opposite sides of the AP, out of range of each other). Without RTS/CTS, both might detect the channel as idle and transmit simultaneously, causing a collision at the AP that neither detects. RTS/CTS solves this because the CTS is broadcast by the AP — both A and B hear it and know to stay quiet.

Wireless security measures:

  • WPA2 / WPA3: encrypt all data transmitted over the wireless link. Uses AES encryption. WPA2 replaced the broken WEP standard. WPA3 is the latest standard.
  • SSID broadcast disabled: hides network name from casual discovery.
  • MAC address allow list: WAP only accepts connections from pre-approved MAC addresses. Note: MAC addresses can be "spoofed" (faked), so this isn't a strong security measure alone.

4.9.3 The Internet

Packet switching — why the internet works the way it does

Before packet switching, the telephone network used circuit switching: when you made a call, a dedicated physical circuit was established between you and the other person for the entire duration. All resources on that path were reserved, even during silences.

Packet switching works differently: data is broken into small chunks called packets. Each packet is routed independently across the network — different packets from the same transmission may take completely different paths. At the destination, they're reassembled in the correct order.

Packet structure:

  • Header: source IP, destination IP, packet sequence number, TTL (time to live), protocol identifier. Purpose: routing — tells routers where the packet came from and where it is going.
  • Payload: the actual data (a chunk of the file, webpage, etc.). Purpose: the content being transported.
  • Trailer: checksum / error detection code. Purpose: the receiver uses this to check whether the packet was corrupted in transit.
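A minimal sketch of the header/payload/trailer structure, assuming a toy byte-sum checksum (real networks use CRCs or the internet checksum):

```python
from dataclasses import dataclass

@dataclass
class Packet:
    # Header: routing information read by every router on the path
    source_ip: str
    dest_ip: str
    sequence: int          # lets the receiver reassemble packets in order
    ttl: int               # decremented at each hop; packet dropped at 0
    # Payload: the chunk of data being carried
    payload: bytes
    # Trailer: error-detection code computed by the sender
    checksum: int = 0

    def seal(self):
        """Sender computes the checksum over the payload (toy: byte sum)."""
        self.checksum = sum(self.payload) % 65536
        return self

    def is_intact(self):
        """Receiver recomputes the checksum and compares with the trailer."""
        return sum(self.payload) % 65536 == self.checksum

p = Packet("192.168.1.5", "151.101.0.81", sequence=3, ttl=64,
           payload=b"GET /news").seal()
print(p.is_intact())        # → True
p.payload = b"corrupted!!"  # simulate corruption in transit
print(p.is_intact())        # → False
```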
Packet switching advantages
  • Efficient — bandwidth shared; unused capacity is available to others
  • Fault tolerant — if one link fails, packets route around it
  • Scales to many simultaneous users
  • Only retransmit lost packets, not the whole file
Packet switching disadvantages
  • Variable delay (jitter) — packets may arrive out of order
  • Overhead from headers on every packet
  • Not ideal for real-time continuous streams, because of variable delay — though transport-layer choices (such as UDP, which skips retransmission) mitigate this
Routers, gateways, and how routing works

A router is a device that forwards packets between networks. Every router maintains a routing table — a list of known network addresses and the best "next hop" to send packets towards their destination. Routers operate at Layer 3 (network layer), using IP addresses.

When a packet arrives at a router:

  1. Router reads the destination IP address from the packet header
  2. Looks up the routing table to find the best path
  3. Forwards the packet to the next router (next hop) on that path
  4. The next router repeats this process until the packet reaches its destination
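Step 2 of the process above — the routing-table lookup — can be sketched with the stdlib `ipaddress` module. The table entries and gateway names are made up for illustration; real routers use the same longest-prefix-match rule:

```python
import ipaddress

# Toy routing table: (destination network, next hop) — entries are made up
routing_table = [
    (ipaddress.ip_network("10.0.0.0/8"),  "gateway-A"),
    (ipaddress.ip_network("10.1.0.0/16"), "gateway-B"),
    (ipaddress.ip_network("0.0.0.0/0"),   "default-gateway"),  # catch-all
]

def next_hop(dest_ip):
    """Longest-prefix match: the most specific matching route wins."""
    dest = ipaddress.ip_address(dest_ip)
    matches = [(net, hop) for net, hop in routing_table if dest in net]
    best = max(matches, key=lambda m: m[0].prefixlen)
    return best[1]

print(next_hop("10.1.2.3"))      # → gateway-B (the /16 beats the /8)
print(next_hop("10.9.9.9"))      # → gateway-A
print(next_hop("151.101.0.81"))  # → default-gateway
```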

A gateway is a more general concept: a node that allows data to pass between networks that may use different protocols. Your home router is a gateway — it connects your LAN (using private IP addresses) to the internet (using public IP addresses) and handles the protocol translation (NAT).

DNS — how domain names become IP addresses

Computers communicate using IP addresses. Humans prefer domain names like google.com. DNS (Domain Name System) translates between them. It's a globally distributed, hierarchical database — no single server holds all records.

Domain name hierarchy (read right to left, most general part first):

www . bbc . co . uk
 ↑     ↑    ↑    ↑
host domain SLD TLD

TLD (Top Level Domain): uk — others include .com, .org, .net
Second-level domain (SLD): co (in .co.uk; in bbc.com, bbc itself would be the SLD)
Domain: bbc — the name the organisation registers
Subdomain / host: www

A Fully Qualified Domain Name (FQDN) specifies the complete host identity: www.bbc.co.uk.
A URL (Uniform Resource Locator) adds the protocol and path: https://www.bbc.co.uk/news/article1.

DNS resolution — step by step:

  1. Browser checks its local DNS cache — if it has visited this site recently, the IP may be stored. If found, skip to step 6.
  2. Query sent to the ISP's recursive DNS resolver (which also checks its own cache).
  3. Resolver queries a root nameserver (there are 13 root server addresses globally, each replicated across many physical machines). The root server replies: "I don't know the IP, but the .uk nameservers are at these addresses."
  4. Resolver queries the .uk TLD nameserver, which replies: "the authoritative nameserver for bbc.co.uk is at this address."
  5. Resolver queries the authoritative nameserver for bbc.co.uk. This server knows the actual IP: "www.bbc.co.uk = 151.101.0.81".
  6. IP returned to the browser. Result cached locally. Browser connects to 151.101.0.81.
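The cache-then-walk-the-hierarchy logic of those steps can be sketched as a toy resolver. All the server names and records here are invented for illustration (only the 151.101.0.81 result comes from the example above):

```python
# Toy DNS hierarchy — all server names and records are illustrative
ROOT = {"uk": "uk-tld-server"}                             # root nameserver
TLD = {"uk-tld-server": {"bbc.co.uk": "bbc-auth-server"}}  # TLD nameserver
AUTH = {"bbc-auth-server": {"www.bbc.co.uk": "151.101.0.81"}}

cache = {}  # the resolver's local cache (steps 1-2)

def resolve(name):
    if name in cache:                        # steps 1-2: check cache first
        return cache[name]
    tld = name.rsplit(".", 1)[-1]            # "uk"
    tld_server = ROOT[tld]                   # step 3: ask a root server
    domain = ".".join(name.split(".")[-3:])  # "bbc.co.uk"
    auth_server = TLD[tld_server][domain]    # step 4: ask the TLD server
    ip = AUTH[auth_server][name]             # step 5: ask the authoritative server
    cache[name] = ip                         # step 6: cache the answer
    return ip

print(resolve("www.bbc.co.uk"))  # → 151.101.0.81 (full walk of the hierarchy)
print(resolve("www.bbc.co.uk"))  # → 151.101.0.81 (served from the cache)
```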
Internet registries
Organisations (ICANN globally, Nominet for .uk) manage the allocation of IP addresses and domain names. They're essential for ensuring uniqueness — without them, two organisations could claim the same address, causing chaos. They maintain the root zone file that all DNS depends on.

4.9.3.2 Internet security

Firewalls — three types in detail

A firewall monitors and filters incoming and outgoing network traffic based on predetermined security rules. It sits at the boundary between a trusted internal network and an untrusted external network (the internet).

Packet filtering
  • How it works: examines each packet's header (source/destination IP, source/destination port, protocol) and accepts or rejects it based on rules. Stateless — each packet is judged independently.
  • Advantage: fast. Low resource usage. Simple to configure.
  • Limitation: can't detect attacks that span multiple packets. Can't track connection state.

Stateful inspection
  • How it works: tracks the state of active connections in a state table. Only allows packets that are part of an established, expected connection. Rejects "out of state" packets (e.g. a SYN-ACK without a prior SYN).
  • Advantage: much more secure than packet filtering. Understands TCP connections.
  • Limitation: higher resource usage. Can be fooled by application-layer attacks.

Proxy server
  • How it works: sits between client and server. All requests go to the proxy first; the proxy makes the request on the client's behalf, so the client's real IP is hidden.
  • Advantage: hides internal network structure. Can cache responses (faster). Can filter content at the application layer.
  • Limitation: slower (extra hop). The proxy can become a bottleneck.
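Stateless packet filtering is essentially a top-down rule match on header fields. A minimal sketch, with an invented rule format (real firewalls also match source addresses, ranges, and directions):

```python
# Toy stateless packet filter: rules are checked top-down, first match wins.
rules = [
    {"action": "allow", "dest_port": 443, "protocol": "TCP"},  # HTTPS in
    {"action": "allow", "dest_port": 80,  "protocol": "TCP"},  # HTTP in
    {"action": "deny"},                                        # default deny
]

def filter_packet(packet):
    """Judge one packet on its header fields alone (stateless)."""
    for rule in rules:
        # A rule matches if every field it specifies equals the packet's
        if all(packet.get(k) == v for k, v in rule.items() if k != "action"):
            return rule["action"]

print(filter_packet({"dest_port": 443, "protocol": "TCP"}))  # → allow
print(filter_packet({"dest_port": 22,  "protocol": "TCP"}))  # → deny
```

The empty final rule matches everything, which is why a "default deny" belongs at the bottom of the list.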
Symmetric and asymmetric encryption — and how HTTPS uses both

Symmetric encryption: one key used for both encryption and decryption. The same secret key is needed by both parties.

  • Fast — efficient algorithms (AES is symmetric)
  • Problem: the key distribution problem. How do you securely share the key with the other party? Sending it over an insecure channel lets an eavesdropper copy it — but if you already had a secure channel to send it through, you wouldn't need to exchange a key at all.

Asymmetric encryption: a mathematically linked key pair — public key and private key. What one encrypts, only the other can decrypt. The public key can be shared with anyone; the private key is kept secret.

  • Encrypting a message: sender encrypts with recipient's public key. Only the recipient's private key can decrypt it. Even the sender can't decrypt what they just sent.
  • Solves key distribution: the public key can be shared openly — there's no risk in anyone knowing it.
  • Slower than symmetric encryption — the underlying mathematics (RSA, ECC) is computationally expensive.
How HTTPS uses both — TLS handshake
  1. Server sends its digital certificate containing its public key (and proof it's genuine)
  2. Client verifies the certificate against the CA's public key
  3. Client generates a random session key (symmetric)
  4. Client encrypts the session key using the server's public key and sends it
  5. Server decrypts with its private key — now both have the session key
  6. All further communication encrypted with the session key (symmetric, fast)
This hybrid approach gets the best of both worlds: asymmetric for secure key exchange, symmetric for fast bulk encryption.
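The handshake steps above can be mirrored in a toy sketch. The tiny textbook RSA numbers (n = 3233, e = 17, d = 2753) and the XOR "cipher" are illustrative stand-ins only — real TLS uses vetted libraries, 2048+ bit keys, and AES:

```python
# Toy hybrid encryption mirroring the TLS handshake steps above.
# NEVER use numbers this small, or XOR, for real cryptography.

# Server's asymmetric key pair (textbook RSA: n = 61*53, e*d ≡ 1 mod 3120)
PUBLIC = (3233, 17)     # (n, e) — shared with anyone
PRIVATE = (3233, 2753)  # (n, d) — kept secret by the server

def rsa(value, key):
    n, exp = key
    return pow(value, exp, n)   # modular exponentiation

def xor_stream(data, key):
    """Toy symmetric cipher: XOR each byte with the session key."""
    return bytes(b ^ key for b in data)

# Step 3: client generates a session key (fixed here for reproducibility)
session_key = 42
# Step 4: client encrypts it with the server's PUBLIC key
wrapped = rsa(session_key, PUBLIC)
# Step 5: server unwraps it with its PRIVATE key — both now share the key
assert rsa(wrapped, PRIVATE) == session_key
# Step 6: bulk data flows under the fast symmetric cipher
ciphertext = xor_stream(b"hello server", session_key)
print(xor_stream(ciphertext, session_key))  # → b'hello server'
```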
Digital certificates and digital signatures

A digital certificate is an electronic document that binds a public key to an identity (a website's domain name). It's issued and signed by a trusted Certificate Authority (CA) (e.g. DigiCert, Let's Encrypt). When you visit https://bbc.co.uk, your browser checks: is this certificate signed by a CA I trust? If yes, the public key in the certificate genuinely belongs to bbc.co.uk. Prevents man-in-the-middle attacks.

A digital signature proves a message came from who it claims and hasn't been tampered with:

  1. Sender takes the message and computes its hash (a fixed-length "fingerprint" of the content)
  2. Sender encrypts the hash with their own private key — this is the digital signature
  3. Sender sends: original message + digital signature
  4. Receiver decrypts the signature using sender's public key → recovers the hash
  5. Receiver independently computes the hash of the received message
  6. If the two hashes match → message is authentic and unmodified
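The six steps can be sketched with the same toy RSA numbers. The byte-sum "hash" is a stand-in for SHA-256 (chosen only so the example stays small) — illustrative, not secure:

```python
# Toy digital signature following the 6 steps above.
SENDER_PUBLIC = (3233, 17)     # (n, e)
SENDER_PRIVATE = (3233, 2753)  # (n, d)

def toy_hash(message):
    return sum(message) % 3233   # stands in for SHA-256

def rsa(value, key):
    n, exp = key
    return pow(value, exp, n)

def sign(message):
    # Steps 1-2: hash the message, encrypt the hash with the PRIVATE key
    return rsa(toy_hash(message), SENDER_PRIVATE)

def verify(message, signature):
    recovered = rsa(signature, SENDER_PUBLIC)   # step 4: decrypt with PUBLIC key
    return recovered == toy_hash(message)       # steps 5-6: re-hash and compare

sig = sign(b"pay Bob 10")
print(verify(b"pay Bob 10", sig))   # → True  (authentic and unmodified)
print(verify(b"pay Eve 99", sig))   # → False (message was tampered with)
```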
Key reversal — this is always examined
Encryption uses the recipient's PUBLIC key to encrypt, and PRIVATE key to decrypt.
Digital signatures use the sender's PRIVATE key to sign, and PUBLIC key to verify.
These are opposite uses of the key pair. Exams frequently test this distinction.
Malware — viruses, worms, trojans
Virus
  • Self-replicating: yes — attaches itself to other files (parasitic; requires a host file).
  • How it spreads: infected files shared between users. Requires human action to execute the infected file.
  • Example attack: macro virus in Word documents.

Worm
  • Self-replicating: yes — independently (standalone; no host file needed).
  • How it spreads: exploits network vulnerabilities. Spreads automatically without user action.
  • Example attack: WannaCry (2017) — spread via unpatched Windows SMB.

Trojan
  • Self-replicating: no — a single standalone file.
  • How it spreads: the user is tricked into installing it. Disguised as legitimate software.
  • Example attack: remote access trojan (RAT) — gives the attacker control of your machine.

Addressing malware — the spec requires knowing three approaches:

  • Improved code quality: fewer bugs = fewer vulnerabilities to exploit. Code reviews, testing, formal verification.
  • Monitoring: intrusion detection systems, log analysis, anomaly detection to spot attack patterns.
  • Protection: antivirus software, firewalls, sandboxing (run suspicious code in an isolated environment), regular security patches.

4.9.4 TCP/IP stack

4 — Application layer
HTTP · HTTPS · FTP · SMTP · POP3 · SSH · DNS
This is the layer your software directly interacts with. Protocols here define how specific services work — how web pages are requested, how emails are sent, how files are transferred. Data at this layer is called a message. SSH creates an encrypted TCP connection and lets you run commands on a remote machine or tunnel other protocols through the encryption.
3 — Transport layer
TCP · UDP · Ports · Sockets
Provides end-to-end communication between specific processes on different hosts. Uses port numbers to direct data to the right application. A socket = IP address + port number (the specific endpoint). TCP provides reliable, ordered, error-checked delivery via connection establishment (3-way handshake) and acknowledgements. UDP is connectionless and faster — no guarantees. Data = segment (TCP) or datagram (UDP).
2 — Network (Internet) layer
IP (IPv4 · IPv6) · NAT · Subnet masks · DHCP
Handles logical addressing and routing between networks. IP addresses packets and routers use routing tables to forward them. IPv4: 32-bit (192.168.1.1). IPv6: 128-bit — introduced to solve IPv4 address exhaustion. Every IP address has a network identifier part and host identifier part. Subnet mask defines the split. NAT allows private-IP devices to share one public IP. DHCP automatically assigns IP addresses. Data = packet.
1 — Link (Network access) layer
Ethernet · Wi-Fi · MAC addresses · Port forwarding
Handles physical transmission within a single network segment. Uses MAC addresses (48-bit hardware addresses burned into NICs) to identify devices on the local segment. MAC addresses are only relevant on the local network — routers use IP addresses beyond that. Port forwarding: configuring a router to forward incoming requests on a specific port to a specific internal device. Data = frame.
Sockets, ports, and protocols in full detail

A socket is a specific communication endpoint, identified by a combination of IP address and port number: 151.101.0.81:443. Every TCP/UDP connection has a source socket and a destination socket. This allows multiple simultaneous connections to the same server — your browser can have many open connections to bbc.co.uk, each with a different client port.

  • HTTP (port 80): transfer web pages. Unencrypted. A GET request fetches a resource; a POST sends data.
  • HTTPS (port 443): secure web — HTTP over TLS (Transport Layer Security). Encrypts all data.
  • FTP (port 21): file transfer. Anonymous (no credentials) or authenticated. Uses separate control and data connections. SFTP adds SSH encryption.
  • SMTP (port 25): send email. Your email client sends to your mail server; mail servers relay between each other.
  • POP3 (port 110): retrieve email. Downloads email to the local device, usually deleting it from the server. Commands include USER, PASS, RETR, DELE.
  • SSH (port 22): secure remote access. Encrypted terminal access — log in remotely and run commands. An SSH connection can also be used to send application-level protocol commands (e.g. HTTP GET, SMTP, or POP3 commands) to the relevant port through the encrypted tunnel.

Well-known ports: 0–1023. Reserved for standard services. Require elevated privileges to bind to (on Unix-like systems).

Ephemeral (client) ports: 1024–65535. Assigned temporarily to the client side of a connection. When your browser connects to bbc.co.uk, the destination socket is the server's (e.g. 151.101.0.81:443) and the source socket is your device's IP plus an ephemeral port (e.g. 192.168.1.5:52891).

TCP handshake, IPv4/v6, subnet masks, DHCP, NAT, port forwarding

TCP 3-way handshake establishes a connection before data is sent:

  1. Client → Server: SYN (synchronise — "I want to connect")
  2. Server → Client: SYN-ACK (synchronise-acknowledge — "OK, connection accepted")
  3. Client → Server: ACK (acknowledge — "Got it, let's go")

IPv4: 32-bit address, written as 4 decimal octets (0–255): 192.168.1.100. Total of 2³² ≈ 4.3 billion unique addresses — exhausted. Every IP address has a network identifier (which network) and host identifier (which device on that network).

IPv6: 128-bit address, written in hexadecimal groups: 2001:0db8:85a3::8a2e:0370:7334. 2¹²⁸ ≈ 3.4×10³⁸ addresses — effectively unlimited. Introduced specifically to solve IPv4 exhaustion.

Subnet mask: defines which part of an IP address is the network ID and which is the host ID. 255.255.255.0 means the first 24 bits are the network, the last 8 bits are the host. All devices sharing the same network portion can communicate directly; others must go via a router.
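The network/host split can be checked with the stdlib `ipaddress` module, which applies the subnet mask for you:

```python
import ipaddress

# A device at 192.168.1.100 with subnet mask 255.255.255.0
iface = ipaddress.ip_interface("192.168.1.100/255.255.255.0")

print(iface.network)   # → 192.168.1.0/24 (network ID; /24 = first 24 bits)
print(iface.netmask)   # → 255.255.255.0

# Same network portion: can communicate directly
print(ipaddress.ip_address("192.168.1.7") in iface.network)   # → True
# Different network portion: must go via a router
print(ipaddress.ip_address("192.168.2.7") in iface.network)   # → False
```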

DHCP (Dynamic Host Configuration Protocol): automatically assigns IP addresses, subnet masks, gateway addresses, and DNS server addresses to devices joining the network. Without DHCP, you'd manually configure every device. The DHCP server maintains a pool of available addresses and leases them for a fixed time.

NAT (Network Address Translation): private IP addresses (e.g. 192.168.x.x) are not routable on the internet — they're used internally only. Your router uses NAT to translate between your private internal addresses and your single public IP address. This also provides a security benefit — external devices can't directly address internal machines.

Port forwarding: when an external request arrives at your router on a specific port, port forwarding tells the router to pass it to a specific internal device. Example: to host a game server on your laptop (192.168.1.50) on port 27015, configure port forwarding: "incoming UDP:27015 → 192.168.1.50:27015". External players connect to your public IP:27015; your router forwards it to your laptop.
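The router's decision for unsolicited inbound traffic boils down to a table lookup. A toy sketch of the game-server rule above (the public IP is invented for illustration):

```python
# Toy NAT router: port-forwarding table maps a public port to an
# internal (private IP, private port) pair. Addresses are illustrative.
PUBLIC_IP = "203.0.113.7"
port_forwarding = {27015: ("192.168.1.50", 27015)}   # the laptop's rule

def route_incoming(dest_ip, dest_port):
    """Decide what the router does with an unsolicited inbound packet."""
    if dest_ip != PUBLIC_IP:
        return "not for us"
    if dest_port in port_forwarding:
        return f"forward to {port_forwarding[dest_port]}"  # configured rule
    return "drop"   # NAT default: no mapping means no internal destination

print(route_incoming("203.0.113.7", 27015))  # → forward to ('192.168.1.50', 27015)
print(route_incoming("203.0.113.7", 8080))   # → drop
```

The final `drop` branch is the security benefit mentioned above: without an explicit forwarding rule, external devices have no way to address internal machines.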

4.9.4.10 Application models

WebSocket, CRUD, REST, JSON vs XML, thin vs thick client

WebSocket: standard HTTP is request-response — the client always initiates. The server can't push data to the client unprompted. WebSocket upgrades an HTTP connection to a persistent, full-duplex TCP connection — both sides can send data at any time. Used for live chat, live stock prices, collaborative editing, online games, real-time dashboards.

Web server: serves web pages as text (HTML, CSS, JavaScript). A client requests a URL; the server responds with the file contents.

Web browser: retrieves web pages (via HTTP/HTTPS), interprets HTML/CSS/JavaScript, fetches additional resources (images, scripts), and renders the result visually.

CRUD applications: most web apps perform four basic operations on a database:

CRUD     | HTTP method | SQL equivalent | Example
Create   | POST        | INSERT         | Submit a new blog post
Retrieve | GET         | SELECT         | Load a list of products
Update   | PUT / PATCH | UPDATE         | Edit your profile details
Delete   | DELETE      | DELETE         | Remove a comment

REST (Representational State Transfer): an architectural style for building web APIs. The browser's JavaScript calls the REST API on the server using HTTP methods. The server performs the database operation and returns data as JSON or XML.

JSON vs XML comparison:

  • Readability: JSON is easier (concise syntax); XML is verbose, with lots of opening and closing tags.
  • File size: JSON is more compact; XML is larger due to tag repetition.
  • Parsing: JSON is faster (natively supported in JavaScript via JSON.parse()); XML is slower, requiring a full XML parser.
  • Example: JSON {"name":"Alice","age":17} vs XML <person><name>Alice</name><age>17</age></person>
  • Usage: JSON in modern REST APIs and configuration files; XML in legacy enterprise systems and some configuration formats.
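The parsing difference is easy to see with Python's stdlib, which plays the role JSON.parse() plays in JavaScript:

```python
import json
import xml.etree.ElementTree as ET

# The same record in both formats, parsed with stdlib modules
person = json.loads('{"name": "Alice", "age": 17}')
print(person["age"])          # → 17 (already a Python int)

root = ET.fromstring("<person><name>Alice</name><age>17</age></person>")
print(root.find("age").text)  # → 17 (text only — needs int() to convert)
```

Note the asymmetry: JSON carries type information (17 arrives as a number), while XML element content is always text.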

Thin vs thick client:

  • Processing: thin — on the server; the client just displays the interface. Thick — on the local device.
  • Local storage: thin — minimal or none. Thick — significant; data stored locally.
  • Network dependency: thin — high; barely works without a connection. Thick — low; can work fully offline.
  • Hardware cost: thin — cheap; the client needs minimal CPU/RAM. Thick — expensive; a powerful device is needed.
  • Maintenance: thin — simple; update the server once and all clients get the update. Thick — complex; every individual device must be updated.
  • Security: thin — easier; data never leaves the server. Thick — harder; data sits on potentially insecure devices.
  • Examples: thin — Chromebook, Citrix workspace, web-only app in a browser. Thick — desktop software (Photoshop, Word), installed mobile apps.
4.10 — Paper 2
Fundamentals of Databases
Relational model · Entity-relationship · Normalisation · SQL · Transactions · ACID

4.10.1 Relational databases

A relational database organises data into tables (also called relations). Each table represents one type of entity (a real-world object or concept about which data is stored). Tables are linked to each other through keys, avoiding the need to repeat data.

This is the dominant database model because it eliminates data redundancy, prevents inconsistencies, and allows complex queries across multiple related tables.

Key concepts: entity, attribute, primary key, foreign key

An entity is a real-world thing about which data is stored. Entities become tables. Examples: Student, Course, Order, Product.

An attribute is a property of an entity. Attributes become columns (fields) in the table. For a Student entity: StudentID, FirstName, LastName, DateOfBirth.

A primary key (PK) is an attribute (or combination of attributes) that uniquely identifies each row in a table. Rules: must be unique, must not be null (empty), must not change over time. Example: StudentID = 10042 uniquely identifies one student.

A foreign key (FK) is an attribute in one table that refers to the primary key of another table. It creates a link (relationship) between the two tables.

Example — Student and Enrolment tables
Student table:
StudentID (PK) | FirstName | LastName | DateOfBirth
10042          | Alice     | Smith    | 2007-09-14
10043          | Bob       | Jones    | 2007-11-22

Enrolment table:
EnrolmentID (PK) | StudentID (FK) | CourseCode (FK) | Year
E001             | 10042          | CS01            | 2024
E002             | 10042          | MA01            | 2024
E003             | 10043          | CS01            | 2024

StudentID in Enrolment is a foreign key — it refers back to StudentID in Student. This links each enrolment to exactly one student without duplicating the student's name and date of birth.

Referential integrity: the rule that every foreign key value must match an existing primary key value in the referenced table. You cannot add an enrolment for StudentID 99999 if no student with that ID exists. Referential integrity prevents "orphaned" records — data that refers to something that doesn't exist.

Composite key: a primary key made from two or more attributes, neither of which is unique on its own, but together they uniquely identify a row. Example: in a table recording which students are in which classes, (StudentID, ClassID) together form a composite key — one student can be in many classes, one class has many students, but no student appears in the same class twice.

4.10.2 Entity-relationship (E-R) diagrams

Drawing and interpreting E-R diagrams

An E-R diagram shows entities as rectangles, attributes as ovals (or listed inside the rectangle), and relationships as diamonds or labelled lines. The key feature is the multiplicity — the cardinality of each relationship:

Relationship type  | Notation    | Meaning                                        | Example
One-to-one (1:1)   | ── 1 : 1 ── | One entity relates to exactly one of the other | One person has one passport
One-to-many (1:M)  | ── 1 : M ── | One entity relates to many of the other        | One teacher teaches many students
Many-to-many (M:N) | ── M : N ── | Many on each side                              | Students enrol on many courses; each course has many students
Many-to-many relationships need a link table
A many-to-many relationship cannot be directly implemented in a relational database. It must be resolved into two one-to-many relationships using an intermediate (link/junction) table. The Enrolment table above is the link table between Student and Course.

4.10.3 Normalisation

Normalisation is the process of structuring a relational database to reduce data redundancy and eliminate update, insert, and delete anomalies. A database that isn't normalised can develop inconsistencies: if a student's address is stored in 10 rows, updating it in 9 places and missing one leaves the database in an inconsistent state.

1NF, 2NF, and 3NF — step by step

First Normal Form (1NF)

A table is in 1NF if:

  • Every cell contains a single, atomic (indivisible) value — no lists of values in one cell
  • There are no repeating groups (no multiple columns for the same type of data, e.g. Phone1, Phone2, Phone3)
  • All rows are uniquely identifiable (there is a primary key)
Not in 1NF → 1NF
NOT 1NF: OrderID | CustomerName | Products
         001     | Alice        | "Pen, Pencil, Ruler"   ← multiple values in one cell

1NF:     OrderID | CustomerName | Product
         001     | Alice        | Pen
         001     | Alice        | Pencil
         001     | Alice        | Ruler

Second Normal Form (2NF)

A table is in 2NF if it is in 1NF AND every non-key attribute is fully functionally dependent on the whole primary key. Partial dependencies (where an attribute depends on only part of a composite key) must be removed.

2NF is only relevant when the primary key is composite — if the PK is a single attribute, and the table is in 1NF, it is automatically in 2NF.

Partial dependency example
OrderLine(OrderID, ProductID, ProductName, Quantity)
PK = (OrderID, ProductID) — composite

ProductName depends only on ProductID — not on OrderID.
This is a PARTIAL DEPENDENCY → violates 2NF.

Fix — split into:
OrderLine(OrderID, ProductID, Quantity)    ← PK = (OrderID, ProductID)
Product(ProductID, ProductName)            ← PK = ProductID

Third Normal Form (3NF)

A table is in 3NF if it is in 2NF AND there are no transitive dependencies — where a non-key attribute depends on another non-key attribute (rather than directly on the key).

Transitive dependency example
Student(StudentID, CourseID, CourseName, LecturerName)
PK = StudentID

CourseName → depends on CourseID (not on StudentID directly)
LecturerName → depends on CourseID (transitive dependency)

Fix — split into:
Student(StudentID, CourseID)         ← PK = StudentID
Course(CourseID, CourseName, LecturerName)  ← PK = CourseID
Normalisation summary
  • 1NF: atomic values, no repeating groups, has a PK
  • 2NF: 1NF + no partial dependencies (non-key attributes depend on the full key)
  • 3NF: 2NF + no transitive dependencies (non-key attributes don't depend on other non-key attributes)

4.10.4 SQL

Structured Query Language (SQL) is the standard language for interacting with relational databases. You need to know four statement types: SELECT (data retrieval, including joins and ordering), INSERT, UPDATE, and DELETE.

SELECT queries — complete guide
-- Basic SELECT
SELECT column1, column2 FROM table WHERE condition;

-- Select all columns
SELECT * FROM Student WHERE LastName = 'Smith';

-- Comparison operators: =, <>, <, >, <=, >=
SELECT * FROM Student WHERE Age >= 16 AND Age <= 18;

-- LIKE for pattern matching: % = any sequence, _ = any single char
SELECT * FROM Student WHERE LastName LIKE 'S%';   -- starts with S
SELECT * FROM Student WHERE Email LIKE '%@school.ac.uk';

-- ORDER BY
SELECT FirstName, LastName FROM Student ORDER BY LastName ASC;
SELECT * FROM Product ORDER BY Price DESC;

-- BETWEEN for ranges (inclusive), combined with ORDER BY
SELECT StudentID, FirstName FROM Student
WHERE DateOfBirth BETWEEN '2006-09-01' AND '2007-08-31'
ORDER BY LastName;

JOIN — combining tables

An INNER JOIN returns only rows where there is a matching value in both tables:

SELECT Student.FirstName, Student.LastName, Course.CourseName
FROM Student
INNER JOIN Enrolment ON Student.StudentID = Enrolment.StudentID
INNER JOIN Course ON Enrolment.CourseCode = Course.CourseCode
WHERE Course.CourseName = 'Computer Science'
ORDER BY Student.LastName;

This connects three tables. The ON clause specifies which columns to match. Rows with no match in the other table are excluded.
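You can run this join yourself with Python's stdlib `sqlite3` module; the table contents below match the Student/Enrolment example tables earlier in this section (Course data added to complete the example):

```python
import sqlite3

db = sqlite3.connect(":memory:")   # throwaway in-memory database
db.executescript("""
    CREATE TABLE Student (StudentID INTEGER PRIMARY KEY,
                          FirstName TEXT, LastName TEXT);
    CREATE TABLE Course (CourseCode TEXT PRIMARY KEY, CourseName TEXT);
    CREATE TABLE Enrolment (EnrolmentID TEXT PRIMARY KEY,
                            StudentID INTEGER REFERENCES Student,
                            CourseCode TEXT REFERENCES Course);
    INSERT INTO Student VALUES (10042,'Alice','Smith'),(10043,'Bob','Jones');
    INSERT INTO Course VALUES ('CS01','Computer Science'),('MA01','Maths');
    INSERT INTO Enrolment VALUES ('E001',10042,'CS01'),
                                 ('E002',10042,'MA01'),('E003',10043,'CS01');
""")
rows = db.execute("""
    SELECT Student.FirstName, Student.LastName, Course.CourseName
    FROM Student
    INNER JOIN Enrolment ON Student.StudentID = Enrolment.StudentID
    INNER JOIN Course ON Enrolment.CourseCode = Course.CourseCode
    WHERE Course.CourseName = 'Computer Science'
    ORDER BY Student.LastName
""").fetchall()
print(rows)  # → [('Bob', 'Jones', 'Computer Science'), ('Alice', 'Smith', 'Computer Science')]
```

Bob comes first because 'Jones' sorts before 'Smith'; Alice's Maths enrolment is filtered out by the WHERE clause.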

INSERT, UPDATE, DELETE
-- INSERT a new row
INSERT INTO Student (StudentID, FirstName, LastName, DateOfBirth)
VALUES (10044, 'Charlie', 'Brown', '2007-03-05');

-- UPDATE existing rows — ALWAYS use WHERE or you update every row!
UPDATE Student
SET Email = '[email protected]'
WHERE StudentID = 10044;

-- DELETE rows — ALWAYS use WHERE or you delete everything!
DELETE FROM Enrolment
WHERE StudentID = 10044;
Critical SQL safety rule
Never write UPDATE table SET ... or DELETE FROM table without a WHERE clause. Without WHERE, the operation applies to every single row in the table — potentially destroying all your data. Always write the WHERE clause first.

4.10.5 Transactions and ACID

ACID properties explained

A transaction is a sequence of database operations that must be treated as a single, indivisible unit. Example: a bank transfer — debit one account, credit another. Both must happen or neither must happen.

Atomicity
  • Meaning: a transaction completes fully or not at all. If any step fails, all previous steps in the transaction are rolled back (undone).
  • If violated: the bank deducts money from account A but the system crashes before crediting account B. With atomicity, both operations roll back — no money is lost.

Consistency
  • Meaning: a transaction brings the database from one valid state to another. All integrity constraints (data types, foreign keys, check constraints) must be satisfied after the transaction.
  • If violated: a transaction sets Age = -5. Consistency rejects this — age must be positive — and the transaction rolls back.

Isolation
  • Meaning: concurrent transactions execute as if they were sequential — they don't interfere with each other. Each transaction sees a consistent snapshot of the data.
  • If violated: two users simultaneously buy the last item in stock. Without isolation, both could succeed — overselling the item. Isolation prevents this.

Durability
  • Meaning: once a transaction commits, its changes are permanent — even if the system crashes immediately after. Data is written to non-volatile storage before the commit is confirmed.
  • If violated: a user pays for an order and the system crashes 1 ms later. Without durability, the payment record is lost. With durability, it is written to disk and survives the crash.
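Atomicity can be demonstrated with `sqlite3`: a bank-transfer transaction where the second step "crashes" is rolled back in full. The account data is invented for illustration:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Account (Name TEXT PRIMARY KEY, "
           "Balance INTEGER CHECK (Balance >= 0))")  # consistency constraint
db.execute("INSERT INTO Account VALUES ('A', 100), ('B', 50)")
db.commit()

try:
    with db:  # "with" commits on success, rolls back on any exception
        db.execute("UPDATE Account SET Balance = Balance - 100 "
                   "WHERE Name = 'A'")               # step 1: debit A
        raise RuntimeError("system crash before crediting B")  # simulated
        db.execute("UPDATE Account SET Balance = Balance + 100 "
                   "WHERE Name = 'B'")               # step 2: never runs
except RuntimeError:
    pass

# The debit from A was rolled back — no money vanished
print(db.execute("SELECT Name, Balance FROM Account ORDER BY Name").fetchall())
# → [('A', 100), ('B', 50)]
```

The CHECK constraint on Balance also illustrates consistency: a transaction that tried to leave an account negative would be rejected the same way.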
Record locking
To achieve isolation in a multi-user database, the DBMS uses record locking: when a transaction reads or writes a record, it places a lock on it. Other transactions that need the same record must wait. This prevents two transactions from simultaneously modifying the same data. The risk is deadlock: Transaction A locks record 1 and waits for record 2; Transaction B locks record 2 and waits for record 1. Neither can proceed. DBMS systems detect deadlocks and abort one transaction to break the cycle.

4.10.6 Database Management Systems (DBMS)

A DBMS is the software layer between the database and the applications/users that interact with it. It provides:

  • Data storage: manages how data is physically stored and retrieved from disk. Handles indexing for fast lookup.
  • Query processing: accepts SQL queries, optimises the execution plan, and returns results.
  • Security: user authentication and authorisation — who can read, write, or delete which data. Role-based access control.
  • Concurrency control: manages simultaneous access by multiple users — transaction isolation, record locking, deadlock detection.
  • Backup and recovery: transaction logs allow recovery to a known-good state after failure. Point-in-time recovery.
  • Data integrity: enforces constraints — primary keys, foreign keys, data types, check constraints, NOT NULL.

Examples of DBMS: MySQL, PostgreSQL, Oracle Database, Microsoft SQL Server, SQLite.

4.11 — Paper 2
Big Data
The 3Vs · Structured vs unstructured · Batch processing · Real-time · Distributed systems · MapReduce · Machine learning

What is Big Data?

Traditional relational databases and processing tools cannot handle some modern data because of its sheer scale, speed of generation, or variety of form. The term Big Data refers to datasets with characteristics — the "3 Vs" — that exceed the capacity of traditional database systems.

Volume
  • Definition: the sheer quantity of data is too large to store or process on a single machine — think petabytes (10¹⁵ bytes) and beyond.
  • Example: Facebook generates roughly 4 petabytes of data per day. Every GPS ping, like, and photo upload is captured.

Velocity
  • Definition: data arrives so fast that it cannot be stored first and processed later — it must be processed in real time (or near real time) as it streams in.
  • Example: a stock exchange handles millions of trades per second. Fraud detection systems must evaluate each transaction in milliseconds, before it completes.

Variety
  • Definition: data comes in different forms — structured (database rows), semi-structured (JSON, XML), and unstructured (images, video, audio, natural-language text). Traditional relational databases only handle structured data well.
  • Example: a hospital's data includes structured patient records in an RDBMS, X-ray images, doctors' voice notes, and genetic sequence files.

4.11.1 Structured vs unstructured data

Types of data and their challenges

Structured data: has a predefined schema (format) — rows and columns, fixed data types. Easily stored in relational databases. Easy to query with SQL. Example: a database table of customer orders.

Semi-structured data: has some structure (tags or keys) but doesn't fit neatly into tables. Examples: JSON files, XML documents, emails (headers are structured, body is not).

Unstructured data: no predefined format. Cannot be stored meaningfully in a relational database without losing information. Examples: video files, audio recordings, social media posts, images, natural language text, sensor data streams.

Approximately 80–90% of the world's data is unstructured. Processing it requires different tools — machine learning, natural language processing, computer vision — rather than SQL.

4.11.2 Processing approaches

Batch processing vs real-time (stream) processing
  • Timing: batch — data is collected, then processed all at once at a scheduled time. Stream — data is processed continuously, as it arrives.
  • Latency: batch — high; results available after the batch completes (minutes to hours). Stream — very low; results within milliseconds.
  • Resource use: batch — can run during off-peak hours, using spare capacity. Stream — must maintain constant processing capacity.
  • Complexity: batch — simpler; process a static dataset once. Stream — harder; must handle out-of-order data and failures mid-stream.
  • Use cases: batch — end-of-day bank reconciliation, payroll processing, overnight reports, training ML models. Stream — fraud detection, GPS navigation, live dashboards, IoT sensor monitoring, recommendation engines.

4.11.3 Distributed processing — why it's necessary

When a dataset is too large to fit on one machine or too slow to process on one CPU, the solution is distributed processing: split the work across many machines (a cluster) that work in parallel and then combine the results.

A modern big data cluster might contain thousands of commodity (cheap) servers. The data is split across all of them. Each machine processes its local portion of the data simultaneously. This gives near-linear scaling: double the machines, roughly halve the processing time.

4.11.4 MapReduce

MapReduce — how it works with a worked example

MapReduce is a programming model for processing large datasets across a distributed cluster. It was popularised by Google and is the foundation of Apache Hadoop. It works in two phases:

Map phase: The input data is split into chunks. Each chunk is processed by a separate mapper (running on a different machine) which applies a map function to each item, producing (key, value) pairs as output.

Reduce phase: All the (key, value) pairs with the same key are grouped together (shuffle and sort) and passed to a reducer, which combines them to produce the final result.

Worked example: counting word frequencies in a huge text file
Input (split across machines):
  Machine 1: "the cat sat on the mat"
  Machine 2: "the cat sat on a hat"

MAP phase — each machine emits (word, 1) for each word:
  Machine 1: (the,1)(cat,1)(sat,1)(on,1)(the,1)(mat,1)
  Machine 2: (the,1)(cat,1)(sat,1)(on,1)(a,1)(hat,1)

SHUFFLE — group by key:
  "the": [1,1,1]   "cat": [1,1]   "sat": [1,1]
  "on":  [1,1]     "mat": [1]     "a":   [1]   "hat": [1]

REDUCE phase — sum each group:
  "the": 3   "cat": 2   "sat": 2   "on": 2
  "mat": 1   "a": 1     "hat": 1

The key insight: the Map phase runs entirely in parallel across all machines. The Reduce phase also runs in parallel (one reducer per unique key group). The framework handles distributing data, coordinating machines, and recovering from individual machine failures — the programmer only writes the map and reduce functions.
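The worked example above can be mirrored in a few lines of Python — map_phase, shuffle, and reduce_phase are illustrative stand-ins for what a framework like Hadoop provides:

```python
from collections import defaultdict

def map_phase(document):
    # Mapper: emit a (word, 1) pair for every word in the chunk.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Group values by key, as the framework's shuffle-and-sort would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: combine each group's values into a final count.
    return {key: sum(values) for key, values in groups.items()}

chunks = ["the cat sat on the mat", "the cat sat on a hat"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
print(counts["the"], counts["cat"], counts["hat"])   # → 3 2 1
```

In a real cluster each call to map_phase would run on a different machine; here they run in sequence but produce the same pairs.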

Why MapReduce suits Big Data
MapReduce processes data where it lives — data doesn't move to a central processor, the computation moves to the data. It scales linearly: doubling the cluster roughly halves processing time. It handles machine failures automatically: if a machine dies mid-job, its chunk of work is simply reassigned to another machine.

4.11.5 Machine learning and Big Data

Machine learning — overview for A-level

Machine learning is a technique where a system learns patterns from data rather than being explicitly programmed with rules. This is central to Big Data because the volume and complexity of the data makes manually writing rules impossible.

Supervised learning: the model is trained on labelled examples (input + correct output). It learns the mapping from input to output. Examples: spam filter (email + spam/not-spam label), image classifier (image + label), fraud detection (transaction + fraud/not-fraud).

Unsupervised learning: the model finds patterns in unlabelled data. Examples: customer segmentation (grouping customers with similar purchasing behaviour without being told how many groups to find), anomaly detection.

Training, validation, and test sets: a dataset is split into three parts. The model is trained on the training set, tuned using the validation set, and its final accuracy measured on the test set (which it has never seen before). This prevents overfitting — a model that memorises the training data but fails on new examples.
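A minimal sketch of such a three-way split, assuming a 70/15/15 ratio (the function name and ratio are illustrative):

```python
import random

def split_dataset(data, seed=42):
    # Shuffle once, then cut into 70% train / 15% validation / 15% test.
    data = list(data)
    random.Random(seed).shuffle(data)
    n = len(data)
    t = 70 * n // 100           # end of training portion
    v = t + 15 * n // 100       # end of validation portion
    return data[:t], data[t:v], data[v:]

train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))   # → 70 15 15
```

The test portion is held back until the very end so the final accuracy figure reflects genuinely unseen data.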

Why Big Data enables ML
Machine learning models improve with more data. A spam filter trained on 100 emails is unreliable. One trained on 100 billion emails can detect subtle patterns humans could never articulate. Big Data infrastructure (distributed storage + MapReduce) makes training on massive datasets feasible.

4.11.6 Graph databases

Some data is fundamentally about relationships — social networks (who knows whom), recommendation engines (who bought what), knowledge graphs. Relational databases struggle with highly connected data because joining many tables is slow. Graph databases store data as:

Nodes: Entities — a person, a product, a city. Each node has properties (key-value pairs).
Edges: Relationships between nodes — FOLLOWS, PURCHASED, LIVES_IN. Edges have direction and can also have properties (e.g. weight, timestamp).

Graph databases excel at queries like "find all friends of friends who have purchased product X" — a query that would require multiple expensive JOINs in SQL but is trivial to traverse in a graph. Example: Neo4j. Used by: LinkedIn (people you may know), Google (Knowledge Graph), financial fraud detection (tracing transaction chains).
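The traversal idea can be sketched with plain Python dictionaries standing in for a graph database (the people, edges, and function name are invented for illustration):

```python
# FOLLOWS edges as an adjacency map; PURCHASED edges as a second map.
follows = {
    "alice": {"bob", "carol"},
    "bob":   {"dave"},
    "carol": {"dave", "erin"},
    "dave":  set(),
    "erin":  set(),
}
purchased = {"dave": {"X"}, "erin": {"Y"}}

def friends_of_friends_who_bought(person, product):
    # Traverse two FOLLOWS hops, then filter on a PURCHASED edge.
    direct = follows[person]
    fof = {f2 for f1 in direct for f2 in follows[f1]} - direct - {person}
    return {p for p in fof if product in purchased.get(p, set())}

print(friends_of_friends_who_bought("alice", "X"))   # → {'dave'}
```

Each hop is a constant-time dictionary lookup per node, which is why graph stores handle deep traversals so cheaply compared with chained SQL JOINs.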

4.12 — Paper 1
Functional Programming
Pure functions · Higher-order functions · Map · Filter · Reduce · Closures · Lists · Haskell

4.12.1 What is functional programming?

Functional programming (FP) is a programming paradigm based on the concept of mathematical functions. Rather than writing sequences of instructions that change program state, FP is about defining functions that transform values. Programs are built by composing functions together.

AQA uses Haskell as the reference FP language. You don't need to write complete Haskell programs, but you must understand and trace FP concepts as they appear in pseudocode and Haskell notation.

4.12.2 Core principles

Pure functions and referential transparency

A pure function always returns the same output for the same input, and has no side effects — it does not modify any external state, write to disk, print to screen, or change any variable outside itself.

| Pure function | Impure function (avoid in FP) |
|---|---|
| double x = x * 2 — always returns 2x. No side effects. | def addToList(x): globalList.append(x) — modifies external state. |
| add x y = x + y — output depends only on inputs. | def getInput(): return input() — result varies; depends on user. |

Referential transparency: because pure functions always produce the same output for the same input, a function call can be replaced by its result anywhere in the program without changing the program's behaviour. double 5 can always be replaced by 10. This makes programs easier to reason about and test.

Why pure functions matter
Programs with only pure functions are: easier to test (no complex setup needed), easier to parallelise (no shared state = no race conditions), easier to reason about (no hidden dependencies), and more predictable. This is why functional languages are used in concurrent systems like Erlang (WhatsApp's backend) and financial systems.
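Both ideas can be demonstrated in Python — double below is pure and referentially transparent, while log_and_double is not (the names are illustrative):

```python
def double(x):
    # Pure: same input, same output, no side effects.
    return x * 2

total_log = []
def log_and_double(x):
    # Impure: modifies state outside itself.
    total_log.append(x)
    return x * 2

# Referential transparency: double(5) can be replaced by 10 anywhere.
assert double(5) + double(5) == 10 + 10

# The impure version cannot be replaced by its result:
# each call also changes total_log.
log_and_double(5)
log_and_double(5)
print(len(total_log))   # → 2
```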

First-class and higher-order functions

In functional programming, functions are first-class values — they can be:

  • Passed as arguments to other functions
  • Returned as results from other functions
  • Stored in variables or data structures
  • Defined anonymously without a name (lambda expressions)

A higher-order function is one that takes a function as an argument, or returns a function as its result (or both).

Higher-order function example
-- apply takes a function f and a value x, and applies f to x
apply f x = f x

apply double 5     → 10
apply (add 3) 7   → 10

-- A function can be returned:
makeAdder n = \x -> x + n    -- returns a new function
add3 = makeAdder 3
add3 7  → 10
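A Python equivalent of the Haskell above, in the same style as the other Python sketches in this topic (apply and make_adder are illustrative names):

```python
def apply(f, x):
    # Higher-order: takes a function f as an argument.
    return f(x)

def make_adder(n):
    # Higher-order: returns a new function (a closure over n).
    return lambda x: x + n

double = lambda x: x * 2
print(apply(double, 5))      # → 10

add3 = make_adder(3)
print(add3(7))               # → 10
```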

4.12.3 The three essential higher-order functions

Map, Filter, and Fold/Reduce — fully explained with examples

Map

map applies a function to every element of a list and returns a new list of the same length containing the transformed values.

-- Haskell:
map double [1,2,3,4,5]  →  [2,4,6,8,10]
map (*3)   [1,2,3,4,5]  →  [3,6,9,12,15]
map even   [1,2,3,4]    →  [False,True,False,True]

-- Python equivalent:
list(map(lambda x: x*2, [1,2,3,4,5]))  →  [2,4,6,8,10]
Map never modifies the original list
Map returns a new list. The original list is unchanged. This is a core property of FP — immutable data.

Filter

filter takes a predicate (a function that returns True or False) and a list, and returns a new list containing only the elements for which the predicate returns True.

-- Haskell:
filter even [1,2,3,4,5,6]   →  [2,4,6]
filter (>3) [1,2,3,4,5,6]   →  [4,5,6]
filter odd  [1,2,3,4,5]     →  [1,3,5]

-- Python equivalent:
list(filter(lambda x: x % 2 == 0, [1,2,3,4,5,6]))  →  [2,4,6]

Fold / Reduce

foldr (fold right) and foldl (fold left) combine all elements of a list into a single value using an accumulator function and a starting value.

-- foldr f start [a,b,c] = f a (f b (f c start))
foldr (+) 0 [1,2,3,4,5]   →  15   (sum: 1+2+3+4+5)
foldr (*) 1 [1,2,3,4,5]   →  120  (factorial: 5!)
foldr (:) [] [1,2,3]      →  [1,2,3]  (copies the list)

-- Python equivalent (reduce):
from functools import reduce
reduce(lambda acc, x: acc + x, [1,2,3,4,5], 0)  →  15
Tracing foldr (+) 0 [1,2,3]
foldr (+) 0 [1,2,3]
= 1 + (foldr (+) 0 [2,3])
= 1 + (2 + (foldr (+) 0 [3]))
= 1 + (2 + (3 + (foldr (+) 0 [])))
= 1 + (2 + (3 + 0))
= 1 + (2 + 3)
= 1 + 5
= 6
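For an associative operator like + the two folds give the same answer; the different bracketing only shows up with a non-associative operator such as subtraction. A Python sketch of both (foldr and foldl here are hand-rolled for illustration):

```python
from functools import reduce

def foldr(f, start, xs):
    # foldr f start [a,b,c] = f(a, f(b, f(c, start)))
    acc = start
    for x in reversed(xs):
        acc = f(x, acc)
    return acc

def foldl(f, start, xs):
    # foldl f start [a,b,c] = f(f(f(start, a), b), c)
    return reduce(f, xs, start)

print(foldr(lambda x, acc: x - acc, 0, [1, 2, 3]))   # → 2:  1-(2-(3-0))
print(foldl(lambda acc, x: acc - x, 0, [1, 2, 3]))   # → -6: ((0-1)-2)-3
```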

4.12.4 Lambda expressions (anonymous functions)

Lambda expressions in Haskell and Python

A lambda expression defines a function without giving it a name. Useful when you need a simple function in one place and don't want to define it separately.

-- Haskell lambda syntax: \parameter -> expression
\x -> x * 2         -- a function that doubles its input
\x y -> x + y       -- a function that adds two numbers

-- Usage:
map (\x -> x * 2) [1,2,3,4,5]   →  [2,4,6,8,10]
filter (\x -> x > 3) [1..6]     →  [4,5,6]

-- Python lambda syntax: lambda parameter: expression
double = lambda x: x * 2
add = lambda x, y: x + y
double(5)  →  10

4.12.5 Partial application and currying

Partial application and currying — the key ideas

Partial application: providing fewer arguments than a function expects, producing a new function that takes the remaining arguments.

-- add takes two arguments
add x y = x + y

-- Partially apply add with 3 → get a new function "add 3"
add3 = add 3       -- add3 is now a function: y -> 3 + y
add3 7   →  10
add3 20  →  23

-- This is extremely useful with map:
map (add 10) [1,2,3,4,5]   →  [11,12,13,14,15]

Currying: transforming a function that takes multiple arguments into a chain of functions that each take one argument. In Haskell, all functions are automatically curried — add x y = x + y is actually a function that takes x and returns a function that takes y.

Currying vs non-curried
-- Non-curried (tuple argument):
addPair (x, y) = x + y
addPair (3, 4)  →  7   -- must pass both arguments at once

-- Curried (automatic in Haskell):
add x y = x + y
add 3 4    →  7         -- pass both
add 3      →  function   -- pass one: partial application
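Python doesn't curry automatically, but the transformation can be sketched with closures (curry2 is an illustrative helper, not a standard-library function):

```python
def curry2(f):
    # Turn a two-argument function into a chain of one-argument functions.
    return lambda x: lambda y: f(x, y)

def add_pair(x, y):
    # "Non-curried": must receive both arguments at once.
    return x + y

add = curry2(add_pair)   # curried version
add3 = add(3)            # partial application: supply one argument
print(add3(7))           # → 10
print(add(3)(4))         # → 7
```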

4.12.6 Lists in functional programming

List operations — head, tail, cons, and recursion over lists

In Haskell, lists are fundamental. A list is either empty [] or a head (first element) followed by a tail (the rest of the list, itself a list).

| Operation | Haskell | Result |
|---|---|---|
| Head (first element) | head [1,2,3,4,5] | 1 |
| Tail (all except first) | tail [1,2,3,4,5] | [2,3,4,5] |
| Cons (prepend element) | 1 : [2,3,4,5] | [1,2,3,4,5] |
| Null (is it empty?) | null [] | True |
| Length | length [1,2,3] | 3 |
| Concatenate | [1,2] ++ [3,4] | [1,2,3,4] |

List ranges: [1..5] → [1,2,3,4,5]. [2,4..10] → [2,4,6,8,10]. [5,4..1] → [5,4,3,2,1].

Recursive functions on lists: FP uses recursion instead of loops. The pattern is: base case (empty list) and recursive case (process head, recurse on tail).

-- Compute the sum of a list recursively:
mySum []     = 0                    -- base case: empty list sums to 0
mySum (x:xs) = x + mySum xs        -- x is head, xs is tail

mySum [1,2,3,4,5]
= 1 + mySum [2,3,4,5]
= 1 + 2 + mySum [3,4,5]
= 1 + 2 + 3 + mySum [4,5]
= 1 + 2 + 3 + 4 + mySum [5]
= 1 + 2 + 3 + 4 + 5 + mySum []
= 1 + 2 + 3 + 4 + 5 + 0
= 15

The pattern (x:xs) in a function definition is called pattern matching. It simultaneously matches the list structure (head:tail) and binds x to the head and xs to the tail.
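The same base-case/recursive-case pattern can be written in Python, where x, *rest unpacking plays the role of Haskell's (x:xs):

```python
def my_sum(xs):
    if not xs:                  # base case: empty list sums to 0
        return 0
    x, *rest = xs               # destructure into head and tail
    return x + my_sum(rest)     # recursive case: head plus sum of tail

print(my_sum([1, 2, 3, 4, 5]))  # → 15
```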

Functional vs imperative — key contrasts for exam answers

When asked to compare functional and imperative programming:

  • Immutability: FP avoids changing variable values; imperative programs regularly mutate state
  • No loops: FP uses recursion; imperative uses for/while loops
  • Pure functions: FP functions have no side effects; imperative functions often modify external state
  • Easier to parallelise: no shared state means no race conditions — FP scales well to multi-core
  • Harder to learn: thinking recursively and avoiding state is unintuitive for beginners trained on imperative languages

4.13 — NEA & Paper 1
Systematic Approach to Problem Solving
Analysis · Design · Implementation · Testing · Evaluation
Where this appears
This topic underpins your NEA coursework project and appears in Paper 1. Questions ask you to describe phases, justify testing choices, or evaluate a solution. Know each phase's purpose, activities, and outputs.

The five phases

Phase 1 — Analysis

Purpose: Fully understand the problem before attempting to solve it. Many projects fail because the wrong problem was solved.

Activities: Identify the problem and its scope. Gather requirements from stakeholders through interviews, questionnaires, observation. Separate functional requirements (what the system must DO — "the system must allow users to log in") from non-functional requirements (how well it must do it — "the system must respond within 2 seconds"). Create a data model — an abstraction of the real-world objects involved.

Output: A requirements specification. Each requirement must be specific, measurable, and testable. This document becomes the success criteria against which the finished system is evaluated.

Common mistake
Skipping analysis and jumping straight to coding. Requirements found late are expensive — a change at the analysis stage costs almost nothing; the same change after coding may mean reworking weeks of code.
Phase 2 — Design

Purpose: Plan the solution before implementing it. Translate requirements into a concrete blueprint.

Activities:

  • Data structures: what data needs to be stored and how? Records, arrays, database tables?
  • Algorithms: design algorithms in pseudocode before coding them. Trace them manually to verify correctness.
  • Modular structure: use hierarchy charts to show how the program breaks into subroutines and how they relate. Each module has clearly defined inputs and outputs.
  • User interface design: wireframes and mockups of screens. Good UI design reduces user errors and training time.
  • Test plan: design tests at the design stage, not after — ensures requirements are actually testable.

Output: Design documentation — pseudocode algorithms, hierarchy charts, data dictionary (description of all data items), UI wireframes, test plan.

Phase 3 — Implementation

Purpose: Translate the design into working code.

Activities: Write code in the chosen language, following the design. Use meaningful identifier names. Add comments explaining non-obvious logic. Test incrementally — test each module as it's written rather than waiting until everything is done (unit testing). Use version control to track changes.

Good practice during implementation:

  • Meaningful variable/function names — calculateTotalCost() not calc()
  • Consistent indentation and formatting
  • Comment complex logic — explain why, not just what
  • Modular code — each subroutine does one thing well
  • Avoid magic numbers — use named constants

Output: Working, documented source code.

Phase 4 — Testing

Purpose: Verify the implementation is correct and robust — it produces the right output for all valid inputs, handles edge cases gracefully, and rejects invalid inputs.

Three categories of test data:

| Category | Definition | Example (age input, valid range 0–120) |
|---|---|---|
| Normal (typical) | Valid data the program should accept and process correctly | 25, 40, 67 |
| Boundary (extreme) | Values at the very edge of valid ranges — including the limit itself and just outside | 0 (minimum), 120 (maximum), -1 (just below), 121 (just above) |
| Erroneous (invalid) | Data of the wrong type or completely outside the valid range that the program should reject | "twenty", -50, 999, "abc" |

Types of testing:

  • Unit testing: test each subroutine/module in isolation
  • Integration testing: test how modules work together
  • System testing: test the complete system against all requirements
  • User acceptance testing (UAT): stakeholders confirm the system meets their needs
  • Alpha/beta testing: internal vs external user testing before full release

Output: Test evidence — for each test: input data, expected output, actual output, pass/fail, corrective action taken if failed.

Boundary data — the most commonly lost marks
For a range of 1–100: boundary values are 1 (min), 100 (max), 0 (just below min), 101 (just above max). You must test BOTH sides of each boundary. Many students only test the valid boundary, not the invalid one just beyond it.
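A sketch of a validation routine together with tests from all three categories (is_valid_age is an illustrative name, using the 0–120 range from the table above):

```python
def is_valid_age(age):
    # Accept only integers in the valid range 0-120.
    return isinstance(age, int) and 0 <= age <= 120

# Normal data
assert is_valid_age(25) and is_valid_age(67)

# Boundary data: test BOTH sides of each boundary
assert is_valid_age(0) and is_valid_age(120)            # valid limits
assert not is_valid_age(-1) and not is_valid_age(121)   # just outside

# Erroneous data
assert not is_valid_age("twenty") and not is_valid_age(999)

print("all tests passed")
```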
Phase 5 — Evaluation

Purpose: Assess the finished system against the original requirements. Identify what works well, what doesn't, and what could be improved.

Evaluation criteria:

Correctness: Does the system correctly solve the problem? Does it meet every functional requirement? Evidence from testing demonstrates this.
Efficiency: Does the system perform well? Is it fast enough for real use? Does it use memory and storage appropriately?
Usability: Is the interface intuitive? Can the target user use it without extensive training? Does it follow UI conventions?
Maintainability: Is the code well-structured, documented, and readable? Could another developer understand and modify it?
Robustness: Does the system handle unexpected inputs gracefully? Does it fail safely rather than crashing?

Output: Evaluation report — for each original requirement, state whether it was met and provide evidence. Identify limitations and suggest future improvements.

Reference
Full Glossary
Every key term across all 13 topics — searchable
Reference
All Exam Traps
Common mistakes and misconceptions across all topics
[4.1] Local variable lifetime: Local variables are created when a subroutine is called and destroyed when it returns. They don't retain their value between calls. If you need persistence, use a global variable or pass the value as a parameter/return it.
[4.1] REPEAT-UNTIL vs WHILE: REPEAT-UNTIL always executes at least once — condition checked at the END. WHILE may execute zero times — condition checked at the START. This is a commonly tested distinction.
[4.1] Procedure vs function: A procedure doesn't return a value. A function does. Don't use them interchangeably in exam answers — marks are lost for incorrect terminology.
[4.1] DIV vs MOD: DIV gives the whole number quotient (17 DIV 5 = 3). MOD gives the remainder (17 MOD 5 = 2). Verify: 3×5=15, 17-15=2 ✓. MOD 2 = 0 means even; MOD 2 = 1 means odd.
[4.2] Stack vs queue operations: Stack: push/pop (top only — LIFO). Queue: enqueue to rear, dequeue from front (FIFO). Using the wrong term loses marks. A stack cannot "enqueue".
[4.2] Linked list — no random access: Unlike arrays, you cannot jump directly to element n in a linked list. You must traverse from the head. This is O(n) not O(1).
[4.2] BST in-order traversal: In-order (Left→Root→Right) visits a BST in ascending sorted order. This is the most useful traversal and the one most tested. Pre-order: Root→Left→Right. Post-order: Left→Right→Root.
[4.3] Binary search requires sorted data: Binary search ONLY works on a sorted list. Applying it to unsorted data produces wrong results. If the list isn't sorted, sort it first (at the cost of the sort's time complexity).
[4.3] Quick sort worst case: O(n²) when the pivot is always the smallest or largest — happens on already-sorted data with a naive first-element pivot. Average case is O(n log n). Always distinguish average from worst case.
[4.3] Big-O is worst case by default: When asked for the complexity of an algorithm, give the worst case unless asked for best/average. Binary search is O(log n) worst case but O(1) best case.
[4.4] Halting problem — undecidable not just hard: Don't say the halting problem is "very difficult to solve" or "hasn't been solved yet". It has been PROVEN to be impossible — no algorithm can exist that correctly solves it for all inputs. This is a fundamental result, not an engineering challenge.
[4.4] Turing machines are theoretical: Turing machines are a mathematical model, not a physical device. Their purpose is to formally define the limits of computation, not to describe how computers actually work.
[4.5] Two's complement range asymmetry: For 8-bit two's complement: −128 to +127. The negative range extends one further than the positive because zero occupies one of the non-negative bit patterns. 8-bit unsigned: 0 to 255.
[4.5] Baud rate vs bit rate: Equal only when each signal change encodes exactly one bit. If using modulation (e.g. QAM), each signal change can encode multiple bits — bit rate will exceed baud rate. State the relationship, don't assume equality.
[4.5] Vernam cipher requires one-time use: The Vernam cipher has perfect security ONLY if the key is truly random, at least as long as the message, and NEVER reused. Reusing the key makes it breakable. This is a common misconception exam questions exploit.
[4.5] Parity bits detect but don't correct: A parity bit detects an odd number of bit errors (1, 3, 5…) but cannot locate which bit is wrong, so it cannot correct the error. Majority voting (3 copies per bit) CAN correct single-bit errors.
[4.6] Assembly language ≠ machine code: Assembly uses human-readable mnemonics (ADD, MOV). Machine code is binary. An assembler translates between them. They have a one-to-one relationship but they are NOT the same thing.
[4.6] NAND and NOR truth tables: NAND(1,1)=0, all others=1. NOR(0,0)=1, all others=0. These are the inverse of AND and OR. NAND and NOR are universal — any circuit can be built from only NANDs or only NORs.
[4.7] Program counter increments during fetch: The PC is incremented DURING the fetch stage, before decode and execute. After fetching an instruction, the PC already points to the next one. This is what makes sequential execution (and instruction pipelining) possible.
[4.7] Immediate vs direct addressing: Immediate (#): the operand IS the value — LDA #5 loads the number 5. Direct: the operand is an ADDRESS — LDA 100 loads whatever is stored at address 100. Confusing these is a very common error.
[4.7] Von Neumann bottleneck: Occurs because data and instructions share the same bus in von Neumann architecture. The CPU can't simultaneously fetch an instruction AND read/write data — they compete. Harvard architecture solves this with separate buses.
[4.9] Hub vs switch: A hub broadcasts every frame to ALL ports — no intelligence. A switch reads the destination MAC address and forwards only to the correct port. Saying a switch broadcasts will lose marks.
[4.9] Digital signature key direction: Encryption: public key encrypts, private key decrypts. Digital signatures: PRIVATE key signs, PUBLIC key verifies. This is the reverse direction. Exams test this distinction directly and regularly.
[4.9] Virus vs worm vs trojan: Virus: attaches to a file, needs user action to spread. Worm: self-replicates across network with no user action. Trojan: disguised as legitimate software, doesn't replicate. These are distinct and the differences matter.
[4.9] CSMA/CA not CSMA/CD: CSMA/CA (Collision Avoidance) is for wireless Wi-Fi. CSMA/CD (Collision Detection) is for wired Ethernet. AQA A-level only requires CSMA/CA. Don't write about CD unless asked.
[4.9] Ring topology not on spec: AQA A-level only requires physical star and logical bus. Ring, mesh, MAN, and SAN are not required. Don't write about ring topology in exam answers.
[4.9] NAT vs DHCP — completely different: DHCP automatically assigns IP addresses to devices joining a network. NAT translates between private and public IP addresses at the router. They solve different problems entirely.
[4.10] DELETE/UPDATE without WHERE: DELETE FROM table; deletes EVERY row. UPDATE table SET x=1; updates EVERY row. Always include a WHERE clause. This is one of the most destructive SQL mistakes possible.
[4.10] 2NF requires composite key: Partial dependencies (the basis of 2NF) can only exist when the primary key is composite. If the PK is a single attribute, a table in 1NF is automatically in 2NF. 3NF addresses transitive dependencies (non-key → non-key).
[4.12] Functional programming — no side effects: Pure functions don't modify external state. This is exactly why functional programming suits parallel processing — no shared mutable state means no race conditions. Always connect these two ideas in exam answers.
[4.13] Boundary testing — both sides: For valid range 1–100, you must test: 1 (valid minimum), 100 (valid maximum), 0 (just below — invalid), 101 (just above — invalid). Many students only test the valid boundary values. Test BOTH sides of EACH boundary.