Raspberry PI Bare Metal Vol 4 : Timer #
In my previous blog post I created bare metal code for SPI communication. In this post, I will continue the journey by implementing timer functionality on the Raspberry Pi using the System Timer peripheral.
System Timer #
The Raspberry Pi BCM2711 SoC includes a System Timer that runs at 1 MHz, providing microsecond resolution. This timer is independent of the ARM core and continues running regardless of the processor state. It is described in Chapter 10 of the BCM2711 ARM Peripherals Manual.
The System Timer provides a 64-bit free-running counter and four compare registers for generating interrupts with microsecond resolution.
Timer Registers #
The System Timer base address is 0x3F003000 (for Raspberry Pi Zero 2W) and all offsets are from this base address.
| Offset | Name | Description |
|---|---|---|
| 0x00 | CS | System Timer Control/Status |
| 0x04 | CLO | System Timer Counter Lower 32 bits |
| 0x08 | CHI | System Timer Counter Upper 32 bits |
| 0x0C | C0 | System Timer Compare 0 |
| 0x10 | C1 | System Timer Compare 1 |
| 0x14 | C2 | System Timer Compare 2 |
| 0x18 | C3 | System Timer Compare 3 |
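The register map above can be written down as a small C sketch. This is only an illustration of how the offsets combine with the base address used in this post; the names `SYSTIMER_BASE`, `systimer_addr` and `systimer_compare_addr` are hypothetical helpers, not part of the assembly listings.

```c
#include <stdint.h>

/* Base address used in this post (Raspberry Pi Zero 2W). */
#define SYSTIMER_BASE 0x3F003000u

/* Register offsets taken from the table above. */
enum systimer_offset {
    SYSTIMER_CS  = 0x00,
    SYSTIMER_CLO = 0x04,
    SYSTIMER_CHI = 0x08,
    SYSTIMER_C0  = 0x0C,
    SYSTIMER_C1  = 0x10,
    SYSTIMER_C2  = 0x14,
    SYSTIMER_C3  = 0x18
};

/* Hypothetical helper: absolute address of a System Timer register. */
static inline uint32_t systimer_addr(enum systimer_offset off)
{
    return SYSTIMER_BASE + (uint32_t)off;
}

/* Hypothetical helper: absolute address of compare register n (0..3). */
static inline uint32_t systimer_compare_addr(unsigned n)
{
    return SYSTIMER_BASE + (uint32_t)SYSTIMER_C0 + 4u * (n & 3u);
}
```

For example, `systimer_addr(SYSTIMER_CLO)` gives 0x3F003004, which is the `TIMER_CLO` constant used later in the assembly.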
CS Register Bits #
The System Timer Control/Status register tracks and clears comparator match events for each timer channel. Match signals from this register are forwarded to the interrupt controller to generate interrupt requests.
The M0-3 fields indicate whether a free-running counter match has occurred. To acknowledge and clear a match detection flag along with its associated interrupt request line, write a one to the corresponding bit.
| Bits | Name | Description | Type | Reset |
|---|---|---|---|---|
| 31:4 | Reserved | - | - | - |
| 3 | M3 | System Timer Match 3 • 0 = No match • 1 = Match detected | W1C | 0x0 |
| 2 | M2 | System Timer Match 2 • 0 = No match • 1 = Match detected | W1C | 0x0 |
| 1 | M1 | System Timer Match 1 • 0 = No match • 1 = Match detected | W1C | 0x0 |
| 0 | M0 | System Timer Match 0 • 0 = No match • 1 = Match detected | W1C | 0x0 |
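The write-one-to-clear (W1C) behaviour of the match bits can be confusing at first, so here is a small C model of it. It is a simulation of what the hardware does to CS on a write, under the assumption that only bits 3:0 are affected; `cs_after_write` and `cs_ack_value` are illustrative names.

```c
#include <stdint.h>

#define CS_MATCH_MASK 0xFu  /* M3..M0 live in bits 3:0 */

/* Models the effect of a software write on CS: every match bit written
 * as 1 is cleared, every bit written as 0 is left unchanged. */
static inline uint32_t cs_after_write(uint32_t cs, uint32_t value)
{
    return cs & ~(value & CS_MATCH_MASK);
}

/* Value to write in order to acknowledge channel n (0..3) only. */
static inline uint32_t cs_ack_value(unsigned channel)
{
    return 1u << (channel & 3u);
}
```

The practical consequence is that acknowledging one channel does not need a read-modify-write: writing only the bit for that channel leaves the other pending flags untouched.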
CLO Register #
System Timer Counter Lower bits: The system timer free-running counter lower register is a read-only register that returns the current value of the lower 32-bits of the free running counter.
| Bits | Field Name | Description | Type | Reset |
|---|---|---|---|---|
| 31:0 | CNT | Lower 32-bits of the free running counter value | RO | 0x0 |
CHI Register #
System Timer Counter Higher bits: Similar to the CLO register, but for the upper 32 bits of the free-running counter. It is also read-only.
| Bits | Field Name | Description | Type | Reset |
|---|---|---|---|---|
| 31:0 | CNT | Higher 32-bits of the free running counter value | RO | 0x0 |
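Because the 64-bit counter is exposed as two 32-bit registers, CLO can roll over between the two reads. A common way to get a consistent 64-bit value is to read CHI, then CLO, then CHI again, and retry if CHI changed. The C sketch below models this; `systimer_read64` and its pointer parameters are illustrative and do not appear in the assembly in this post, which only uses CLO.

```c
#include <stdint.h>

/* Reads the 64-bit counter from its two 32-bit halves.
 * On real hardware `clo` and `chi` would point at base + 0x04 and
 * base + 0x08; here they are parameters so the logic can be tested. */
static uint64_t systimer_read64(const volatile uint32_t *clo,
                                const volatile uint32_t *chi)
{
    uint32_t hi, lo;
    do {
        hi = *chi;            /* sample the upper half            */
        lo = *clo;            /* sample the lower half            */
    } while (hi != *chi);     /* retry if CHI changed in between  */
    return ((uint64_t)hi << 32) | lo;
}
```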
C0, C1, C2, C3 Registers #
System Timer Compare: The system timer compare registers hold the compare value for each of the four timer channels. Whenever the lower 32-bits of the free-running counter matches one of the compare values, the corresponding bit in the system timer control/status register is set.
| Bits | Field Name | Description | Type | Reset |
|---|---|---|---|---|
| 31:0 | CMP | Compare value for match channel n | RW | 0x0 |
Delay based Timer implementation #
Below is the complete assembly code to set up the System Timer and create a delay function:
.equ MPIDR_AFFINITY_MASK, 0x3
.equ PERIPHERAL_BASE, 0x3F000000
.equ TIMER_CLO, (PERIPHERAL_BASE + 0x003004)
.equ GPFSEL2, (PERIPHERAL_BASE + 0x200008)
.equ GPSET0, (PERIPHERAL_BASE + 0x20001C)
.equ GPCLR0, (PERIPHERAL_BASE + 0x200028)
.section ".text.boot"
.global _start
_start:
mrs x1, mpidr_el1
and x1, x1, #MPIDR_AFFINITY_MASK
cbnz x1, park_core
ldr x1, =_start
mov sp, x1
bl gpio_init
main_loop:
bl gpio_on
ldr x0, =500000
bl delay_us
bl gpio_off
ldr x0, =500000
bl delay_us
b main_loop
gpio_init:
ldr x1, =GPFSEL2
ldr w2, [x1]
bic w2, w2, #(7 << 3)
orr w2, w2, #(1 << 3)
str w2, [x1]
ret
gpio_on:
ldr x1, =GPSET0
mov w2, #(1 << 21)
str w2, [x1]
ret
gpio_off:
ldr x1, =GPCLR0
mov w2, #(1 << 21)
str w2, [x1]
ret
delay_us:
stp x29, x30, [sp, #-16]!
stp x19, x20, [sp, #-16]!
mov x19, x0
ldr x1, =TIMER_CLO
ldr w20, [x1]
add w20, w20, w19
delay_loop:
ldr w0, [x1]
sub w2, w20, w0
cmp w2, #0
bgt delay_loop
ldp x19, x20, [sp], #16
ldp x29, x30, [sp], #16
ret
park_core:
wfe
b park_core
Code Breakdown #
Timer Reading #
I will not explain every function again, as some of them, such as GPIO initialization and toggling, are the same as in previous posts. I will focus on the timer-related functions.
Delay Function #
The delay_us function implements a microsecond delay by reading the current timer value from the CLO register and calculating a target time by adding the desired delay. It then enters a loop, continuously reading the timer until the current time reaches the target.
delay_us:
stp x29, x30, [sp, #-16]!
stp x19, x20, [sp, #-16]!
mov x19, x0 // Save delay value
ldr x1, =TIMER_CLO // Load address of CLO (lower 32 bits of the counter)
ldr w20, [x1] // Get current timer count
add w20, w20, w19 // target = current + delay
delay_loop:
ldr w0, [x1] // Read current timer value
sub w2, w20, w0 // remaining = target - current
cmp w2, #0 // Check if remaining > 0
bgt delay_loop // Loop while remaining > 0
ldp x19, x20, [sp], #16
ldp x29, x30, [sp], #16
ret
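The subtract-then-signed-compare in `delay_loop` is what keeps the delay correct when the 32-bit counter wraps around. The same logic can be sketched in C as follows. The function names are illustrative, and the cast from `uint32_t` to `int32_t` assumes the usual two's complement behaviour.

```c
#include <stdint.h>
#include <stdbool.h>

/* Target tick for a delay starting at `now`; wraps modulo 2^32. */
static inline uint32_t deadline_after(uint32_t now, uint32_t delay_us)
{
    return now + delay_us;
}

/* Mirrors `sub w2, w20, w0 ; cmp w2, #0 ; bgt delay_loop`:
 * returns true while the deadline has not been reached yet.
 * Valid for delays shorter than 2^31 microseconds (about 35 minutes). */
static inline bool still_waiting(uint32_t target, uint32_t now)
{
    return (int32_t)(target - now) > 0;
}
```

A plain unsigned comparison such as `now < target` would end the delay immediately whenever `target` wraps past zero, which is why the difference is interpreted as a signed value instead.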
Results Of Timer Implementation with Delay #
I connected the GPIO21 on Raspberry Pi Zero 2W. Using the timer-based delay instead of simple loop-based delays provides much more accurate timing. The GPIO toggles at 1 Hz (500 ms on, 500 ms off), apart from the few instructions executed between the delays.
Using Timer Interrupts (Advanced) #
In the above implementation we used a delay loop that keeps checking the timer value until the desired time has elapsed. This is known as a busy-wait loop. While it is simple to implement, it is not the most efficient way to use the CPU, as it wastes cycles checking the timer continuously, hence the name "busy-wait".
To really take full advantage of the timer, we should use timer interrupts. This way the CPU can perform other tasks or enter a low-power state while waiting for the timer to expire. When the timer reaches the compare value, it triggers an interrupt, allowing the CPU to handle the event without wasting cycles in a busy-wait loop.
Hypervisor Exception Level (EL2) and OS Exception Level (EL1) #
This implementation is a bit more complicated than the previous one because some setup is needed first. The reason is that, by default, the GPU of the Raspberry Pi starts the Arm core at Exception Level 2 (EL2), as discussed on the Forum.
EL2 is typically used for hypervisors and virtualization, while EL1 is used for operating systems and bare-metal applications. If you want to know more about Exception Levels, there is a good article.
As you can see from the discussion, we either need to route the interrupts to EL2 by configuring the HCR_EL2 register, or drop down to EL1 and handle them there.
By default, ARM64 routes all interrupts to EL1, even if the processor is currently running at EL2. This is by design, not a bug. ARM designed the architecture with virtualization in mind:
- EL2 is the hypervisor level. It is meant to manage virtual machines, not handle regular interrupts.
- EL1 is the OS kernel level. This is where interrupts "belong" in a typical system.
- Default routing to EL1 allows a hypervisor to run guest OSes without intercepting every interrupt.
More about it in this article and this article.
I will take the harder route, dropping to EL1, and leave the easier one as homework for the reader.
Complete Timer Interrupt Implementation #
Here is the code for setting up timer interrupts on the Raspberry Pi by jumping into EL1 from EL2:
.equ MPIDR_AFFINITY_MASK, 0x3
.equ PERIPHERAL_BASE, 0x3F000000
.equ TIMER_BASE, (PERIPHERAL_BASE + 0x003000)
.equ TIMER_CS, (TIMER_BASE + 0x00)
.equ TIMER_CLO, (TIMER_BASE + 0x04)
.equ TIMER_C1, (TIMER_BASE + 0x10)
.equ IRQ_BASE, (PERIPHERAL_BASE + 0xB000)
.equ IRQ_PENDING_1, (IRQ_BASE + 0x204)
.equ IRQ_ENABLE_1, (IRQ_BASE + 0x210)
.equ LOCAL_BASE, 0x40000000
.equ CORE0_TIMER_IRQCNTL, (LOCAL_BASE + 0x40)
.equ TIMER_IRQ_1, (1 << 1)
.equ GPFSEL2, (PERIPHERAL_BASE + 0x200008)
.equ GPSET0, (PERIPHERAL_BASE + 0x20001C)
.equ GPCLR0, (PERIPHERAL_BASE + 0x200028)
.equ TIMER_INTERVAL, 500000
.section ".text.boot"
.global _start
_start:
mrs x1, mpidr_el1
and x1, x1, #MPIDR_AFFINITY_MASK
cbnz x1, park_core
mrs x0, CurrentEL
and x0, x0, #0xC
cmp x0, #0x8
beq from_el2
b at_el1
from_el2:
ldr x1, =_start
mov sp, x1
mov x0, #(1 << 31)
msr hcr_el2, x0
//mov x0, #3 // Enable timer access when using ARM generic timer
//msr cnthctl_el2, x0
//msr cntvoff_el2, xzr
mov x0, #0x3C5
msr spsr_el2, x0
ldr x0, =at_el1
msr elr_el2, x0
eret
at_el1:
ldr x1, =_start
mov sp, x1
ldr x0, =vector_table
msr vbar_el1, x0
bl gpio_init
bl timer_init
msr daifclr, #2
main_loop:
wfi
b main_loop
.balign 0x800
vector_table:
b hang
.balign 0x80
b hang
.balign 0x80
b hang
.balign 0x80
b hang
.balign 0x80
b hang
.balign 0x80
b irq_handler
.balign 0x80
b hang
.balign 0x80
b hang
.balign 0x80
b hang
.balign 0x80
b hang
.balign 0x80
b hang
.balign 0x80
b hang
.balign 0x80
b hang
.balign 0x80
b hang
.balign 0x80
b hang
.balign 0x80
b hang
hang:
wfe
b hang
irq_handler:
stp x0, x1, [sp, #-16]!
stp x2, x3, [sp, #-16]!
stp x29, x30, [sp, #-16]!
ldr x0, =IRQ_PENDING_1
ldr w1, [x0]
tst w1, #TIMER_IRQ_1
beq irq_done
ldr x0, =TIMER_CS
mov w1, #(1 << 1)
str w1, [x0]
ldr x0, =gpio_state
ldr w1, [x0]
cbz w1, irq_turn_on
ldr x2, =GPCLR0
mov w3, #(1 << 21)
str w3, [x2]
mov w1, #0
b irq_save_gpio
irq_turn_on:
ldr x2, =GPSET0
mov w3, #(1 << 21)
str w3, [x2]
mov w1, #1
irq_save_gpio:
str w1, [x0]
ldr x0, =TIMER_CLO
ldr w1, [x0]
ldr w2, =TIMER_INTERVAL
add w1, w1, w2
ldr x0, =TIMER_C1
str w1, [x0]
irq_done:
ldp x29, x30, [sp], #16
ldp x2, x3, [sp], #16
ldp x0, x1, [sp], #16
eret
timer_init:
ldr x0, =CORE0_TIMER_IRQCNTL
mov w1, #0
str w1, [x0]
ldr x0, =TIMER_CS
mov w1, #0xF
str w1, [x0]
ldr x0, =TIMER_CLO
ldr w1, [x0]
ldr w2, =TIMER_INTERVAL
add w1, w1, w2
ldr x0, =TIMER_C1
str w1, [x0]
ldr x0, =IRQ_ENABLE_1
mov w1, #TIMER_IRQ_1
str w1, [x0]
ret
gpio_init:
ldr x1, =GPFSEL2
ldr w2, [x1]
bic w2, w2, #(7 << 3)
orr w2, w2, #(1 << 3)
str w2, [x1]
ret
gpio_on:
ldr x1, =GPSET0
mov w2, #(1 << 21)
str w2, [x1]
ret
gpio_off:
ldr x1, =GPCLR0
mov w2, #(1 << 21)
str w2, [x1]
ret
.section ".data"
.align 4
gpio_state:
.word 0
.section ".text"
park_core:
wfe
b park_core
Code Breakdown #
Let's dive into the important parts of the code.
Timer Initialization #
The timer is initialized in the timer_init function. Here we disable local timer routing, clear any existing match flags, set the first compare value to the current timer value plus a defined interval, and enable the timer interrupt.
timer_init:
ldr x0, =CORE0_TIMER_IRQCNTL
mov w1, #0
str w1, [x0] // Disable local timer routing
ldr x0, =TIMER_CS
mov w1, #0xF
str w1, [x0] // Clear all match flags
ldr x0, =TIMER_CLO
ldr w1, [x0] // Read current timer value
ldr w2, =TIMER_INTERVAL
add w1, w1, w2 // Calculate first compare value
ldr x0, =TIMER_C1
str w1, [x0] // Set compare register C1
ldr x0, =IRQ_ENABLE_1
mov w1, #TIMER_IRQ_1
str w1, [x0] // Enable timer 1 interrupt
ret
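For readers more comfortable with C, the core of this sequence can be sketched as below. This is not a drop-in replacement for the assembly: the `timer_regs` struct and `timer_init_sketch` are hypothetical, the registers are passed in as pointers so the logic can be run against ordinary memory, and the write to `CORE0_TIMER_IRQCNTL` is left out for brevity.

```c
#include <stdint.h>

#define TIMER_IRQ_1 (1u << 1)   /* System Timer match 1 in IRQ_ENABLE_1 */

/* Hypothetical register view; on hardware each pointer would be a
 * fixed MMIO address rather than a parameter. */
struct timer_regs {
    volatile uint32_t *cs;           /* TIMER_CS     */
    volatile uint32_t *clo;          /* TIMER_CLO    */
    volatile uint32_t *c1;           /* TIMER_C1     */
    volatile uint32_t *irq_enable1;  /* IRQ_ENABLE_1 */
};

static void timer_init_sketch(const struct timer_regs *r, uint32_t interval_us)
{
    *r->cs = 0xFu;                    /* acknowledge any stale match flags  */
    *r->c1 = *r->clo + interval_us;   /* first compare = now + interval     */
    *r->irq_enable1 = TIMER_IRQ_1;    /* enable timer 1 at the controller   */
}
```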
EL2 to EL1 Transition #
In the following section we check if we are running on EL2 and if so we set up the necessary registers to switch to EL1.
_start:
mrs x1, mpidr_el1
and x1, x1, #MPIDR_AFFINITY_MASK
cbnz x1, park_core
mrs x0, CurrentEL // Read current exception level
and x0, x0, #0xC // Mask to get EL bits
cmp x0, #0x8 // Compare with EL2
beq from_el2 // If EL2 branch to from_el2
b at_el1 // Else branch to at_el1
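The mask-and-compare above works because CurrentEL keeps the exception level in bits [3:2], so EL2 reads back as 0x8 and EL1 as 0x4. A small C sketch of the decoding, with `current_el_from_reg` as an illustrative name:

```c
#include <stdint.h>

/* CurrentEL holds the exception level in bits [3:2]; all other bits
 * are reserved, so shifting right by 2 and masking yields 0..3. */
static inline unsigned current_el_from_reg(uint64_t currentel)
{
    return (unsigned)((currentel >> 2) & 0x3u);
}
```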
If we are at EL2, we set up the registers to switch to EL1. The following code does the necessary setup and performs the switch using the eret instruction. As the name suggests, the ERET (Exception Return) instruction is used in the ARM architecture to return from an exception handler and switch to a lower or the same Exception Level (EL), such as from EL2 to EL1.
There is a good lecture series on ARM, with lecture slides, from Georgia Tech here.
from_el2:
ldr x1, =_start
mov sp, x1
mov x0, #(1 << 31) // HCR_EL2.RW = 1: EL1 executes in AArch64
msr hcr_el2, x0
mov x0, #3 // Allow EL1 access to the physical counter and timer
msr cnthctl_el2, x0 // Only needed for the ARM generic timer (commented out in the full listing)
msr cntvoff_el2, xzr // Set virtual counter offset to 0
mov x0, #0x3C5 // EL1h with D, A, I, F masked
msr spsr_el2, x0 // Set SPSR for EL1
ldr x0, =at_el1 // Load target address
msr elr_el2, x0 // Set ELR to target address
eret
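The magic numbers in this snippet are easier to follow when built from their bit fields. The C sketch below composes the same values; the macro and function names are illustrative only.

```c
#include <stdint.h>

#define SPSR_D       (1u << 9)    /* Debug exceptions masked           */
#define SPSR_A       (1u << 8)    /* SError masked                     */
#define SPSR_I       (1u << 7)    /* IRQ masked                        */
#define SPSR_F       (1u << 6)    /* FIQ masked                        */
#define SPSR_M_EL1H  0x5u         /* M[3:0] = 0b0101: EL1 using SP_EL1 */
#define HCR_EL2_RW   (1ull << 31) /* EL1 executes in AArch64           */

/* Value written to SPSR_EL2 before the eret: return to EL1h with all
 * four interrupt classes masked. */
static inline uint32_t spsr_el1h_all_masked(void)
{
    return SPSR_D | SPSR_A | SPSR_I | SPSR_F | SPSR_M_EL1H;
}
```

The result is 0x3C5, matching the `mov x0, #0x3C5` above. IRQs stay masked after the `eret` until the `msr daifclr, #2` at EL1 clears the I bit.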
At EL1 we set up the vector table for handling interrupts and initialize GPIO and Timer.
at_el1:
ldr x1, =_start
mov sp, x1
ldr x0, =vector_table // Load vector table address
msr vbar_el1, x0 // Set vector base address
bl gpio_init
bl timer_init
msr daifclr, #2 // Enable IRQs in PSTATE
The main registers involved in switching to EL1 and enabling interrupts are HCR_EL2, CNTHCTL_EL2, CNTVOFF_EL2, SPSR_EL2, ELR_EL2, and VBAR_EL1.
| Register | Description |
|---|---|
| HCR_EL2 | Hypervisor Configuration Register - Controls virtualization settings and routing of exceptions |
| CNTHCTL_EL2 | Counter-timer Hypervisor Control Register - Controls access to physical timer from EL1/EL0 |
| CNTVOFF_EL2 | Counter-timer Virtual Offset Register - Virtual counter offset from physical counter |
| SPSR_EL2 | Saved Program Status Register - Holds saved processor state for exception return |
| ELR_EL2 | Exception Link Register - Holds return address for exception return |
| VBAR_EL1 | Vector Base Address Register - Base address of exception vector table at EL1 |
Also when dealing with processor state and exceptions, the following special purpose registers are important:
| Special purpose register | Description | PSTATE fields |
|---|---|---|
| CurrentEL | Holds the current Exception level. | EL |
| DAIF | Specifies the current interrupt mask bits. | D, A, I, F |
| NZCV | Holds the condition flags. | N, Z, C, V |
| SPSel | At EL1 or higher, this selects between the SP for the current Exception level and SP_EL0. | SP |
The important ARM instructions used to access these special registers are MRS (Move System Register to general-purpose register), which reads them, and MSR (Move general-purpose register to System Register), which writes them.
Now that the timer interrupt is set up, we can handle it in the irq_handler function. In this example the handler toggles the GPIO pin state each time the timer interrupt occurs, creating a toggling effect on GPIO21.
A branch to irq_handler needs to be placed in the vector table at the appropriate offset for IRQs. This is done in the vector_table section of the code.
What is an IRQ Vector Table? #
In the ARM AArch64 architecture, when an exception occurs (like an interrupt, system call, or error), the processor jumps to a specific address in memory inside the vector table. This table contains 16 entries (one for each combination of exception type and origin), and each entry has room for up to 32 instructions. In our code each entry is just a branch instruction that directs execution to the appropriate handler.
The structure of the vector table is as follows:
.balign 0x800
vector_table:
// 16 entries
The table must be aligned to 0x800 bytes (2048 bytes), which is what .balign 0x800 ensures. Each entry is 128 bytes long, which is what we guarantee using the .balign 0x80 directive, so the 16 entries occupy 2048 bytes in total.
The table is organized into 4 groups of 4 entries each:
- Current EL with SP0 (Stack Pointer 0 - rarely used)
- Current EL with SPx (Stack Pointer x - normal stack this is used mostly)
- Lower EL using AArch64 (exceptions from lower privilege levels)
- Lower EL using AArch32 (legacy 32-bit mode)
Each group has 4 exception types:
| Entry | Exception Type | Description |
|---|---|---|
| 0 | Synchronous | System calls, data aborts, etc. |
| 1 | IRQ | Interrupt requests |
| 2 | FIQ | Fast interrupt requests |
| 3 | SError | System errors |
In our case we are interested in the second entry of the second group (Current EL with SPx, IRQ), which is the 6th entry of the table, at offset 0x280 from the base of the vector table.
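The offset of any entry follows directly from the layout: four groups of four entries, 0x80 bytes each. A short C sketch of that arithmetic, with illustrative enum and function names:

```c
#include <stdint.h>

enum vec_group { CUR_EL_SP0 = 0, CUR_EL_SPX = 1, LOWER_EL_A64 = 2, LOWER_EL_A32 = 3 };
enum vec_type  { VEC_SYNC = 0, VEC_IRQ = 1, VEC_FIQ = 2, VEC_SERROR = 3 };

/* Byte offset of a vector table entry from the address in VBAR_EL1.
 * Each of the 16 entries is 0x80 (128) bytes long. */
static inline uint32_t vector_offset(enum vec_group g, enum vec_type t)
{
    return ((uint32_t)g * 4u + (uint32_t)t) * 0x80u;
}
```

For the entry used in this post, `vector_offset(CUR_EL_SPX, VEC_IRQ)` is (1 * 4 + 1) * 0x80 = 0x280, which is the sixth slot in the listing, where `b irq_handler` sits.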
Most of the irq_handler function is self explanatory as we have already explained GPIO toggling and timer reading in previous sections.
IRQ Handler #
In the irq_handler, we first save the state of the registers that we will use, and at the end of the handler we restore them before returning from the interrupt using the eret instruction. The reason we save and restore registers is to preserve the state of the program that was interrupted. When an interrupt occurs, the CPU may be in the middle of executing some code, and it uses certain registers to hold data and addresses. If we modify these registers in the interrupt handler without saving their original values, we could corrupt the state of the interrupted program, leading to unexpected behavior or crashes when the program resumes.
stp x0, x1, [sp, #-16]! // Push x0, x1 to stack
stp x2, x3, [sp, #-16]! // Push x2, x3 to stack
stp x29, x30, [sp, #-16]! // Push frame pointer (x29) and link register (x30)
...
...
ldp x29, x30, [sp], #16 // Pop x29, x30 from stack
ldp x2, x3, [sp], #16 // Pop x2, x3 from stack
ldp x0, x1, [sp], #16 // Pop x0, x1 from stack
eret // Exception Return
Then we check if the interrupt is from the timer by reading the IRQ_PENDING_1 register. If it is, we clear the interrupt flag in the TIMER_CS register and toggle the GPIO pin. Finally, we set the next compare value for the timer to trigger the next interrupt.
ldr x0, =IRQ_PENDING_1 // Load IRQ pending 1 register address
ldr w1, [x0] // Read pending interrupts
tst w1, #TIMER_IRQ_1 // Check if timer 1 interrupt is pending
beq irq_done // If not timer interrupt, skip handling
ldr x0, =TIMER_CS // Load timer control/status address
mov w1, #(1 << 1) // Bit 1 selects the M1 match flag (see the CS register section above)
I will skip the GPIO toggling part as it is similar to previous sections and blog posts.
We clear the interrupt flag in the TIMER_CS register by writing a 1 to the M1 bit. This acknowledges the interrupt and allows the timer to generate future interrupts.
ldr x0, =TIMER_CS // Timer Control/Status register
mov w1, #(1 << 1) // Bit 1 = M1 (Match 1 flag)
str w1, [x0] // Write 1 to clear the flag
The important part comes after the GPIO handling. The TIMER_CLO register is read to get the current timer value, and we add the defined interval to set the next compare value in TIMER_C1.
ldr x0, =TIMER_CLO // Load timer counter lower address
ldr w1, [x0] // Read current timer value
ldr w2, =TIMER_INTERVAL // Load timer interval
add w1, w1, w2 // Calculate next compare value (current + interval)
ldr x0, =TIMER_C1 // Load timer compare 1 address
str w1, [x0] // Set next compare value
Why add to current time?
The System Timer is free-running. When TIMER_CLO equals TIMER_C1, an interrupt fires. By setting C1 = current + interval, we schedule the next interrupt.
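One side effect of scheduling from the current time is that the delay between the match and the moment the handler reads TIMER_CLO is added to every period. A common alternative, not used in this post, is to add the interval to the previous compare value instead. The C sketch below contrasts the two calculations; both function names are illustrative.

```c
#include <stdint.h>

/* Approach used in this post: schedule relative to the time at which
 * the handler runs, so handler latency is added to each period. */
static inline uint32_t next_compare_from_now(uint32_t now, uint32_t interval)
{
    return now + interval;
}

/* Alternative: schedule relative to the previous compare value, so
 * latency does not accumulate from one period to the next. */
static inline uint32_t next_compare_from_prev(uint32_t prev_compare,
                                              uint32_t interval)
{
    return prev_compare + interval;
}
```

The trade-off is that the second variant assumes the handler always runs well within one interval. If it is ever delayed by more than the interval, the new compare value is already in the past and the next match only happens after the 32-bit counter wraps, so the first variant is the more forgiving choice for a simple demo.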
Result of Timer Interrupt Implementation #
I connected the GPIO21 on Raspberry Pi Zero 2W. Using the timer interrupt-based approach provides accurate timing without busy-waiting. The GPIO toggles at 1 Hz (500 ms on, 500 ms off) regardless of CPU frequency or instruction timing variations in the main loop.
Applications of System Timer #
Now you might ask why we need a system timer in the first place?
It is one of the most important peripherals in any embedded system. It's like your parents waking you up at a certain time every day when you were a kid and didn't want to go to school. It is really precise, as it is not affected by code execution time or CPU load. Some of the common applications of the system timer are:
- Precise delays: Creating accurate microsecond/millisecond delays. Two of the four compare registers can be utilized by the CPU (e.g., for OS schedulers and clock interrupts).
- PWM generation: Software PWM for LED dimming or motor control
- Task scheduling: Implementing a task scheduler for multitasking systems
- etc.
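As a small illustration of the software PWM idea, the on and off times for one period can be derived from a period in microseconds and a duty cycle in percent, and then fed to a delay or compare-based scheme like the ones in this post. This is only a sketch; `pwm_times` and `pwm_times_us` are hypothetical names.

```c
#include <stdint.h>

struct pwm_times {
    uint32_t on_us;   /* time the pin stays high within one period */
    uint32_t off_us;  /* time the pin stays low within one period  */
};

/* Splits one PWM period into on/off times. Duty cycles above 100 are
 * clamped to 100. The 64-bit intermediate avoids overflow for long
 * periods. */
static struct pwm_times pwm_times_us(uint32_t period_us, uint32_t duty_percent)
{
    struct pwm_times t;
    if (duty_percent > 100u) {
        duty_percent = 100u;
    }
    t.on_us = (uint32_t)(((uint64_t)period_us * duty_percent) / 100u);
    t.off_us = period_us - t.on_us;
    return t;
}
```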
Conclusion #
In this blog post I demonstrated how to implement timer functionality on the Raspberry Pi using bare metal programming. We explored the System Timer peripheral, its registers, and how to create accurate delays using both busy-wait loops and timer interrupts. Timers are essential for many embedded applications, and mastering their use is crucial for effective embedded systems development. In my next blog post in this series, I will explore more advanced peripherals and techniques for bare metal programming on the Raspberry Pi. Stay tuned!