Raspberry PI Bare Metal Vol 4 - Timer

Raspberry PI Bare Metal Vol 4 : Timer
#

In my previous blog post I created bare metal code for SPI communication. In this post, I will continue the journey by implementing timer functionality on the Raspberry Pi using the System Timer peripheral.

System Timer
#

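The Raspberry Pi's SoC includes a System Timer that runs at 1 MHz, providing microsecond resolution. This timer is independent of the ARM core and continues running regardless of the processor state. It is described in Chapter 10 of the BCM2711 ARM Peripherals Manual; the Raspberry Pi Zero 2W used in this post has a different SoC, but the register layout is the same and only the base address differs.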

The System Timer provides a 64-bit free-running counter and 4 compare registers for generating interrupts with microsecond resolution.

Timer Registers
#

The System Timer base address is 0x3F003000 (for Raspberry Pi Zero 2W) and all offsets are from this base address.

Offset Name Description
0x00 CS System Timer Control/Status
0x04 CLO System Timer Counter Lower 32 bits
0x08 CHI System Timer Counter Upper 32 bits
0x0C C0 System Timer Compare 0
0x10 C1 System Timer Compare 1
0x14 C2 System Timer Compare 2
0x18 C3 System Timer Compare 3
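
The register map above can also be written down as a C structure, which makes the offsets easy to double-check on a host compiler. This is only a sketch: the names `sys_timer_regs`, `SYS_TIMER_BASE` and `sys_timer_reg_addr` are made up for illustration, while the base address and offsets are taken from the table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the System Timer register block (names are illustrative). */
struct sys_timer_regs {
    uint32_t cs;   /* 0x00 Control/Status */
    uint32_t clo;  /* 0x04 Counter, lower 32 bits */
    uint32_t chi;  /* 0x08 Counter, upper 32 bits */
    uint32_t c0;   /* 0x0C Compare 0 */
    uint32_t c1;   /* 0x10 Compare 1 */
    uint32_t c2;   /* 0x14 Compare 2 */
    uint32_t c3;   /* 0x18 Compare 3 */
};

/* Base address used in this post (Raspberry Pi Zero 2W). */
#define SYS_TIMER_BASE 0x3F003000u

/* Absolute address of a register, given its byte offset from the base. */
static inline uint32_t sys_timer_reg_addr(size_t offset)
{
    return SYS_TIMER_BASE + (uint32_t)offset;
}
```

For example, `sys_timer_reg_addr(offsetof(struct sys_timer_regs, clo))` gives the same address as the `TIMER_CLO` constant used in the assembly listings.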

CS Register Bits
#

The System Timer Control/Status register tracks and clears comparator match events for each timer channel. Match signals from this register are forwarded to the interrupt controller to generate interrupt requests.

The M0-3 fields indicate whether a free-running counter match has occurred. To acknowledge and clear a match detection flag along with its associated interrupt request line, write a one to the corresponding bit.

Bits Name Description Type Reset
31:4 Reserved - - -
3 M3 System Timer Match 3 (0 = No match, 1 = Match detected) W1C 0x0
2 M2 System Timer Match 2 (0 = No match, 1 = Match detected) W1C 0x0
1 M1 System Timer Match 1 (0 = No match, 1 = Match detected) W1C 0x0
0 M0 System Timer Match 0 (0 = No match, 1 = Match detected) W1C 0x0
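
The W1C (write-1-to-clear) type is easy to get wrong, so here is a small host-side model of it in C. It is a sketch of the behaviour described above, not hardware code; `w1c_write` and `CS_M1` are illustrative names.

```c
#include <stdint.h>

#define CS_M1 (1u << 1)  /* match flag for compare channel 1 */

/* Host-side model of a write-1-to-clear register:
 * every bit written as 1 is cleared, bits written as 0 are left alone. */
static inline uint32_t w1c_write(uint32_t current, uint32_t written)
{
    return current & ~written;
}
```

With M1 and M3 both pending (CS reads 0xA), writing just `CS_M1` leaves M3 set. A read-modify-write such as `cs | CS_M1` would write 0xA back and clear both flags, which is why the value stored to CS should contain only the bit being acknowledged.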

CLO Register
#

System Timer Counter Lower bits: The system timer free-running counter lower register is a read-only register that returns the current value of the lower 32-bits of the free running counter.

Bits Field Name Description Type Reset
31:0 CNT Lower 32-bits of the free running counter value RO 0x0

CHI Register
#

System Timer Counter Higher bits. Similar to the CLO register, but for the upper 32 bits.

Bits Field Name Description Type Reset
31:0 CNT Higher 32-bits of the free running counter value RO 0x0
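
Because the 64-bit counter is split across two 32-bit registers, CLO can roll over between the two reads. A common way to handle this is to read CHI, then CLO, then CHI again, and retry if CHI changed. The sketch below assumes that approach; `read_counter64` and `ms_to_ticks` are hypothetical helpers, and the registers are passed in as pointers so the code can be tried on a host with ordinary variables.

```c
#include <stdint.h>

/* Reads the 64-bit counter from two 32-bit registers.
 * CHI is read before and after CLO; if it changed, CLO rolled over
 * in between and the read is retried. */
static inline uint64_t read_counter64(const volatile uint32_t *clo,
                                      const volatile uint32_t *chi)
{
    uint32_t hi, lo;
    do {
        hi = *chi;
        lo = *clo;
    } while (hi != *chi);
    return ((uint64_t)hi << 32) | lo;
}

/* At 1 MHz, one tick is one microsecond, so 1 ms is 1000 ticks. */
static inline uint64_t ms_to_ticks(uint32_t ms)
{
    return (uint64_t)ms * 1000u;
}
```

At 1 MHz the lower 32 bits alone wrap after 2^32 microseconds, roughly 71.6 minutes, which is why code that only reads CLO has to be wrap-aware.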

C0, C1, C2, C3 Registers
#

System Timer Compare: The system timer compare registers hold the compare value for each of the four timer channels. Whenever the lower 32-bits of the free-running counter matches one of the compare values, the corresponding bit in the system timer control/status register is set.

Bits Field Name Description Type Reset
31:0 CMP Compare value for match channel n RW 0x0

Delay based Timer implementation
#

Below is the complete assembly code to set up the System Timer and create a delay function:

.equ   MPIDR_AFFINITY_MASK, 0x3
.equ   PERIPHERAL_BASE,     0x3F000000
.equ   TIMER_CLO,           (PERIPHERAL_BASE + 0x003004)
.equ   GPFSEL2,             (PERIPHERAL_BASE + 0x200008)
.equ   GPSET0,              (PERIPHERAL_BASE + 0x20001C)
.equ   GPCLR0,              (PERIPHERAL_BASE + 0x200028)

.section ".text.boot"
.global _start

_start:
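    mrs     x1, mpidr_el1                 // Read the core affinity register
    and     x1, x1, #MPIDR_AFFINITY_MASK  // Keep the core number (0-3)
    cbnz    x1, park_core                 // Only core 0 continues
    ldr     x1, =_start
    mov     sp, x1                        // Stack grows down from _start
    bl      gpio_init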

main_loop:
    bl      gpio_on
    ldr     x0, =500000
    bl      delay_us
    bl      gpio_off
    ldr     x0, =500000
    bl      delay_us
    b       main_loop

gpio_init:
    ldr     x1, =GPFSEL2
    ldr     w2, [x1]                // Read function select for GPIO20-29
    bic     w2, w2, #(7 << 3)       // Clear bits 5:3 (GPIO21 function)
    orr     w2, w2, #(1 << 3)       // 001 = output
    str     w2, [x1]
    ret

gpio_on:
    ldr     x1, =GPSET0
    mov     w2, #(1 << 21)
    str     w2, [x1]
    ret

gpio_off:
    ldr     x1, =GPCLR0
    mov     w2, #(1 << 21)
    str     w2, [x1]
    ret

delay_us:
    stp     x29, x30, [sp, #-16]!
    stp     x19, x20, [sp, #-16]!
    mov     x19, x0
    ldr     x1, =TIMER_CLO
    ldr     w20, [x1]
    add     w20, w20, w19
delay_loop:
    ldr     w0, [x1]
    sub     w2, w20, w0
    cmp     w2, #0
    bgt     delay_loop
    ldp     x19, x20, [sp], #16
    ldp     x29, x30, [sp], #16
    ret

park_core:
    wfe
    b       park_core

Code Breakdown
#

Timer Reading
#

I will not explain every function again, as some of them, such as GPIO initialization and toggling, are similar to those in previous posts. I will focus on the timer-related functions.

Delay Function
#

The delay_us function implements a microsecond delay by reading the current timer value from the CLO register and calculating a target time by adding the desired delay. It then enters a loop, continuously reading the timer until the current time reaches the target.

delay_us:
    stp     x29, x30, [sp, #-16]!
    stp     x19, x20, [sp, #-16]!
    mov     x19, x0              // Save delay value
    ldr     x1, =TIMER_CLO       // Load address of CLO (lower 32 bits of the counter)
    ldr     w20, [x1]            // Get current timer count
    add     w20, w20, w19        // target = current + delay

delay_loop:
    ldr     w0, [x1]             // Read current timer value
    sub     w2, w20, w0          // remaining = target - current
    cmp     w2, #0               // Check if remaining > 0
    bgt     delay_loop           // Loop while remaining > 0
    
    ldp     x19, x20, [sp], #16
    ldp     x29, x30, [sp], #16
    ret
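
The `sub` / `cmp` / `bgt` sequence in delay_loop is what makes the delay survive a wrap of the 32-bit counter. The same comparison can be modelled in C and tried on a host. This is a sketch; `still_waiting` is an illustrative name, not part of the original code.

```c
#include <stdbool.h>
#include <stdint.h>

/* True while `now` has not yet reached `target`.
 * The unsigned subtraction wraps modulo 2^32; interpreting the result
 * as signed mirrors the sub / cmp #0 / bgt sequence in delay_loop.
 * (The uint32_t -> int32_t conversion is implementation-defined in C;
 * mainstream compilers use two's complement wrapping.) */
static inline bool still_waiting(uint32_t target, uint32_t now)
{
    return (int32_t)(target - now) > 0;
}
```

If the counter is at 0xFFFFFFF0 and the delay is 0x20 ticks, the target wraps to 0x10, yet `still_waiting(0x10, 0xFFFFFFF8)` is still true. The price of the signed comparison is that the delay must stay below 2^31 microseconds (about 35.8 minutes); a larger value looks negative and the loop exits immediately.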

Results Of Timer Implementation with Delay
#

I connected GPIO21 on the Raspberry Pi Zero 2W. Using the timer-based delay instead of simple loop-based delays provides much more accurate timing. The GPIO toggles at 1 Hz (500 ms on, 500 ms off).

Gpio Toggling

Using Timer Interrupts (Advanced)
#

In the above implementation we used a delay loop that keeps checking the timer value until the desired time has elapsed. This is known as a busy-wait loop, and while it is simple to implement, it is not the most efficient way to use the CPU: it wastes cycles checking the timer continuously, hence the name “busy-wait”.

To really take full advantage of the timer, we should use timer interrupts. This way the CPU can perform other tasks or enter a low-power state while waiting for the timer to expire. When the timer reaches the compare value, it triggers an interrupt, allowing the CPU to handle the event without wasting cycles in a busy-wait loop.

Hypervisor Exception Level (EL2) and OS Exception Level (EL1)
#

This implementation is a bit more complicated than the previous one because we first need to set up the exception level. The reason is that, by default, the GPU firmware of the Raspberry Pi starts the Arm core at Exception Level 2 (EL2) (see the Forum discussion).

EL2 is typically used for hypervisors and virtualization, while EL1 is used for operating systems and bare-metal applications. If you want to know more about Exception Levels there is a good article.

As you can see from the discussion, to handle interrupts while staying at EL2 we would need to route them from EL1 to EL2 by setting the HCR_EL2 register (its IMO bit).

By default, ARM64 routes all interrupts to EL1, even if the processor is currently running at EL2. Because an exception is never taken to a lower exception level, such an interrupt is simply not delivered while the core stays at EL2. This is by design, not a bug.

Why does ARM do this? ARM designed the architecture with virtualization in mind:

  • EL2 is the hypervisor level. It is meant to manage virtual machines, not to handle regular interrupts.

  • EL1 is the OS kernel level. This is where interrupts “belong” in a typical system.

  • Default routing to EL1 allows a hypervisor to run guest OSes without intercepting every interrupt.

More about it in this article and this article.

I will take the harder route, dropping from EL2 to EL1, and leave the easier one as homework for the reader.

Complete Timer Interrupt Implementation
#

Here is the code for setting up timer interrupts on the Raspberry Pi by jumping into EL1 from EL2:

.equ   MPIDR_AFFINITY_MASK, 0x3
.equ   PERIPHERAL_BASE,     0x3F000000
.equ   TIMER_BASE,          (PERIPHERAL_BASE + 0x003000)
.equ   TIMER_CS,            (TIMER_BASE + 0x00)
.equ   TIMER_CLO,           (TIMER_BASE + 0x04)
.equ   TIMER_C1,            (TIMER_BASE + 0x10)
.equ   IRQ_BASE,            (PERIPHERAL_BASE + 0xB000)
.equ   IRQ_PENDING_1,       (IRQ_BASE + 0x204)
.equ   IRQ_ENABLE_1,        (IRQ_BASE + 0x210)
.equ   LOCAL_BASE,          0x40000000
.equ   CORE0_TIMER_IRQCNTL, (LOCAL_BASE + 0x40)
.equ   TIMER_IRQ_1,         (1 << 1)
.equ   GPFSEL2,             (PERIPHERAL_BASE + 0x200008)
.equ   GPSET0,              (PERIPHERAL_BASE + 0x20001C)
.equ   GPCLR0,              (PERIPHERAL_BASE + 0x200028)
.equ   TIMER_INTERVAL,      500000

.section ".text.boot"
.global _start

_start:
    mrs     x1, mpidr_el1
    and     x1, x1, #MPIDR_AFFINITY_MASK
    cbnz    x1, park_core
    mrs     x0, CurrentEL
    and     x0, x0, #0xC
    cmp     x0, #0x8
    beq     from_el2
    b       at_el1

from_el2:
    ldr     x1, =_start
    mov     sp, x1
    mov     x0, #(1 << 31)
    msr     hcr_el2, x0
    //mov     x0, #3             // Enable timer access when using ARM generic timer
    //msr     cnthctl_el2, x0
    //msr     cntvoff_el2, xzr
    mov     x0, #0x3C5
    msr     spsr_el2, x0
    ldr     x0, =at_el1
    msr     elr_el2, x0
    eret

at_el1:
    ldr     x1, =_start
    mov     sp, x1
    ldr     x0, =vector_table
    msr     vbar_el1, x0
    bl      gpio_init
    bl      timer_init
    msr     daifclr, #2

main_loop:
    wfi
    b       main_loop

.balign 0x800
vector_table:
    b       hang
    .balign 0x80
    b       hang
    .balign 0x80
    b       hang
    .balign 0x80
    b       hang
    .balign 0x80
    b       hang
    .balign 0x80
    b       irq_handler
    .balign 0x80
    b       hang
    .balign 0x80
    b       hang
    .balign 0x80
    b       hang
    .balign 0x80
    b       hang
    .balign 0x80
    b       hang
    .balign 0x80
    b       hang
    .balign 0x80
    b       hang
    .balign 0x80
    b       hang
    .balign 0x80
    b       hang
    .balign 0x80
    b       hang

hang:
    wfe
    b       hang

irq_handler:
    stp     x0, x1, [sp, #-16]!
    stp     x2, x3, [sp, #-16]!
    stp     x29, x30, [sp, #-16]!
    ldr     x0, =IRQ_PENDING_1
    ldr     w1, [x0]
    tst     w1, #TIMER_IRQ_1
    beq     irq_done
    ldr     x0, =TIMER_CS
    mov     w1, #(1 << 1)
    str     w1, [x0]
    ldr     x0, =gpio_state
    ldr     w1, [x0]
    cbz     w1, irq_turn_on
    ldr     x2, =GPCLR0
    mov     w3, #(1 << 21)
    str     w3, [x2]
    mov     w1, #0
    b       irq_save_gpio
irq_turn_on:
    ldr     x2, =GPSET0
    mov     w3, #(1 << 21)
    str     w3, [x2]
    mov     w1, #1
irq_save_gpio:
    str     w1, [x0]
    ldr     x0, =TIMER_CLO
    ldr     w1, [x0]
    ldr     w2, =TIMER_INTERVAL
    add     w1, w1, w2
    ldr     x0, =TIMER_C1
    str     w1, [x0]
irq_done:
    ldp     x29, x30, [sp], #16
    ldp     x2, x3, [sp], #16
    ldp     x0, x1, [sp], #16
    eret

timer_init:
    ldr     x0, =CORE0_TIMER_IRQCNTL
    mov     w1, #0
    str     w1, [x0]
    ldr     x0, =TIMER_CS
    mov     w1, #0xF
    str     w1, [x0]
    ldr     x0, =TIMER_CLO
    ldr     w1, [x0]
    ldr     w2, =TIMER_INTERVAL
    add     w1, w1, w2
    ldr     x0, =TIMER_C1
    str     w1, [x0]
    ldr     x0, =IRQ_ENABLE_1
    mov     w1, #TIMER_IRQ_1
    str     w1, [x0]
    ret

gpio_init:
    ldr     x1, =GPFSEL2
    ldr     w2, [x1]
    bic     w2, w2, #(7 << 3)
    orr     w2, w2, #(1 << 3)
    str     w2, [x1]
    ret

gpio_on:
    ldr     x1, =GPSET0
    mov     w2, #(1 << 21)
    str     w2, [x1]
    ret

gpio_off:
    ldr     x1, =GPCLR0
    mov     w2, #(1 << 21)
    str     w2, [x1]
    ret

.section ".data"
.align 4
gpio_state:
    .word 0

.section ".text"
park_core:
    wfe
    b       park_core

Code Breakdown
#

Let's dive into the important parts of the code.

Timer Initialization
#

The timer is initialized in the timer_init function. Here we disable local timer routing, clear any existing match flags, set the first compare value to the current timer value plus a defined interval, and enable the timer interrupt.

timer_init:
    ldr     x0, =CORE0_TIMER_IRQCNTL
    mov     w1, #0
    str     w1, [x0]              // Disable local timer routing
    ldr     x0, =TIMER_CS
    mov     w1, #0xF
    str     w1, [x0]              // Clear all match flags
    ldr     x0, =TIMER_CLO
    ldr     w1, [x0]              // Read current timer value
    ldr     w2, =TIMER_INTERVAL
    add     w1, w1, w2            // Calculate first compare value
    ldr     x0, =TIMER_C1
    str     w1, [x0]              // Set compare register C1
    ldr     x0, =IRQ_ENABLE_1
    mov     w1, #TIMER_IRQ_1
    str     w1, [x0]              // Enable timer 1 interrupt
    ret

EL2 to EL1 Transition
#

In the following section we check whether we are running at EL2 and, if so, set up the necessary registers to switch to EL1.

_start:
    mrs     x1, mpidr_el1
    and     x1, x1, #MPIDR_AFFINITY_MASK
    cbnz    x1, park_core
    mrs     x0, CurrentEL // Read current exception level
    and     x0, x0, #0xC  // Mask to get EL bits
    cmp     x0, #0x8      // Compare with EL2
    beq     from_el2     // If EL2 branch to from_el2
    b       at_el1        // Else branch to at_el1
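
The mask 0xC and the comparison with 0x8 follow from where CurrentEL keeps the level: in bits [3:2]. A small C model of that decoding, with the illustrative helper name `current_el`, makes the numbers explicit.

```c
#include <stdint.h>

/* Extracts the exception level from a raw CurrentEL value:
 * the level is stored in bits [3:2], the other bits read as zero. */
static inline unsigned current_el(uint64_t raw)
{
    return (unsigned)((raw >> 2) & 0x3u);
}
```

A raw value of 0x8 therefore decodes to EL2 and 0x4 to EL1, which is exactly what the `cmp x0, #0x8` in _start tests for.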

If we are at EL2, we set up the registers needed to switch to EL1. The following code does the necessary setup and performs the switch using the eret instruction. As the name suggests, the ERET (Exception Return) instruction is used in the ARM architecture to return from an exception handler and switch to a lower or the same Exception Level (EL), such as from EL2 to EL1.

There is a good lecture series on ARM, with lecture slides, from Georgia Tech here.

from_el2:
    ldr     x1, =_start
    mov     sp, x1
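    mov     x0, #(1 << 31)   // HCR_EL2.RW (bit 31)
    msr     hcr_el2, x0      // EL1 executes in AArch64 state
    mov     x0, #3           // Bits 0 and 1: EL1/EL0 access to physical counter and timer
    msr     cnthctl_el2, x0  // Enable timer access (only needed for the ARM generic timer)
    msr     cntvoff_el2, xzr // Set virtual counter offset to 0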
    mov     x0, #0x3C5        
    msr     spsr_el2, x0     // Set SPSR for EL1
    ldr     x0, =at_el1     // Load target address
    msr     elr_el2, x0     // Set ELR to target address
    eret
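
The magic numbers in from_el2 are easier to trust once they are built from named bits. The C sketch below reconstructs them; the macro and function names are illustrative, and the bit positions are the AArch64 SPSR and HCR_EL2 fields as I understand them.

```c
#include <stdint.h>

#define HCR_EL2_RW   (1ull << 31)  /* EL1 executes in AArch64 state */

#define SPSR_D       (1u << 9)     /* mask debug exceptions */
#define SPSR_A       (1u << 8)     /* mask SError */
#define SPSR_I       (1u << 7)     /* mask IRQ */
#define SPSR_F       (1u << 6)     /* mask FIQ */
#define SPSR_M_EL1H  0x5u          /* M[3:0] = 0b0101: EL1 using SP_EL1 */

/* SPSR_EL2 value that makes eret land in EL1h with D, A, I, F masked. */
static inline uint32_t spsr_for_el1h_masked(void)
{
    return SPSR_D | SPSR_A | SPSR_I | SPSR_F | SPSR_M_EL1H;
}
```

The function returns 0x3C5, the value loaded into spsr_el2. Since the I bit is set there, IRQs stay masked after the eret until at_el1 executes `msr daifclr, #2`.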

At EL1 we set up the vector table for handling interrupts and initialize GPIO and Timer.

at_el1:
    ldr     x1, =_start
    mov     sp, x1
    ldr     x0, =vector_table // Load vector table address
    msr     vbar_el1, x0  // Set vector base address
    bl      gpio_init 
    bl      timer_init
    msr     daifclr, #2   // Enable IRQs in PSTATE

The main registers involved in the switch to EL1 and in enabling interrupts are HCR_EL2, CNTHCTL_EL2, CNTVOFF_EL2, SPSR_EL2, ELR_EL2, and VBAR_EL1.

Register Description
HCR_EL2 Hypervisor Configuration Register - Controls virtualization settings and routing of exceptions
CNTHCTL_EL2 Counter-timer Hypervisor Control Register - Controls access to physical timer from EL1/EL0
CNTVOFF_EL2 Counter-timer Virtual Offset Register - Virtual counter offset from physical counter
SPSR_EL2 Saved Program Status Register - Holds saved processor state for exception return
ELR_EL2 Exception Link Register - Holds return address for exception return
VBAR_EL1 Vector Base Address Register - Base address of exception vector table at EL1

Also when dealing with processor state and exceptions, the following special purpose registers are important:

Special purpose register Description PSTATE fields
CurrentEL Holds the current Exception level. EL
DAIF Specifies the current interrupt mask bits. D, A, I, F
NZCV Holds the condition flags. N, Z, C, V
SPSel At EL1 or higher, this selects between the SP for the current Exception level and SP_EL0. SP

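The ARM instructions used to access these special registers are MRS, which reads a system register into a general-purpose register, and MSR, which writes a general-purpose register (or an immediate, as in msr daifclr, #2) to a system register.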

Now as the timer interrupt is set up we can handle it in the irq_handler function. Here we check if the interrupt is from the timer by reading the IRQ_PENDING_1 register. If it is, we clear the interrupt flag in the TIMER_CS register and toggle the GPIO pin. Finally, we set the next compare value for the timer to trigger the next interrupt. In this example we toggle the GPIO pin state each time the timer interrupt occurs, creating a toggling effect on GPIO21.

A branch to the irq_handler function needs to be placed in the vector table at the appropriate offset for IRQs. This is done in the vector_table section of the code.

What is an IRQ Vector Table?
#

In the ARM AArch64 architecture, when an exception occurs (like an interrupt, system call, or error), the processor jumps to a specific address in a memory region called the vector table. This table contains 16 entries (one for each combination of exception type and origin). Each entry has room for up to 32 instructions; here each one is just a branch instruction that directs execution to the appropriate handler.

The structure of the vector table is as follows:

.balign 0x800
vector_table:
   // 16 entries

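The table must be aligned to 0x800 bytes (2048 bytes), which the .balign 0x800 directive guarantees. Each entry is 128 bytes long, which we guarantee with the .balign 0x80 directive between entries, so the 16 entries also add up to 2048 bytes.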

The table is organized into 4 groups of 4 entries each:

  1. Current EL with SP0 (Stack Pointer 0 - rarely used)
  2. Current EL with SPx (Stack Pointer x - normal stack this is used mostly)
  3. Lower EL using AArch64 (exceptions from lower privilege levels)
  4. Lower EL using AArch32 (legacy 32-bit mode)

Each group has 4 exception types:

Entry Exception Type Description
0 Synchronous System calls, data aborts, etc.
1 IRQ Interrupt requests
2 FIQ Fast interrupt requests
3 SError System errors

In our case we are interested in the second entry of the second group (Current EL with SPx, IRQ), which is at offset 0x280 from the base of the vector table (0x200 for the group plus 0x80 for the IRQ entry), i.e. the 6th entry.
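
The offset of any entry follows directly from the layout: four groups of four entries, 0x80 bytes each. The C sketch below computes it; the enum and function names are illustrative only.

```c
#include <stdint.h>

enum vec_group { CUR_EL_SP0 = 0, CUR_EL_SPX = 1, LOWER_EL_A64 = 2, LOWER_EL_A32 = 3 };
enum vec_type  { VEC_SYNC = 0, VEC_IRQ = 1, VEC_FIQ = 2, VEC_SERROR = 3 };

/* Byte offset of a vector from the table base: each group spans
 * 4 entries of 0x80 bytes, i.e. 0x200 bytes. */
static inline uint32_t vector_offset(enum vec_group group, enum vec_type type)
{
    return (uint32_t)group * 0x200u + (uint32_t)type * 0x80u;
}
```

`vector_offset(CUR_EL_SPX, VEC_IRQ)` evaluates to 0x280, which is where the `b irq_handler` line sits in vector_table (the sixth slot, counting from one).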

Most of the irq_handler function is self explanatory as we have already explained GPIO toggling and timer reading in previous sections.

IRQ Handler
#

In the irq_handler, we first save the state of the registers that we will use, and at the end of the handler we restore them before returning from the interrupt using the eret instruction. The reason we save and restore registers is to preserve the state of the program that was interrupted. When an interrupt occurs, the CPU may be in the middle of executing some code, and it uses certain registers to hold data and addresses. If we modify these registers in the interrupt handler without saving their original values, we could corrupt the state of the interrupted program, leading to unexpected behavior or crashes when the program resumes.

    stp     x0, x1, [sp, #-16]!     // Push x0, x1 to stack
    stp     x2, x3, [sp, #-16]!     // Push x2, x3 to stack
    stp     x29, x30, [sp, #-16]!   // Push frame pointer (x29) and link register (x30)

    ...
    ...

    ldp     x29, x30, [sp], #16     // Pop x29, x30 from stack
    ldp     x2, x3, [sp], #16       // Pop x2, x3 from stack
    ldp     x0, x1, [sp], #16       // Pop x0, x1 from stack
    eret                            // Exception Return

Then we check if the interrupt is from the timer by reading the IRQ_PENDING_1 register. If it is, we clear the interrupt flag in the TIMER_CS register and toggle the GPIO pin. Finally, we set the next compare value for the timer to trigger the next interrupt.

    ldr     x0, =IRQ_PENDING_1  // Load pending-interrupts register address
    ldr     w1, [x0]            // Read pending interrupts
    tst     w1, #TIMER_IRQ_1    // Check if timer 1 interrupt is pending
    beq     irq_done            // If not timer interrupt, skip handling
    ldr     x0, =TIMER_CS       // Load timer control/status address
    mov     w1, #(1 << 1)       // Bit 1 = M1 match flag (see the CS Register Bits section)
    str     w1, [x0]            // Write 1 to clear the M1 flag

I will skip the GPIO toggling part as it is similar to previous sections and blog posts.

We clear the interrupt flag in the TIMER_CS register by writing a 1 to the M1 bit. This acknowledges the interrupt and allows the timer to generate future interrupts.

    ldr     x0, =TIMER_CS           // Timer Control/Status register
    mov     w1, #(1 << 1)           // Bit 1 = M1 (Match 1 flag)
    str     w1, [x0]                // Write 1 to clear the flag

The important part comes after the GPIO handling: the TIMER_CLO register is read to get the current timer value, and we add the defined interval to set the next compare value in TIMER_C1.

    ldr     x0, =TIMER_CLO       // Load timer counter lower address
    ldr     w1, [x0]            // Read current timer value
    ldr     w2, =TIMER_INTERVAL  // Load timer interval
    add     w1, w1, w2          // Calculate next compare value (current + interval)
    ldr     x0, =TIMER_C1       // Load timer compare 1 address
    str     w1, [x0]            // Set next compare value

Why add to current time? The System Timer is free-running. When TIMER_CLO equals TIMER_C1, an interrupt fires. By setting C1 = current + interval, we schedule the next interrupt.
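
One consequence of using current + interval is that the time between the match and the read of TIMER_CLO is added to every period. A possible variation, not what the code above does, is to add the interval to the previous compare value instead. The C model below contrasts the two; both function names are made up for this sketch.

```c
#include <stdint.h>

/* Re-arm relative to the current counter value, as the handler above does:
 * any latency between the match and this read is added to the period. */
static inline uint32_t rearm_from_now(uint32_t now, uint32_t interval)
{
    return now + interval;
}

/* Re-arm relative to the previous compare value: the period stays
 * `interval` ticks as long as the handler runs before the next match. */
static inline uint32_t rearm_from_previous(uint32_t prev_compare, uint32_t interval)
{
    return prev_compare + interval;
}
```

If the match fired at 500000 and the handler reads the counter 7 ticks later, the first variant schedules 1000007 and the second 1000000. In the assembly handler the second variant would mean loading TIMER_C1 rather than TIMER_CLO before the add. Both rely on 32-bit wraparound, so a compare value that overflows simply lands early in the next counter cycle.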

Result of Timer Interrupt Implementation
#

I connected GPIO21 on the Raspberry Pi Zero 2W. The timer interrupt-based approach provides accurate timing without busy-waiting. The GPIO toggles at 1 Hz (500 ms on, 500 ms off) regardless of CPU frequency or instruction timing variations.

Gpio Toggling with Interrupts

Applications of System Timer
#

Now you might ask why we need a system timer in the first place?

It is one of the most important peripherals in any embedded system. It's like your parents telling you to wake up at a certain time every day when you were a kid and didn't want to go to school. It is really precise, as it is not affected by code execution time or CPU load. Two of the four compare registers can be utilized by the CPU (e.g., for OS schedulers and clock interrupts). Some of the common applications of the system timer are:

  • Precise delays: Creating accurate microsecond/millisecond delays

  • PWM generation: Software PWM for LED dimming or motor control

  • Task scheduling: Implementing a task scheduler for multitasking systems

And so on.

Conclusion
#

In this blog post I demonstrated how to implement timer functionality on the Raspberry Pi using bare metal programming. We explored the System Timer peripheral, its registers, and how to create accurate delays using both busy-wait loops and timer interrupts. Timers are essential for many embedded applications, and mastering their use is crucial for effective embedded systems development. In my next blog post in this series, I will explore more advanced peripherals and techniques for bare metal programming on the Raspberry Pi. Stay tuned!