Raspberry PI Bare Metal Vol 1

Raspberry PI Bare Metal Booting Up

This is my new blog series on Raspberry PI bare metal coding. In this series, I will show you how to build a simple bare metal program in assembly for the Raspberry PI. We will start small and gradually add more features to our code.

Introduction

The booting process of a Raspberry PI is quite different from that of a traditional PC. The Raspberry PI uses a bootloader stored in the GPU’s ROM, which is responsible for initializing the hardware and loading the operating system. I will not be doing anything with the GPU bootloader, as it is a closed-source binary blob and is not documented. Instead, I will focus on the ARM CPU and how to get it up and running. However, we will use part of the GPU mailbox interface along the way. If you have read my previous blog posts on the Raspberry PI mailbox interface, you will be familiar with the concepts, but I will explain them again here for completeness.

The Raspberry PI boot process can be summarized as follows:

  1. Power On: When the Raspberry PI is powered on, the GPU executes the first-stage bootloader from its ROM.
  2. Load Bootcode: The ROM bootloader loads the bootcode.bin file from the SD card into the GPU’s memory and runs it.
  3. Initialize Hardware: bootcode.bin initializes the hardware, including the SDRAM.
  4. Load Start.elf: bootcode.bin then loads the start.elf file, which is the main firmware for the GPU.
  5. Load Kernel: Finally, start.elf loads the kernel image (e.g., kernel8.img for a 64-bit kernel) into memory and transfers control to it on the ARM CPU.

I will be focusing on the last step, where we load our own kernel image and get it running on the ARM CPU. The code for the kernel will be written in ARM assembly language. In this first volume, we will cover the following topics:

  • Setting up the development environment
  • Writing the “Hello, World!” of the embedded world: a GPIO toggle kernel
  • Compiling the kernel
  • Testing the kernel on the Raspberry PI

Setting Up the Development Environment

To get started, we need to set up our development environment. We will be using a cross-compiler with 64 bit ARM support.

You can install the necessary tools on your Linux machine (Ubuntu 22.04 64 bit) using the following commands:

wget https://developer.arm.com/-/media/Files/downloads/gnu/14.3.rel1/binrel/arm-gnu-toolchain-14.3.rel1-x86_64-aarch64-none-linux-gnu.tar.xz
sudo tar -xvf arm-gnu-toolchain-14.3.rel1-x86_64-aarch64-none-linux-gnu.tar.xz -C /usr/local/
export PATH=$PATH:/usr/local/arm-gnu-toolchain-14.3.rel1-x86_64-aarch64-none-linux-gnu/bin

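This installs the ARM cross-compiler. The commands above do not install Make or QEMU; install those separately (for example, through your distribution's package manager) if they are not already present.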

Next, create a directory for the project and clone the raptorOS repository, which contains the bare minimum files to get started.

mkdir rpi_os_bare_bone
cd rpi_os_bare_bone
git clone https://github.com/ph0en1xr3t/raptoros.git    
cd raptoros

Writing a Simple GPIO Toggle Kernel

Now that we have our development environment set up, we can start writing our kernel. Create a new file called boot.s in the root directory of your project (this is the name the Makefile further down expects) and add the following code:

The code below is a simple ARM assembly program that toggles an LED on the RaspberryPI GPIO pin 18.

.equ   MPIDR_AFFINITY_MASK, 0x3
.equ   PERIPHERAL_BASE,   0x3F000000
.equ   GPIO_BASE,         (PERIPHERAL_BASE + 0x200000)
.equ   MAILBOX_BASE,      (PERIPHERAL_BASE + 0xB880)

.equ   GPFSEL1,           (GPIO_BASE + 0x04)
.equ   GPSET0,            (GPIO_BASE + 0x1C)
.equ   GPCLR0,            (GPIO_BASE + 0x28)
.equ   GPPUPPDN0,         (GPIO_BASE + 0xE4)

.equ   MBOX_READ,         (MAILBOX_BASE + 0x00)
.equ   MBOX_STATUS,       (MAILBOX_BASE + 0x18)
.equ   MBOX_WRITE,        (MAILBOX_BASE + 0x20)
.equ   MBOX_FULL,         0x80000000
.equ   MBOX_EMPTY,        0x40000000
.equ   MBOX_CH_PROP,      8

.section ".data"
.align 4

mbox_buffer:
    .int 8*4                
    .int 0            
    .int 0x00028001
    .int 8           
    .int 0                  
    .int 3                  
    .int 3                  
    .int 0                  

.section ".text.boot"
.global _start

_start:
    mrs     x1, mpidr_el1
    and     x1, x1, #MPIDR_AFFINITY_MASK
    cbnz    x1, park_core

master_core_setup:
    ldr     x1, =_start
    mov     sp, x1

    ldr     x1, =mbox_buffer
    bl      mailbox_call
    b       gpio_setup

gpio_setup:
    ldr     x1, =GPFSEL1
    ldr     w2, [x1]
    bic     w2, w2, #(7 << 24)
    orr     w2, w2, #(1 << 24)
    str     w2, [x1]
    dsb     sy
    b       blink_loop

mailbox_call:
    add     x1, x1, #MBOX_CH_PROP
mbox_write_wait:
    ldr     x2, =MBOX_STATUS
    ldr     w3, [x2]
    tst     w3, #MBOX_FULL
    b.ne    mbox_write_wait
    ldr     x2, =MBOX_WRITE
    str     w1, [x2]
mbox_read_wait:
    ldr     x2, =MBOX_STATUS
    ldr     w3, [x2]
    tst     w3, #MBOX_EMPTY
    b.ne    mbox_read_wait
    ldr     x2, =MBOX_READ
    ldr     w3, [x2]
    cmp     w1, w3
    b.ne    mbox_read_wait
    ret

delay_loop:
    mov     x10, #0x1000
delay_inner:
    sub     x10, x10, #1
    cbnz    x10, delay_inner
    ret

blink_loop:
    mov     w1, #(1 << 18)
    
    ldr     x2, =GPSET0
    str     w1, [x2]        
    dsb     sy
    bl      delay_loop

    ldr     x2, =GPCLR0
    str     w1, [x2]   
    dsb     sy
    bl      delay_loop

    b       blink_loop

park_core:
    wfe
    b       park_core

Setting Up the Linker Script

To link our assembly code, we need to create a linker script. Create a file called link.ld in the root directory of your project with the following content:

SECTIONS
{
    . = 0x80000;
    .text : { KEEP(*(.text.boot)) *(.text .text.* .gnu.linkonce.t*) }
    .rodata : { *(.rodata .rodata.* .gnu.linkonce.r*) }
    PROVIDE(_data = .);
    .data : { *(.data .data.* .gnu.linkonce.d*) }
    .bss (NOLOAD) : {
        . = ALIGN(16);
        __bss_start = .;
        *(.bss .bss.*)
        *(COMMON)
        __bss_end = .;
    }
    _end = .;

   /DISCARD/ : { *(.comment) *(.gnu*) *(.note*) *(.eh_frame*) }
}
__bss_size = (__bss_end - __bss_start)>>3;

Compiling the Kernel

To compile the kernel, we will use the ARM cross-compiler. Create a Makefile in the root directory of your project with the following content:

CFILES = $(wildcard *.c)
OFILES = $(CFILES:.c=.o)
SFILES = boot.s
SOFILES = $(SFILES:.s=.o)
GCCFLAGS = -Wall -O0 -ffreestanding -nostdinc -nostdlib -nostartfiles -mstrict-align

# Tool names match the aarch64-none-linux-gnu toolchain downloaded earlier
CC = aarch64-none-linux-gnu-gcc
LD = aarch64-none-linux-gnu-ld
OBJCOPY = aarch64-none-linux-gnu-objcopy

all: clean kernel8.img

%.o: %.s
	$(CC) $(GCCFLAGS) -c $< -o $@

%.o: %.c
	$(CC) $(GCCFLAGS) -c $< -o $@

kernel8.img: $(SOFILES) $(OFILES)
	$(LD) -nostdlib $(SOFILES) $(OFILES) -T link.ld -o kernel8.elf
	$(OBJCOPY) -O binary kernel8.elf kernel8.img

clean:
	/bin/rm kernel8.elf *.o *.img > /dev/null 2> /dev/null || true

Now your project structure should look like this:

├── boot.s
├── link.ld
└── Makefile

0 directories, 3 files

Now, you can compile the kernel by running the following command in the terminal:

make

This will generate a kernel8.img file in the root directory of your project.

Testing the Kernel on the RaspberryPI

To test the kernel, you can use the same SD card that you used to boot the Raspberry PI Zero 2W. Copy the kernel8.img file to the root of the SD card, replacing the existing kernel8.img. Make sure that the SD card still contains the other boot files (bootcode.bin, start.elf, etc.) and that no other files are deleted.

Now, insert the SD card into the Raspberry PI and power it on. If everything is set up correctly, GPIO pin 18 will toggle on and off, which you can observe with an LED or a logic analyzer connected to the pin.

Explanation of the Code

Let’s go through the code step by step:

  1. Constants and Memory Addresses: We define several constants for the peripheral base address, GPIO base address, and mailbox base address. These addresses are specific to the Raspberry PI hardware; the register offsets can be found in the BCM2835 ARM Peripherals documentation. The peripheral base address varies by Raspberry PI model; for example, the Raspberry PI Zero 2W uses 0x3F000000. All other addresses are calculated from this base address. GPPUPPDN0 is defined but not used in this kernel.
.equ   MPIDR_AFFINITY_MASK, 0x3                        // Mask to get the core ID
.equ   PERIPHERAL_BASE,   0x3F000000                   // RaspberryPI Zero 2W
.equ   GPIO_BASE,         (PERIPHERAL_BASE + 0x200000) // GPIO base address
.equ   MAILBOX_BASE,      (PERIPHERAL_BASE + 0xB880)   // Mailbox base address

.equ   GPFSEL1,           (GPIO_BASE + 0x04)           // GPIO Function Select Register 1
.equ   GPSET0,            (GPIO_BASE + 0x1C)           // GPIO Pin Output Set Register 0
.equ   GPCLR0,            (GPIO_BASE + 0x28)           // GPIO Pin Output Clear Register 0    
.equ   GPPUPPDN0,         (GPIO_BASE + 0xE4)           // GPIO Pull-up/Pull-down Register 0

.equ   MBOX_READ,         (MAILBOX_BASE + 0x00)        // Mailbox Read Register
.equ   MBOX_STATUS,       (MAILBOX_BASE + 0x18)        // Mailbox Status Register
.equ   MBOX_WRITE,        (MAILBOX_BASE + 0x20)        // Mailbox Write Register
.equ   MBOX_FULL,         0x80000000                   // Mailbox Full Flag
.equ   MBOX_EMPTY,        0x40000000                   // Mailbox Empty Flag
.equ   MBOX_CH_PROP,      8                            // Mailbox Channel Property

The mask is used to extract the core ID from the MPIDR register as shown below.

  2. Mailbox Buffer: The mailbox is needed because the GPIOs are not enabled by default. We create a mailbox buffer in the .data section to communicate with the GPU. The mailbox buffer is used to request the GPU to enable the GPIOs.

Below is the sequence diagram describing the mailbox message.

Mailbox Buffer Sequence Diagram

The value 0x00028001 is the tag ID of the request. In the mailbox property interface documentation this tag is listed as Set power state: its 8-byte payload is a device ID (3 here) followed by a state word (3 here: bit 0 turns the device on, bit 1 asks the firmware to wait until power is stable). The buffer is terminated with a zero end tag. All messages are described in the mailbox property interface documentation on GitHub.

.section ".data"
.align 4

mbox_buffer:
    .int 8*4        // Total size of the buffer in bytes
    .int 0          // Request code (0 = process request)
    .int 0x00028001 // Tag ID
    .int 8          // Size of the tag's value buffer in bytes
    .int 0          // Tag request/response code (0 = request)
    .int 3          // Payload word 1 (device ID)
    .int 3          // Payload word 2 (state: bit 0 = on, bit 1 = wait)
    .int 0          // End tag (terminates the buffer)

The mailbox buffer needs to be 16-byte aligned, because the lower 4 bits of the value written to the mailbox carry the channel number. The .align 4 directive gives this alignment (2^4 = 16 bytes).

  3. Entry Point: The _start label is the entry point of the program. It checks whether the current core is the master core (core 0), parks every other core, and sets up the stack pointer on core 0.
.section ".text.boot"                      // Boot code .text section
.global _start                             // Entry point can be anything for bare bone

_start:
    mrs     x1, mpidr_el1                  // Read the MPIDR register to get the core ID
    and     x1, x1, #MPIDR_AFFINITY_MASK   // Mask to get the core ID
    cbnz    x1, park_core                  // If not core 0, park the core

master_core_setup:
    ldr     x1, =_start                    // Load the address of _start
    mov     sp, x1                         // Set the stack pointer to the start address

    ldr     x1, =mbox_buffer               // Load the address of the mailbox buffer
    bl      mailbox_call                   // Call the mailbox function to enable GPIOs
    b       gpio_setup                     // Branch to GPIO setup
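
The core check above boils down to one mask and one comparison. A minimal C model of it follows. The function names core_id and should_park are hypothetical, and the MPIDR values in main are illustrative samples rather than values read from hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MPIDR_AFFINITY_MASK 0x3u

/* Equivalent of: and x1, x1, #MPIDR_AFFINITY_MASK */
static unsigned core_id(uint64_t mpidr)
{
    return (unsigned)(mpidr & MPIDR_AFFINITY_MASK);
}

/* Equivalent of: cbnz x1, park_core */
static int should_park(uint64_t mpidr)
{
    return core_id(mpidr) != 0;
}

int main(void)
{
    /* Illustrative register values; only the low two bits matter here. */
    const uint64_t samples[4] = {
        0x80000000u, 0x80000001u, 0x80000002u, 0x80000003u
    };

    for (int i = 0; i < 4; i++) {
        printf("core %u: %s\n", core_id(samples[i]),
               should_park(samples[i]) ? "park" : "run setup");
    }
    assert(!should_park(samples[0]));
    return 0;
}
```

Only the core whose masked ID is 0 falls through to master_core_setup; the other three spin in park_core.
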
  4. Mailbox Call: The mailbox_call function sends the address of the mailbox buffer to the GPU and waits for the matching response. This is necessary to enable the GPIOs.
mailbox_call:
    add     x1, x1, #MBOX_CH_PROP // Add the channel property to the address
mbox_write_wait:
    ldr     x2, =MBOX_STATUS      // Load the address of the mailbox status register
    ldr     w3, [x2]              // Read the status register
    tst     w3, #MBOX_FULL        // Test if the mailbox is full
    b.ne    mbox_write_wait       // If full, wait
    ldr     x2, =MBOX_WRITE       // Load the address of the mailbox write register
    str     w1, [x2]              // Write the address of the mailbox buffer
mbox_read_wait:    
    ldr     x2, =MBOX_STATUS      // Load the address of the mailbox status register
    ldr     w3, [x2]              // Read the status register
    tst     w3, #MBOX_EMPTY       // Test if the mailbox is empty
    b.ne    mbox_read_wait        // If empty, wait
    ldr     x2, =MBOX_READ        // Load the address of the mailbox read register
    ldr     w3, [x2]              // Read the response
    cmp     w1, w3                // Compare the response with the request
    b.ne    mbox_read_wait        // If not equal, wait
    ret
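
The same polling protocol can be written in C against a simulated register block, which makes the two wait loops easier to follow. Everything here is a sketch for a PC: struct fake_mbox and fake_gpu_respond are hypothetical stand-ins for the real registers and the GPU, and on hardware the registers would be accessed through volatile pointers to the MBOX_* addresses instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MBOX_FULL  0x80000000u
#define MBOX_EMPTY 0x40000000u

/* Simulated registers (hypothetical; not the real memory layout). */
struct fake_mbox {
    uint32_t read;
    uint32_t status;
    uint32_t write;
};

/* Simulation only: the "GPU" echoes the request and marks data available. */
static void fake_gpu_respond(volatile struct fake_mbox *mb)
{
    mb->read = mb->write;
    mb->status = mb->status & ~MBOX_EMPTY;
}

static uint32_t mailbox_call(volatile struct fake_mbox *mb, uint32_t message)
{
    while (mb->status & MBOX_FULL) { }       /* mbox_write_wait */
    mb->write = message;

    fake_gpu_respond(mb);                    /* stands in for the firmware */

    for (;;) {
        while (mb->status & MBOX_EMPTY) { }  /* mbox_read_wait */
        uint32_t reply = mb->read;
        if (reply == message) {              /* cmp w1, w3 */
            return reply;
        }
    }
}

int main(void)
{
    struct fake_mbox mb = { 0, MBOX_EMPTY, 0 };
    uint32_t reply = mailbox_call(&mb, 0x00080108u); /* illustrative message */

    assert(reply == 0x00080108u);
    printf("reply = 0x%08X\n", (unsigned)reply);
    return 0;
}
```

Comparing the reply with the request matters because the read register is shared between channels, so a word for another channel has to be skipped.
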
  5. GPIO Setup: The gpio_setup section configures GPIO pin 18 as an output by modifying the appropriate bits in the GPFSEL1 register. The process is not that different from what we would do in C: read the current value of the register, clear the three function-select bits for GPIO 18, set them to 001 (output), and write the new value back. I have covered this in detail in a previous blog post.
gpio_setup:
    ldr     x1, =GPFSEL1        // Load the address of the GPFSEL1 register
    ldr     w2, [x1]            // Read the current value of the register
    bic     w2, w2, #(7 << 24)  // Clear the function-select bits for GPIO 18 (bits 26:24)
    orr     w2, w2, #(1 << 24)  // Set GPIO 18 to output (001)
    str     w2, [x1]            // Write the new value back to the register
    dsb     sy                  // Data Synchronization Barrier
    b       blink_loop          // Continue to the blink loop
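
The constant 24 in the code comes from the layout of the function-select registers: each GPFSEL register covers ten pins with three bits per pin. A C sketch of that read-modify-write, operating on a plain variable instead of the real register, is shown here. The helper names gpfsel_index, gpfsel_shift and gpfsel_make_output are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Which GPFSELn register holds the pin (10 pins per register). */
static unsigned gpfsel_index(unsigned pin)
{
    return pin / 10;
}

/* Bit position of the pin's 3-bit field inside that register. */
static unsigned gpfsel_shift(unsigned pin)
{
    return (pin % 10) * 3;
}

/* Read-modify-write on a register value: clear the field, then set 001. */
static uint32_t gpfsel_make_output(uint32_t reg, unsigned pin)
{
    unsigned shift = gpfsel_shift(pin);

    reg &= ~(7u << shift);   /* bic w2, w2, #(7 << 24) */
    reg |= (1u << shift);    /* orr w2, w2, #(1 << 24) */
    return reg;
}

int main(void)
{
    assert(gpfsel_index(18) == 1 && gpfsel_shift(18) == 24);

    printf("GPIO 18 -> GPFSEL%u, shift %u\n", gpfsel_index(18), gpfsel_shift(18));
    printf("0xFFFFFFFF -> 0x%08X\n", (unsigned)gpfsel_make_output(0xFFFFFFFFu, 18));
    return 0;
}
```

Clearing the field first is what makes the result independent of the register's previous contents; with only the OR, a field that already held another function would end up with a mixed value.
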
  6. Blink Loop: The blink_loop section toggles GPIO pin 18 on and off with a delay in between. It uses the GPSET0 and GPCLR0 registers to set and clear the pin.
blink_loop:
    mov     w1, #(1 << 18) // Bit mask for GPIO 18
    
    ldr     x2, =GPSET0   // Load the address of the GPSET0 register
    str     w1, [x2]      // Set GPIO 18 high 
    dsb     sy            // Data Synchronization Barrier
    bl      delay_loop    // Call the delay loop

    ldr     x2, =GPCLR0   // Load the address of the GPCLR0 register
    str     w1, [x2]      // Set GPIO 18 low
    dsb     sy            // Data Synchronization Barrier
    bl      delay_loop    // Call the delay loop

    b       blink_loop    // Repeat the loop
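
GPSET0 and GPCLR0 are write-only "set" and "clear" registers: a 1 bit changes the matching pin and a 0 bit leaves it alone, so no read-modify-write is needed. This small C model of that behaviour uses hypothetical helper names and a plain variable in place of the pin levels.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Effect of writing `value` to GPSET0: pins with a 1 bit go high. */
static uint32_t apply_gpset(uint32_t levels, uint32_t value)
{
    return levels | value;
}

/* Effect of writing `value` to GPCLR0: pins with a 1 bit go low. */
static uint32_t apply_gpclr(uint32_t levels, uint32_t value)
{
    return levels & ~value;
}

int main(void)
{
    const uint32_t pin18 = 1u << 18;   /* same mask as mov w1, #(1 << 18) */
    uint32_t levels = 0x00000001u;     /* illustrative: GPIO 0 already high */

    levels = apply_gpset(levels, pin18);
    printf("after GPSET0: 0x%08X\n", (unsigned)levels);

    levels = apply_gpclr(levels, pin18);
    printf("after GPCLR0: 0x%08X\n", (unsigned)levels);

    assert(levels == 0x00000001u);     /* GPIO 0 was never disturbed */
    return 0;
}
```

This is why the same mask in w1 can be reused for both stores in the loop.
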
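  7. Delay Loop: The delay_loop function creates a simple delay by looping a fixed number of times. This controls the toggle rate of the pin. With a count of 0x1000 the pin toggles far too quickly to see on an LED, so use a logic analyzer, or a much larger count if you want a visible blink.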
delay_loop:
    mov     x10, #0x1000     // Load a value for the delay
delay_inner:
    sub     x10, x10, #1     // Decrement the counter
    cbnz    x10, delay_inner // If not zero, continue looping
    ret

Final Tests and Conclusion

After successfully running the kernel on the Raspberry PI, I verified the GPIO toggling with a logic analyzer; my setup is shown below. One important point to note is that the peripheral registers are 32 bits wide, so we must store a 32-bit value from a 32-bit register even though we are on a 64-bit ARM CPU. That is why the code uses w registers for these accesses.

str w1, [x2]

It took me a lot of time to figure this out, as I was trying to use x registers for these 32-bit operations, which did not work.

RaspberryPI Zero 2W with Logic Analyzer

The logic analyzer shows the GPIO pin 18 toggling between high and low states, confirming that our bare bone kernel is functioning correctly.