tannewt
Posts: 63
Joined: Tue Nov 17, 2020 1:14 am

MMU settings for 1:1 mapping

Wed Sep 22, 2021 11:52 pm

Hi folks, I'm trying to get TinyUSB (and then hopefully CircuitPython) running on the RPi4. To get debugging output I'm using newlib to provide strlen. Without the MMU, strlen crashes on a misaligned access. I read somewhere that configuring the MMU to treat RAM as normal memory would allow the unaligned accesses that the newlib build expects.

I'm trying to set up the MMU so that the virtual addresses are the same as the physical addresses. The current version doesn't raise an access fault; instead it faults on an invalid instruction. When I disassemble, I get all zeros instead of the contents of RAM. I must have something set up wrong. Any pointers on what to tweak?

From https://github.com/tannewt/tinyusb/blob ... 2711/mmu.c :

// Each entry is a gig.
volatile uint64_t level_1_table[32] __attribute__((aligned(4096)));

// Third gig has peripherals
uint64_t level_2_0x0_c000_0000_to_0x1_0000_0000[512] __attribute__((aligned(4096)));

void setup_mmu_flat_map(void) {
    // Set the first gig to regular access.
    level_1_table[0] = 0x0000000000000000 | 
                       MM_DESCRIPTOR_MAIR_INDEX(MT_NORMAL_NC) |
                       MM_DESCRIPTOR_BLOCK |
                       MM_DESCRIPTOR_VALID;
    level_1_table[2] = ((uint64_t) level_2_0x0_c000_0000_to_0x1_0000_0000) |
                       MM_DESCRIPTOR_TABLE |
                       MM_DESCRIPTOR_VALID;
    // Set peripherals to register access.
    for (uint64_t i = 480; i < 512; i++) {
        level_2_0x0_c000_0000_to_0x1_0000_0000[i] = (0x00000000c0000000 + (i << 21)) |
                                                    MM_DESCRIPTOR_EXECUTE_NEVER |
                                                    MM_DESCRIPTOR_MAIR_INDEX(MT_DEVICE_nGnRnE) |
                                                    MM_DESCRIPTOR_BLOCK |
                                                    MM_DESCRIPTOR_VALID;
    }
    uint64_t mair = MAIR_VALUE;
    uint64_t tcr = TCR_VALUE;
    uint64_t ttbr0 = ((uint64_t) level_1_table) | MM_TTBR_CNP;
    uint64_t sctlr = 0;
    __asm__ volatile (
        // Set MAIR
        "MSR MAIR_EL2, %[mair]\n\t"
        // Set TTBR0
        "MSR TTBR0_EL2, %[ttbr0]\n\t"
        // Set TCR
        "MSR TCR_EL2, %[tcr]\n\t"
        // The ISB forces these changes to be seen before the MMU is enabled.
        "ISB\n\t"
        // Read System Control Register configuration data
        "MRS %[sctlr], SCTLR_EL2\n\t"
        // Write System Control Register configuration data
        "ORR %[sctlr], %[sctlr], #1\n\t"
        // Set [M] bit and enable the MMU.
        "MSR SCTLR_EL2, %[sctlr]\n\t"
        // The ISB forces these changes to be seen by the next instruction
        "ISB"
        : [sctlr] "=&r" (sctlr) /* Written by the MRS, so it has to be an output operand. */
        : [mair] "r" (mair),
          [tcr] "r" (tcr),
          [ttbr0] "r" (ttbr0)
        : "memory"
    );
    while (true) {}
}

The header file is here: https://github.com/tannewt/tinyusb/blob ... 2711/mmu.h

Thanks!

Schnoogle
Posts: 154
Joined: Sun Feb 11, 2018 4:47 pm

Re: MMU settings for 1:1 mapping

Sun Sep 26, 2021 10:08 am

Hi there,

I only have some MMU experience on the Raspberry Pi 3, not the 4, so I'm not 100% sure the configuration is exactly the same, assuming 64-bit mode on both platforms.

Each MMU translation level covers a specific granularity of memory. Assuming a granule size of 4 KB (set in the TCR_EL2 register), each entry in the very first level configures a whole 1 GB region, each second-level entry covers a 2 MB region, and each level 3 entry typically covers a 4 KB region. It's quite common to maintain only the first two levels, as you did. If an entry is to be subdivided by a finer level, it has to be a table entry that points to the start address of the next-level config table. As per the documentation, the very first level of the MMU is not allowed to be a block entry.

Thus the first entry in your config table:

level_1_table[0] = 0x0000000000000000 | 
                       MM_DESCRIPTOR_MAIR_INDEX(MT_NORMAL_NC) |
                       MM_DESCRIPTOR_BLOCK |
                       MM_DESCRIPTOR_VALID;
violates this rule, as it is configured as a block entry rather than a table entry and thus does not point to a next-level config table.

The second thing I spot is that you never set level_1_table[1], so the memory region between 1 and 2 GB is left unconfigured.
Some numbers:

level1[0] is covering memory between 0x0 and 0x3FFF_FFFF
level1[1] is covering memory between 0x4000_0000 and 0x7FFF_FFFF
level1[2] is covering memory between 0x8000_0000 and 0xBFFF_FFFF
level1[3] is covering memory between 0xC000_0000 and 0xFFFF_FFFF
All your code is typically loaded into the first 1 GB of memory, so failing to configure that region correctly means the CPU cannot read the instructions from memory. This region should be configured as "normal memory".
Checking the numbers, the memory region you'd like to configure as device memory requires the pointer to the block config table in level1[3], followed by the corresponding block settings as you intended.
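
To double-check these numbers, here is a minimal sketch of the index arithmetic, assuming a 4 KB granule (9 index bits per level, 1 GB per level 1 entry, 2 MB per level 2 entry). The helper names are made up for illustration and are not taken from your mmu.h.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 4 KB granule: each table level resolves 9 address bits. */
#define L1_SHIFT   30u        /* one level 1 entry spans 1 GB */
#define L2_SHIFT   21u        /* one level 2 entry spans 2 MB */
#define INDEX_MASK 0x1ffull   /* up to 512 entries per table */

/* Hypothetical helper: which level 1 entry translates this address? */
static inline unsigned level1_index(uint64_t addr) {
    return (unsigned)((addr >> L1_SHIFT) & INDEX_MASK);
}

/* Hypothetical helper: which entry of the level 2 table below it? */
static inline unsigned level2_index(uint64_t addr) {
    return (unsigned)((addr >> L2_SHIFT) & INDEX_MASK);
}

/* Checks the slots for the block range written by the loop (i = 480..511). */
void check_peripheral_slots(void) {
    uint64_t first = 0xc0000000ull + (480ull << 21);   /* 0xfc000000 */
    uint64_t last  = 0xc0000000ull + (511ull << 21);   /* 0xffe00000 */
    assert(level1_index(first) == 3 && level2_index(first) == 480);
    assert(level1_index(last) == 3 && level2_index(last) == 511);
    assert(level1_index(0x00080000ull) == 0);          /* an address in the first GB */
}
```

So the level 2 table for 0xC000_0000 to 0xFFFF_FFFF would hang off level 1 slot 3, while slot 2 translates 0x8000_0000 to 0xBFFF_FFFF.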

The other thing I'm wondering about is your TCR_EL2 settings. Why did you choose a T0SZ of 29 (a 35-bit address range)? I'm using a value of 25, which gives a 39-bit address range. The different value affects the translation table structure: how many bits are used for the address and how the address needs to be shifted. This register might also be missing the IRGN0 and ORGN0 settings?

Why are you setting the TTBR_CNP flag in the TTBR0 register?

To disable the alignment fault check with the MMU active, you also need to clear the respective flag in the SCTLR_EL2 register. I guess it's set to 1 by default.
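
As a sketch of what clearing that flag looks like: in SCTLR_ELx the M bit (bit 0) enables the MMU and the A bit (bit 1) enables alignment fault checking. The helper below only computes the new register value, so it can be tried on any host; its name is hypothetical, and the result would still have to be written back with MSR SCTLR_EL2 followed by an ISB.

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_M (1ull << 0)   /* stage 1 MMU enable */
#define SCTLR_A (1ull << 1)   /* alignment check enable */

/* Hypothetical helper: returns the value to write back to SCTLR_EL2. */
static inline uint64_t sctlr_mmu_on_alignment_check_off(uint64_t sctlr) {
    sctlr |= SCTLR_M;    /* turn the MMU on */
    sctlr &= ~SCTLR_A;   /* stop faulting on unaligned accesses to normal memory */
    return sctlr;
}

void check_sctlr_helper(void) {
    /* A set, M clear on entry: only M remains set afterwards. */
    assert(sctlr_mmu_on_alignment_check_off(SCTLR_A) == SCTLR_M);
    /* Unrelated bits (here bit 12, the I bit) are left untouched. */
    assert(sctlr_mmu_on_alignment_check_off(1ull << 12) == ((1ull << 12) | SCTLR_M));
}
```

As far as I understand the architecture, data accesses are treated as device memory while the MMU is off, and unaligned accesses to device memory fault regardless of the A bit.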

Another hint: once the MMU has been activated you should let 2 CPU cycles pass and then call tlbi alle2 to ensure the MMU related cache will be invalidated and the new settings are picked up.

As I already said - this is RPi3 experience - you might need to check how this applies to RPi4 and your config, but it might give you the right direction where to look at.

BR
Schnoogle

PS: I'll add my own understanding of the MMU config as I'm using it in my projects (gathered from the different sources available on the internet):

//! # MMU Configuration settings
//!
//! The actual implementation of the MMU will rely on the following configuration and settings:
//! While setting up the MMU we configure a 4KB granule size. This means at level 1 each page table entry covers
//! a 1GB memory area and has to point to a level 2 descriptor table. Therefore we will cover here the details starting
//! at level 2.
//!
//! Level 1 and Level 2 covering 1GB / 2MB respectively
//! |Table entry type - Bits |63|62 61|60|59 |58 52|51  48|47                     30|29       12|11         2|1 0|
//! |------------------------|--|-----|--|---|-----|------|-------------------------|-----------|------------|---|
//! | Table entry            |NS|AP   |XN|PXN|     | RES0 | Next level table address [47..12]   |            |1 1|
//! | Block entry            |  Block attributes   | RES0 | Output address [47..30] | RES0      | Block attr.|0 1|
//!
//! Level 3 does not allow for further table references, this is the memory page level of the desired granule (4KB)
//! |Table entry type - Bits |63|62 61|60|59 |58 52|51  48|47                     30|29       12|11         2|1 0|
//! |------------------------|--|-----|--|---|-----|------|-------------------------|-----------|------------|---|
//! | Page entry             |  Page Attributes    | RES0 | Output address [47..12]             | Page attr. |1 1|
//!
//! The upper and lower block/page attributes are the same on each level of the translation tables. They only differ
//! based on the executed translation stage. The different stages are only relevant in case the translation happens
//! within "user level". This means the first translation stage will map the memory into an intermediate physical
//! address, where the second stage will map this IPA into the real physical address. However, the current RusPiRo MMU
//! setup is configured to only use a one stage translation process always immediately resulting in a physical address.
//!
//!  Upper Attributes (Stage 1)
//! |63     59|58     55| 54 | 53  |52 |
//! |---------|---------|----|-----|---|
//! | ignored | ignored | XN | PXN | C |
//!
//! Bits 63..55 are ignored. The difference here is that bits 63..59 may be used by the MMU implementation of the chip
//! and bits 58..55 may be used by the actual software
//!
//! Bit  | Description
//! -----|-------------
//!  XN  | eXecute Never bit determining whether the memory region is executable or not.
//!  PXN | Privileged eXecute Never bit determines whether the memory region is executable in EL1. In EL2/EL3 this bit is RES0
//!  C   | Contiguous hint bit indicating that this table entry is one of a contiguous set of entries and might be cached
//!      | together with the other ones
//!
//! Lower Attributes (Stage 1)
//! |11|10  |9  8|7  6|5   |4       2|
//! |--|----|----|----|----|---------|
//! |nG| AF | SH | AP | NS | MemAttr |
//!
//! Bit      | Description
//! ---------|-------------
//!  nG      | not Global bit determines whether this entry is globally valid or only for the current ASID value. This bit is only valid in EL1 & EL0
//!  AF      | Access Flag bit
//!  SH      | Shareability flag
//!  AP      | data Access Permission bits for AP\[2..1\], AP\[0\] is not defined in the TLB entries
//!  NS      | Non-Secure bit specifies whether the output address is in secure or non-secure address map.
//!  MemAttr | Stage 1 memory attributes - index into MAIR_ELx register
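
The lower attribute layout above can be turned into a small decoder. This is a sketch in C (to match the code in the first post) rather than the RusPiRo implementation; the struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the stage 1 lower attributes (bits 11..2). */
typedef struct {
    unsigned mem_attr_index;  /* bits 4..2: index into MAIR_ELx */
    bool     non_secure;      /* bit 5 */
    unsigned access_perm;     /* bits 7..6: AP[2..1] */
    unsigned shareability;    /* bits 9..8 */
    bool     access_flag;     /* bit 10 */
    bool     not_global;      /* bit 11 */
} lower_attrs_t;

static inline lower_attrs_t decode_lower_attrs(uint64_t desc) {
    lower_attrs_t a;
    a.mem_attr_index = (unsigned)((desc >> 2) & 0x7);
    a.non_secure     = ((desc >> 5) & 0x1) != 0;
    a.access_perm    = (unsigned)((desc >> 6) & 0x3);
    a.shareability   = (unsigned)((desc >> 8) & 0x3);
    a.access_flag    = ((desc >> 10) & 0x1) != 0;
    a.not_global     = ((desc >> 11) & 0x1) != 0;
    return a;
}

void check_decode_lower_attrs(void) {
    /* Valid block entry, MemAttr index 1, SH = 3 (inner shareable), AF set. */
    uint64_t desc = 0x1ull | (1ull << 2) | (3ull << 8) | (1ull << 10);
    lower_attrs_t a = decode_lower_attrs(desc);
    assert(a.mem_attr_index == 1 && a.shareability == 3 && a.access_flag);
    assert(!a.non_secure && a.access_perm == 0 && !a.not_global);
}
```

Decoding the raw values read back from a table like this is a quick way to spot a missing bit when an entry does not behave as expected.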

tannewt
Posts: 63
Joined: Tue Nov 17, 2020 1:14 am

Re: MMU settings for 1:1 mapping

Mon Sep 27, 2021 6:21 pm

Thanks for the reply Schnoogle! Responses inline. I did end up getting it working. I was missing the Access Flag. I debugged it on my stream here: https://www.youtube.com/watch?v=Cv1HEFcL4Hw
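
For anyone hitting the same thing, the fix amounts to ORing the Access Flag (bit 10) into every block descriptor. A minimal sketch, using hypothetical macro names rather than the ones in my mmu.h; as I understand it, an entry with AF clear faults on first use unless hardware management of the flag is enabled.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names, not the MM_DESCRIPTOR_* macros from mmu.h. */
#define DESC_VALID         (1ull << 0)
#define DESC_ACCESS_FLAG   (1ull << 10)
#define DESC_ATTR_INDEX(i) (((uint64_t)(i) & 0x7) << 2)

/* Builds a level 1 or level 2 block descriptor (bit 1 clear) with AF set. */
static inline uint64_t block_descriptor(uint64_t phys_base, unsigned attr_index) {
    return phys_base | DESC_ACCESS_FLAG | DESC_ATTR_INDEX(attr_index) | DESC_VALID;
}

void check_block_descriptor(void) {
    /* First gigabyte with MAIR index 1 (the index value is an assumption). */
    uint64_t desc = block_descriptor(0x0, 1);
    assert(desc == 0x405);
    assert((desc & 0x2) == 0);          /* still a block, not a table */
    assert(desc & DESC_ACCESS_FLAG);    /* the bit I was missing */
}
```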
Schnoogle wrote:
Sun Sep 26, 2021 10:08 am
violates this rule, as it is configured as a block entry rather than a table entry and thus does not point to a next-level config table.
I believe that is only a constraint on level -1 and level 0 tables. Level 0 entries cover 512 GB each, while 1 GB entries can be blocks. TTBR0 points at a level 1 table in this case because of the 35-bit address space.
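
Roughly how I read the descriptor rules for the 4 KB granule, as a sketch. The names are made up, level -1 is left out, and the larger-address extensions that change the level 0 rule are ignored.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { KIND_INVALID, KIND_BLOCK, KIND_TABLE, KIND_PAGE } desc_kind_t;

/* Classifies a descriptor by its low two bits and the level it sits in. */
static inline desc_kind_t classify_descriptor(uint64_t desc, int level) {
    assert(level >= 0 && level <= 3);
    if ((desc & 0x1) == 0) {
        return KIND_INVALID;                 /* valid bit clear */
    }
    int bit1 = (int)((desc >> 1) & 0x1);
    if (level == 3) {
        return bit1 ? KIND_PAGE : KIND_INVALID;
    }
    if (bit1) {
        return KIND_TABLE;                   /* points at the next level */
    }
    /* 0b01 is a block at levels 1 (1 GB) and 2 (2 MB), but not at level 0. */
    return (level == 0) ? KIND_INVALID : KIND_BLOCK;
}
```

With this reading, a level 1 entry ending in 0b01 is a legal 1 GB block, and only a level 0 entry with the same bits would be rejected.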
Schnoogle wrote:
Sun Sep 26, 2021 10:08 am
The second thing I spot is that you never set level_1_table[1], so the memory region between 1 and 2 GB is left unconfigured.

Yup, this was deliberate because nothing is in this range.


Schnoogle wrote:
Sun Sep 26, 2021 10:08 am
The other thing I'm wondering about is your TCR_EL2 settings. Why did you choose a T0SZ of 29 (a 35-bit address range)? I'm using a value of 25, which gives a 39-bit address range. The different value affects the translation table structure: how many bits are used for the address and how the address needs to be shifted. This register might also be missing the IRGN0 and ORGN0 settings?


I was aiming for a 35 bit address space because that's how big the BCM2711 range is.
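
The arithmetic behind that, as a quick sketch: the translated region is 2^(64 - T0SZ) bytes, and with a 4 KB granule the walk starts at level 1 whenever that size is between 31 and 39 bits. The function names are hypothetical.

```c
#include <assert.h>

/* Number of address bits translated through TTBR0 for a given T0SZ. */
static inline unsigned va_bits_from_t0sz(unsigned t0sz) {
    assert(t0sz < 64);
    return 64u - t0sz;
}

/* Entries in the initial level 1 table, assuming a 4 KB granule. */
static inline unsigned level1_entries_for_t0sz(unsigned t0sz) {
    unsigned va_bits = va_bits_from_t0sz(t0sz);
    assert(va_bits >= 31 && va_bits <= 39);   /* walk starts at level 1 */
    return 1u << (va_bits - 30);              /* each entry covers 2^30 bytes */
}
```

T0SZ = 29 gives 35 bits and 32 level 1 entries, which is why level_1_table has 32 slots. T0SZ = 25 gives 39 bits and a full 512-entry table.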

Schnoogle wrote:
Sun Sep 26, 2021 10:08 am
Why are you setting the TTBR_CNP flag in the TTBR0 register?


I thought this should be set to share the TLBs across cores.

Schnoogle wrote:
Sun Sep 26, 2021 10:08 am
To disable the alignment fault check with the MMU active, you also need to clear the respective flag in the SCTLR_EL2 register. I guess it's set to 1 by default.


It didn't appear to be set. It must have been the memory region setting causing the alignment problem.

Schnoogle wrote:
Sun Sep 26, 2021 10:08 am
Another hint: once the MMU has been activated you should let 2 CPU cycles pass and then call tlbi alle2 to ensure the MMU related cache will be invalidated and the new settings are picked up.


Thanks! I'll need to make sure I do this.
