ARM Cortex A53
==============

Some personal architectural notes on the A53.


ARM specific info
-----------------
source: Arm Cortex-A53 MPCore Processor Technical Reference Manual

MMU roles:
 * controls table walk hardware
 * translates addresses, virtual->physical

MMU configuration and management happens through system control registers.
 see section 4.1

ASID - Address Space IDentifier
 The MMU uses the ASID to distinguish, within a TLB (see below),
 between translations that have the same virtual address but belong
 to different address spaces (e.g. different processes).
 Assigned by the OS.
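
Rough sketch of the ASID idea: a toy TLB modelled as a dictionary keyed by
(ASID, virtual page number). The name `AsidTlb`, the 4 KiB page size and the
addresses are made up for illustration; real hardware also has global entries
and attributes that are not modelled here.

```python
# Toy model: TLB entries are tagged with the ASID, so two processes can
# cache translations for the same virtual page without flushing the TLB.
PAGE_SHIFT = 12  # assumption: 4 KiB pages


class AsidTlb:
    def __init__(self):
        self.entries = {}  # (asid, virtual page number) -> physical page number

    def insert(self, asid, vaddr, paddr):
        self.entries[(asid, vaddr >> PAGE_SHIFT)] = paddr >> PAGE_SHIFT

    def lookup(self, asid, vaddr):
        """Return the physical address, or None on a TLB miss."""
        ppn = self.entries.get((asid, vaddr >> PAGE_SHIFT))
        if ppn is None:
            return None
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        return (ppn << PAGE_SHIFT) | offset


tlb = AsidTlb()
tlb.insert(asid=1, vaddr=0x8000, paddr=0x4000_0000)
tlb.insert(asid=2, vaddr=0x8000, paddr=0x5000_0000)
```

Looking up 0x8010 with ASID 1 and with ASID 2 hits two different entries,
while an ASID with no entry for that page misses.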


= Privileges:
source: https://developer.arm.com/documentation/102412/0102/Privilege-and-Exception-levels

- Execution States: AArch32 and AArch64.

- Exception Levels
  example: EL3 (firmware), EL2 (hypervisor), EL1 (kernel), EL0 (application)
  EL3: can change Security State (see below)
  EL2: can handle virtualization features

- Security State
  Being in a non-secure state limits the access to {address space},
  {system registers} and {interrupts}.
  Being in a secure state opens up additional resources of the classes above,
  besides those available in non-secure state.
  Realm/Root: additional states added by RME (Realm Management Extension,
  see below)
  EL3 has a fixed Security State, the most privileged one (e.g. Secure state,
  or Root state with RME)

- Exceptions: synchronous and asynchronous.
  - Synchronous
      Served immediately (e.g. MMU permission fail, or special instructions to
      change exception level).
  - Asynchronous
      Can be temporarily masked, but must be served in a "finite time".
    - IRQ
    - FIQ (fast interrupt request, historically the higher priority one)
    - NMI (non-maskable interrupts)
    - SError (system errors, internal to the CPU, e.g. bus error)
    - V(irtual) {IRQ,FIQ,SError}

= Memory Management Guide
source: https://developer.arm.com/documentation/101811/0102/?lang=en

The virtual address is handled by the TLB (see below) within the MMU.
The address must be translated before the cache lookup (physically tagged).

Multi-level table: the lookup is hierarchical (as in generic TLB page walk,
see below).
Bounded number of levels (e.g. ARMv8-A -> 4 levels max).

The OS decides how to organise the tables (e.g. larger blocks = short walks,
smaller blocks = finer control, but longer walks).
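
Sketch of how the hierarchical lookup slices a virtual address, assuming a
48-bit address and a 4 KiB granule (12 offset bits, then 9 index bits per
level, i.e. 512 entries per table). The function name `split_va` is
hypothetical; other granules use different bit widths.

```python
# Split a 48-bit virtual address into table indices, assuming a 4 KiB
# granule: 12 offset bits, then 9 index bits per level (512 entries/table).
PAGE_SHIFT = 12
BITS_PER_LEVEL = 9
LEVELS = 4  # levels 0..3


def split_va(va):
    """Return ([idx_l0, idx_l1, idx_l2, idx_l3], page_offset)."""
    offset = va & ((1 << PAGE_SHIFT) - 1)
    indices = []
    for level in range(LEVELS):
        shift = PAGE_SHIFT + BITS_PER_LEVEL * (LEVELS - 1 - level)
        indices.append((va >> shift) & ((1 << BITS_PER_LEVEL) - 1))
    return indices, offset


indices, offset = split_va(0x0000_7FFF_FFFF_F123)
```

With this granule, a walk that stops at level 2 maps a 2 MiB block and one
that stops at level 1 maps a 1 GiB block: fewer lookups, coarser control.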

"Translation regimes":
Each item of the {Exception Level} x {Security State} matrix has its own
virtual address space (settings and tables).
e.g. NS.EL2:0x8000 => non-secure state, exception level 2, address 0x8000.


= TrustZone
source: https://developer.arm.com/architectures/learn-the-architecture/trustzone-for-aarch64

= Realm Management
source: https://developer.arm.com/documentation/den0126/latest


Special interest info
---------------------

= MMU
Configuration via system registers
Failed permission checks cause synchronous exceptions
 [?] implied: regular and permitted access is handled transparently

= Data Caching


Generic info
------------

= TLB Translation Lookaside Buffer

Caches the information needed for the conversion of virtual pages to
physical pages.
The upper bits of a virtual address identify the page: a lookup replaces
them with the physical page number, while the offset within the page is
kept as is.
The TLB has a fixed size, so it is subject to misses.
In case of a miss, the system refers to the page table in physical memory.
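
Toy model of the hit/miss behaviour described above. The class `TinyTlb`,
the capacity of 2 entries and the LRU replacement are assumptions made for
the example, not a description of any real TLB; the page table is a plain
dictionary standing in for the table in memory.

```python
from collections import OrderedDict

PAGE_SHIFT = 12  # assumption: 4 KiB pages


class TinyTlb:
    """Fixed-capacity translation cache; LRU replacement is an assumption."""

    def __init__(self, page_table, capacity=2):
        self.page_table = page_table      # stand-in for the in-memory table
        self.capacity = capacity
        self.cache = OrderedDict()        # vpn -> ppn, oldest first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn in self.cache:
            self.hits += 1
            self.cache.move_to_end(vpn)   # mark as most recently used
        else:
            self.misses += 1
            self.cache[vpn] = self.page_table[vpn]  # the slow path
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)      # evict the oldest entry
        return (self.cache[vpn] << PAGE_SHIFT) | offset


page_table = {0x1: 0x80, 0x2: 0x90, 0x3: 0xA0}
tlb = TinyTlb(page_table, capacity=2)
results = [tlb.translate(a) for a in (0x1004, 0x1008, 0x2000, 0x3000, 0x1004)]
```

Touching three pages with room for only two entries evicts page 0x1, so the
last access to it misses again even though it was translated before.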



= Cache Policies
source: https://en.wikipedia.org/wiki/Cache_placement_policies

== Direct mapped
Single line per set: each memory block can only be placed in one
specific line.
Cheap, fast, but low hit rate (conflicts result in content being
replaced).

== Fully associative
Single set with multiple lines.  Each memory block can be anywhere
in the set, so every line has to be checked on lookup.
Best hit rate (no conflict misses), but expensive and slow: all the
tags must be compared.

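== N-way set associative cache
N-way set associative: provides N lines in each set.
A block can go in any of the N lines of its set, which reduces conflict
misses compared to direct mapped (not by a fixed factor: it depends on
the access pattern).
For a given cache size, raising N means fewer sets, not a larger cache;
the cost is comparing N tags per lookup.
Direct mapped corresponds to 1-way associative.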
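
Small simulation to compare the three placement policies on the same cache
size. The function `count_misses`, the 4-line cache, the 64-byte line size
and the LRU replacement within a set are all assumptions of the sketch.

```python
def count_misses(addresses, num_lines, ways, line_size=64):
    """Simulate a cache of num_lines lines split into sets of `ways` lines.

    ways=1 is direct mapped; ways=num_lines is fully associative.
    LRU replacement within a set is an assumption of this sketch.
    """
    num_sets = num_lines // ways
    sets = [[] for _ in range(num_sets)]   # each set: list of tags, LRU first
    misses = 0
    for addr in addresses:
        block = addr // line_size
        index, tag = block % num_sets, block // num_sets
        lines = sets[index]
        if tag in lines:
            lines.remove(tag)              # hit: re-appended below as newest
        else:
            misses += 1
            if len(lines) == ways:
                lines.pop(0)               # evict least recently used
        lines.append(tag)
    return misses


# Two blocks that map to the same line of a 4-line direct-mapped cache,
# accessed alternately.
trace = [0x000, 0x100, 0x000, 0x100, 0x000, 0x100]
direct = count_misses(trace, num_lines=4, ways=1)
two_way = count_misses(trace, num_lines=4, ways=2)
fully = count_misses(trace, num_lines=4, ways=4)
```

On this trace the direct-mapped cache misses on every access, because the
two blocks keep evicting each other, while with 2 or more ways only the two
initial (compulsory) misses remain.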

= Cache indexing and tagging

{P,V}I{P,V}T = {Physically,Virtually} Indexed, {Physically,Virtually} Tagged
Indexing determines the cache set,
tagging determines which line of the set contains the data.
Each of them can use the Physical or the Virtual address, with various
pros and cons.
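
Sketch of how an address is split into tag, index and offset. The geometry
used (32 KiB, 4-way, 64-byte lines) is only an illustrative assumption, and
the helper names are made up; whether `addr` is a virtual or a physical
address is exactly what the {P,V}I{P,V}T naming describes.

```python
def cache_geometry(cache_bytes, ways, line_bytes):
    """Return (offset_bits, index_bits) for a set-associative cache."""
    num_sets = cache_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2, sizes are powers of two
    index_bits = num_sets.bit_length() - 1
    return offset_bits, index_bits


def split_address(addr, offset_bits, index_bits):
    """Return (tag, index, offset) for the given address."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# Illustrative geometry: 32 KiB, 4-way, 64-byte lines -> 128 sets.
offset_bits, index_bits = cache_geometry(32 * 1024, ways=4, line_bytes=64)
tag, index, offset = split_address(0x12345678, offset_bits, index_bits)
```

Here offset + index take 13 bits, one more than the 12 page-offset bits of a
4 KiB page, so a virtually indexed cache with this geometry would take one
index bit from the translated part of the address.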

= Page walk
source: https://cs.stackexchange.com/questions/102834/what-is-happening-during-table-walk

With a 64-bit architecture, a 4k page size (12 bits of displacement)
would require a flat lookup table with an insane number of entries:
2^(64 - 12)
Using a multi-level page table makes it fit in memory (only the parts
in use need to exist), but implies multiple memory accesses per
translation (hence the term "walk").
The number of walks is reduced by the use of a TLB, which however may
miss.
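
The arithmetic, plus a toy walk over nested dictionaries standing in for the
tables. The 8-byte entry size, the 4 levels of 9 bits each and the function
name `walk` are assumptions for the example; the point is only the size of
the flat table and the one-access-per-level cost.

```python
PAGE_SHIFT = 12          # 4 KiB pages
ENTRY_BYTES = 8          # assumed size of one table entry

# Flat table covering a full 64-bit virtual address space.
flat_entries = 2 ** (64 - PAGE_SHIFT)
flat_bytes = flat_entries * ENTRY_BYTES      # 2**55 bytes = 32 PiB


def walk(root, va, levels=4, bits=9):
    """Walk nested dicts standing in for tables; return (pa, memory_accesses)."""
    table, accesses = root, 0
    for level in range(levels):
        shift = PAGE_SHIFT + bits * (levels - 1 - level)
        index = (va >> shift) & ((1 << bits) - 1)
        table = table[index]                 # one memory access per level
        accesses += 1
    # After the last level, `table` holds the physical page number.
    return (table << PAGE_SHIFT) | (va & ((1 << PAGE_SHIFT) - 1)), accesses


# Only the tables actually needed exist: here a single mapped page.
root = {0: {0: {0: {8: 0x40000}}}}
pa, accesses = walk(root, 0x8123)
```

A single mapped page needs just four small tables instead of the 32 PiB flat
one, at the price of four memory accesses for each translation that misses
the TLB.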


Useful references
-----------------

FIQ (Fast Interrupt reQuest) vs IRQ
A matter of priority, FIQs can interrupt IRQs.

https://stackoverflow.com/questions/973933/what-is-the-difference-between-fiq-and-irq-interrupt-system/14212234#14212234