EDGE COMPUTING & AI
Fig. 2: HBM switch internal connectors.
close. Traffic pattern is a major contributor to poor bandwidth. This
comes down to the fact that DRAM requires opening (ACT) and
closing (PRE) rows within a bank, and random accesses incur
more of this maintenance, during which no data can transfer.
Additionally, some DRAM architectures (e.g. DDR4, HBM) carry
overhead for consecutive accesses to the same Bank Group. Finally,
short bursts of alternating read/write traffic suffer because the DQ
bits are bi-directional, and switching direction incurs a bus
turnaround time.
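As a rough sketch of why traffic pattern matters, the toy model below counts the ACT commands an open-page controller would issue for sequential versus random traffic. All geometry parameters here are made up for illustration and do not reflect the actual HBM2 organisation:

```python
import random

def count_activations(addresses, col_bytes=1024, banks=16):
    """Toy model: count ACT commands needed to service a stream of
    byte addresses, assuming one open row per bank (open-page policy).
    col_bytes and banks are illustrative, not the real HBM geometry."""
    open_rows = {}  # bank -> currently open row
    acts = 0
    for addr in addresses:
        row = addr // col_bytes          # which row the address falls in
        bank = row % banks               # naive row-to-bank mapping
        if open_rows.get(bank) != row:   # row miss: PRE old row, ACT new
            open_rows[bank] = row
            acts += 1
    return acts

n = 100_000
sequential = range(0, n * 64, 64)                        # 64 B linear strides
scattered = [random.randrange(n * 64) for _ in range(n)]  # random accesses
print(count_activations(sequential), "ACTs for sequential traffic")
print(count_activations(scattered), "ACTs for random traffic")
```

In this model the sequential stream amortises each ACT over many column accesses within the same row, while the random stream forces an ACT/PRE pair on nearly every access, which is exactly the maintenance overhead described above.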
In VU+ HBM, there is a hardened AXI
Switch which enables access from any
of the 32 AXI channels to any of the HBM
pseudo channels and addressable memory.
Fig. 3: Example of HBM block design.
There are many advantages to having a hardened switch, such as
flexible addressing and reduced design complexity and routing
congestion.
To enable flexible addressing across the entire HBM stacks, the
hardened AXI switch contains switch boxes arranged as four
masters × four slaves.
This facilitates the flexible addressing, but there is a limitation
that can impact memory bandwidth: with only four horizontal
paths available, arbitration within the switch can limit your
achievable bandwidth.
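As a very rough illustration of that arbitration limit (the per-path figure below is hypothetical, not a datasheet value), masters that must cross the same switch segment end up sharing its four lateral paths:

```python
def effective_bandwidth(masters_crossing, lateral_paths=4, path_gbps=14.4):
    """Toy arbitration model: masters traversing the same switch
    segment share its lateral paths equally. path_gbps is an
    illustrative per-path figure, not a measured or specified value."""
    if masters_crossing <= lateral_paths:
        return path_gbps                       # no contention
    return path_gbps * lateral_paths / masters_crossing

for m in (2, 4, 8, 16):
    print(m, "masters crossing ->", effective_bandwidth(m), "GB/s each")
```

The takeaway is only qualitative: once more than four masters need to traverse the same segment, per-master bandwidth falls off, so keeping AXI masters close to the pseudo channels they address helps.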
How to maximise memory bandwidth
When it comes to maximising memory bandwidth, consider
changing your command and addressing patterns. Since random
accesses and short bursts of read/write transactions result
in the worst bandwidth, see whether you can alter the traffic
pattern in the user application. This will give you the biggest
bang for your buck.
If you’re unable to change your traffic pattern, the HBM Memory
Controller IP has several options available that may help:
• Custom Address Mapping: As mentioned previously, random
accesses require higher rates of ACT and PRE commands.
With a custom address map, you can define the AXI addresses
to HBM memory addresses which can increase the
number of page hits and improve bandwidth.
• Bank Group Interleave: Enables sequential address operation
to alternate between even and odd bank groups to maximise
bandwidth efficiency.
• Enable Request Re-Ordering: Enables the controller to reorder
commands (e.g. coalescing commands to reduce bus
turnaround time).
• Enable Close Page Reorder: Enables the controller to close
a page after an instruction has completed. If disabled, the page
remains open until a higher-priority operation is requested for
another page in the same bank. This can be advantageous
depending on whether you are using a random, linear, or custom
addressing pattern.
• Enable Look Ahead Pre-Charge: Enables the controller to reorder
commands to minimise PRE commands.
• Enable Look Ahead Activate: Enables the controller to reorder
commands to minimise ACT commands.
• Enable Lookahead Single Bank Refresh: Enables the controller
to insert refresh operations based on pending operations
to maximise efficiency.
• Single Bank Refresh: Instructs the controller to refresh banks
individually instead of all at once.
• Enable Refresh Period Temperature Compensation: This
enables the controller to dynamically adjust the refresh rate
based on the temperature of the memory stacks.
• Hold Off Refresh for Read/Write: This allows the controller to
delay a refresh to permit operations to complete first.
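To make the first option above concrete, the sketch below decodes one address under two different bit orderings. The field names, widths, and orderings are invented for illustration; the real IP configures its address map in Vivado, not through code like this:

```python
def split_addr(addr, order, widths):
    """Decode an address into fields per a configurable bit order.
    order lists fields from most- to least-significant; widths gives
    each field's bit width. Purely illustrative, not the IP's syntax."""
    fields, shift = {}, 0
    for name in reversed(order):
        w = widths[name]
        fields[name] = (addr >> shift) & ((1 << w) - 1)
        shift += w
    return fields

widths = {"row": 14, "bank": 4, "col": 5}   # toy geometry
# Row-Bank-Column: consecutive addresses stay within one row -> page hits.
# Row-Column-Bank: consecutive addresses hop banks sooner -> more ACT/PRE.
a = split_addr(0x41, ["row", "bank", "col"], widths)
b = split_addr(0x41, ["row", "col", "bank"], widths)
print(a, b)
```

The point is that the same AXI address lands in a different row/bank/column depending on the map, so choosing an ordering that keeps your access stream inside one row for as long as possible raises the page-hit rate.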
HBM Monitor
New to Vivado is the HBM monitor which, similar to SysMon,
can display the die temperature of each HBM2 die stack
individually. It can also display the bandwidth on a per-MC or
per-Pseudo Channel (PC) basis.
To profile your hardware design and HBM configuration properly,
start with the default HBM settings and capture the read/
write throughput as your baseline. Then regenerate new .bit files
using each of the HBM MC options discussed earlier, alone and
in combination, to determine which provides the highest
throughput. Note that the way the AXI Switch is configured can
also impact the HBM bandwidth and throughput, and should be
considered when profiling as well.
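The profiling loop above can be sketched as a simple sweep over on/off combinations of the MC options. Here `measure_throughput` is a placeholder for your own flow (regenerate the bitstream with that configuration, run the design, return the measured GB/s); the option names simply echo the list above:

```python
from itertools import product

OPTIONS = ["bank_group_interleave", "request_reordering",
           "lookahead_precharge", "lookahead_activate"]

def sweep(measure_throughput):
    """Try every on/off combination of the listed MC options and
    return them ranked by measured throughput (best first).
    measure_throughput(config) is user-supplied: rebuild with that
    config, run on hardware, and return the achieved GB/s."""
    results = []
    for flags in product([False, True], repeat=len(OPTIONS)):
        config = dict(zip(OPTIONS, flags))
        results.append((measure_throughput(config), config))
    results.sort(key=lambda r: r[0], reverse=True)
    return results

# Example with a stand-in measurement (replace with real hardware runs):
best_gbps, best_cfg = sweep(lambda cfg: sum(cfg.values()))[0]
print(best_gbps, best_cfg)
```

With four options this is 16 bitstream builds, so in practice you would prune the sweep to the options most relevant to your traffic pattern rather than run it exhaustively.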
If you’re using a pre-existing design and the Vitis tool, you
will need to modify the hardware platform design using a custom
DSA flow.
Design requirements
To profile the HBM bandwidth, create a new design or use an
existing one. To profile different HBM configurations, you will
need access to the hardware design in order to modify the HBM
IP core and then generate new bitstreams and new .xsa/.dsa
files that are used in the Vitis tool for software development.
For background, Vitis is a unified software tool developed
by Xilinx that provides a framework for developing and delivering
FPGA accelerated data centre applications using standard
programming languages and for creating software platforms
targeting embedded processors.
For existing designs, refer to GitHub, the SDAccel example
repositories, and the U280 and VCU128 product pages, which
contain targeted reference designs (TRDs). If you are
targeting a custom platform, or even the U280 or VCU128, and
need to create a custom hardware platform design, this can also
be done. Why do I need to create a custom hardware platform
for the Alveo U280 if DSAs already exist?
As workload algorithms evolve, reconfigurable hardware
www.eenewseurope.com eeNews Europe April 2020 News 39