
Quick Reference

llama.cpp on Strix Halo

llama.cpp on AMD Ryzen AI Max+ 395 w/Radeon 8060S

In the BIOS Advanced menu, set iGPU Memory to Auto and iGPU Memory Size to 0.5GB. Keeping the dedicated carve-out small lets the ROCm software manage the memory split itself.
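With the carve-out kept small, the bulk of system RAM is exposed to the iGPU dynamically through GTT. A quick post-boot sanity check (the sysfs path is an assumption; the card index may be card1 rather than card0 on some systems):

```shell
# Total GTT pool the amdgpu driver can hand to the iGPU, in bytes
cat /sys/class/drm/card0/device/mem_info_gtt_total

# If the pool is too small, it can be raised with the
# amdgpu.gttsize=<MiB> kernel parameter at boot
```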

These notes assume Artix Linux (OpenRC) with the rocm-hip-sdk package installed:

pacman -S rocm-hip-sdk
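Before building, it is worth confirming that ROCm actually sees the 8060S. rocminfo is pulled in with the SDK stack on most setups (the exact package providing it may differ):

```shell
# The iGPU should appear as an agent named gfx1151
rocminfo | grep -i gfx
```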

Get llama.cpp from GitHub:

git clone --depth=1 https://github.com/ggml-org/llama.cpp

Build with cmake:

cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151 -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -- -j$(nproc)
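To confirm the HIP backend was actually compiled in rather than a CPU-only fallback, recent llama.cpp builds can list their devices (treat the flag as an assumption for older checkouts):

```shell
# Should report a ROCm/HIP device (gfx1151), not just the CPU
build/bin/llama-server --list-devices
```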

Run with a model:

GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 build/bin/llama-server --host 0.0.0.0 \
    --port 8080 \
    --flash-attn on \
    --cache-prompt \
    --cache-type-k q8_0 \
    --cache-type-v q8_0 \
    --temp 0.6 \
    --top-p 0.95 \
    --top-k 20 \
    --min-p 0.0 \
    --presence-penalty 0.0 \
    --repeat-penalty 1.0 \
    --gpu-layers 99 \
    --ctx-size 32768 \
    --mmproj ../models/Huihui-Qwen3.6-35B-A3B-abliterated-mmproj-BF16.gguf \
    --model ../models/Huihui-Qwen3.6-35B-A3B-abliterated-Q8_0.gguf
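Once running, the server speaks an OpenAI-compatible HTTP API, so a quick smoke test looks like this (host and port match the flags above; llama-server loads a single model, so no model field is needed in the request):

```shell
# Health probe
curl http://localhost:8080/health

# Minimal chat completion request
curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Say hello."}]}'
```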
