llama.cpp on AMD Ryzen AI Max+ 395 w/Radeon 8060S
In the BIOS Advanced menu, set the iGPU memory configuration to Auto and the
iGPU Memory Size to 0.5GB. This lets the ROCm software manage the memory split
between CPU and GPU at runtime.
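After booting, you can confirm how the amdgpu driver split the memory: the small dedicated VRAM carve-out and the shared GTT pool are exposed in sysfs (the card index may differ on your system; values are in bytes):
cat /sys/class/drm/card*/device/mem_info_vram_total
cat /sys/class/drm/card*/device/mem_info_gtt_total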
This setup uses Artix Linux (OpenRC) with the rocm-hip-sdk package installed:
pacman -S rocm-hip-sdk
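After the install, check that the ROCm runtime actually sees the gfx1151 device; rocminfo should come in with the SDK packages, and on Arch-based systems your user typically needs to be in the video and render groups to access the GPU:
usermod -aG video,render yourusername   # as root, then log back in
rocminfo | grep -i gfx1151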
Get llama.cpp from GitHub:
git clone --depth=1 https://github.com/ggml-org/llama.cpp
Build with CMake, targeting the Radeon 8060S's gfx1151 architecture:
cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151 -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -- -j$(nproc)
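If CMake picks up the wrong compiler for the HIP sources, the llama.cpp build docs suggest pointing it at ROCm's clang explicitly via hipconfig (same flags as above, just with the environment set):
HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
  cmake -S . -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151 -DCMAKE_BUILD_TYPE=Release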
Run llama-server with a model; GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 lets the backend allocate from the shared memory pool instead of being limited to the small dedicated carve-out:
GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 build/bin/llama-server --host 0.0.0.0 \
--port 8080 \
--flash-attn on \
--cache-prompt \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
--temp 0.6 \
--top-p 0.95 \
--top-k 20 \
--min-p 0.0 \
--presence-penalty 0.0 \
--repeat-penalty 1.0 \
--gpu-layers 99 \
--ctx-size 32768 \
--mmproj ../models/Huihui-Qwen3.6-35B-A3B-abliterated-mmproj-BF16.gguf \
--model ../models/Huihui-Qwen3.6-35B-A3B-abliterated-Q8_0.gguf
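Once it is running, llama-server exposes an OpenAI-compatible API on the configured port; a quick smoke test from the same machine:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Say hello"}],"max_tokens":32}'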
