AMD vis-à-vis GPUs #2
Installing ROCm & PyTorch on Fedora 43 natively
CUDA requires users to add Nvidia’s proprietary driver repository. Nvidia’s proprietary drivers run as DKMS kernel modules, which historically don’t play nicely with DNF’s package management. Adding the cuda-fedora repository on top further complicates this fragile stack, making life difficult for anyone who likes timely kernel updates. For all the near-universal adoption Nvidia enjoys, their insistence on proprietary drivers makes for a questionable developer experience.
On Team Red, the GPU driver is called amdgpu, and it’s supported out of the box in the Linux kernel. AMD ships its compute backend as amd-kfd (Kernel Fusion Driver), which is included in the amdgpu project. In contrast to Nvidia’s approach, the Fedora 43 repos include a ROCm metapackage that pulls the entire ~12 GB ROCm stack with a single dnf install. Compared to the arduous task of setting up CUDA on Fedora, using DNF for everything feels like a breeze.
The metapackage pulls several dependencies, notably:
#toolchain
rocm-core
#compiler
rocm-llvm
rocm-clang
rocm-lld
rocm-libc++
rocm-omp
#hip
rocm-hip
hipcc
hipblas
hipsparse
#compute
rocblas
rccl
#ml
miopen
mivisionx
#utils
rocm-opencl
rocm-clinfo
rocm-smi
rocminfo
amdsmi

After installation, rocminfo identifies our Vega 7 iGPU as a gfx90c. Don’t forget this codename (ominous foreshadowing). rocminfo also confirms that XNACK support is enabled; XNACK matters here because a carve-out of DDR4 system memory is masquerading as VRAM via UMA.
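The architecture and the XNACK flag are both encoded in the ISA name that rocminfo prints. A quick stdlib script can pull them out — the sample string below mimics the ROCm 6.x format and is not verbatim output from this machine:

```python
import re

# Fragment resembling rocminfo's ISA section (format assumed):
sample = "Name: amdgcn-amd-amdhsa--gfx90c:xnack+"

def parse_isa(text):
    """Extract (gfx arch, xnack enabled) from a rocminfo-style ISA name."""
    m = re.search(r"amdgcn-amd-amdhsa--(gfx[0-9a-f]+)(:xnack\+)?", text)
    if m is None:
        return None
    return m.group(1), m.group(2) is not None

print(parse_isa(sample))  # ('gfx90c', True)
```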
Checking the output of rocminfo, dmesg, and rocm-smi, it’s all promising. Darktable is able to use ROCm. Let’s try running a minimal HIP program in C++ that increments a single float on the GPU:
$ cat t.cpp
#include <hip/hip_runtime.h>
#include <stdio.h>
__global__ void kernel(float* a) {
a[0] += 1.0f;
}
int main() {
float *d;
float h = 1.0f;
hipMalloc(&d, sizeof(float));
hipMemcpy(d, &h, sizeof(float), hipMemcpyHostToDevice);
kernel<<<1, 1>>>(d);
hipDeviceSynchronize();
hipMemcpy(&h, d, sizeof(float), hipMemcpyDeviceToHost);
printf("%f\n", h);
hipFree(d);
return 0;
}
$ hipcc t.cpp -O2 -o t && ./t
4 warnings generated when compiling for host.
2.000000

Simple enough! ROCm works :)
My hypothesis is:
As long as we can run HIP/BLAS code against the system ROCm using hipcc, hipBLAS, or rocBLAS, we can get PyTorch working.
So, let’s try installing PyTorch. PyTorch stable currently has first-party support for ROCm 7.1, but our system is running ROCm 6.4.4, so we should avoid installing a ROCm build newer than that.
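That reasoning can be sketched as a quick version check. The assumption here is mine, not AMD’s: the wheel’s bundled userspace libraries still talk to the system kernel driver, so the wheel’s major.minor ROCm version shouldn’t exceed the system’s:

```python
def rocm_version(tag):
    """Parse a ROCm version string like '6.4.4' into an int tuple."""
    return tuple(int(p) for p in tag.split("."))

system = rocm_version("6.4.4")  # what Fedora 43 ships
wheel = rocm_version("6.4")     # the '+rocm6.4' suffix on a torch wheel

# Assumption: matching on major.minor is what matters for the
# userspace/kernel boundary.
print(wheel[:2] <= system[:2])  # True -> the 6.4 wheel is worth trying
```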
What do AMD docs recommend? Unfortunately, they only cover Dockerised PyTorch. And they explicitly state that their entire stack is only tested against Ubuntu/RHEL targets. I’m not planning to switch to Ubuntu/RHEL anytime soon, and needing Docker just to run PyTorch seems inconvenient.
Let’s check PyTorch docs again. The release archive shows that there’s a PyTorch 2.8.0 build compatible with ROCm 6.4. Running the pip install command shows:
Installed 15 packages in 1.12s
+ filelock==3.20.0
+ fsspec==2025.12.0
+ jinja2==3.1.6
+ markupsafe==2.1.5
+ mpmath==1.3.0
+ networkx==3.6.1
+ numpy==2.3.5
+ pillow==12.0.0
+ pytorch-triton-rocm==3.4.0
+ setuptools==70.2.0
+ sympy==1.14.0
+ torch==2.8.0+rocm6.4
+ torchaudio==2.8.0+rocm6.4
+ torchvision==0.23.0+rocm6.4
+ typing-extensions==4.15.0

Smooth sailing? Let’s inspect the installed libs:
lib64/python3.12/site-packages/torch $ find . -type f -name "*.so*" | grep -i roc
./lib/librocprofiler-register.so
./lib/librocblas.so
./lib/libroctracer64.so
./lib/libroctx64.so
./lib/librocm_smi64.so
./lib/rocblas/library/Kernels.so-000-gfx1101.hsaco
./lib/rocblas/library/Kernels.so-000-gfx942-xnack+.hsaco
./lib/rocblas/library/Kernels.so-000-gfx90a-xnack+.hsaco
./lib/rocblas/library/Kernels.so-000-gfx1030.hsaco
./lib/rocblas/library/Kernels.so-000-gfx942-xnack-.hsaco
./lib/rocblas/library/Kernels.so-000-gfx908.hsaco
./lib/rocblas/library/Kernels.so-000-gfx90a-xnack-.hsaco
./lib/rocblas/library/Kernels.so-000-gfx1200.hsaco
./lib/rocblas/library/Kernels.so-000-gfx1102.hsaco
./lib/rocblas/library/Kernels.so-000-gfx1100.hsaco
./lib/rocblas/library/Kernels.so-000-gfx1201.hsaco
./lib/librocrand.so
./lib/librocsparse.so
./lib/librocm-core.so
./lib/librocfft.so
./lib/librocsolver.so

There are two super fun things to notice here. First, PyTorch installs its own copy of the ROCm userspace libraries; duplicating system libraries is a questionable design choice by itself. Second, remember gfx90c? Yeah… it’s not included in the PyTorch rocBLAS kernels.
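You can confirm which architectures a rocBLAS bundle actually ships with a short stdlib script, assuming filenames follow the `Kernels.so-000-<gfx>-….hsaco` pattern seen in the listing above:

```python
import re
from pathlib import Path

def shipped_gfx_targets(kernels_dir):
    """Collect gfx architectures from rocBLAS .hsaco filenames,
    e.g. 'Kernels.so-000-gfx90a-xnack+.hsaco' -> 'gfx90a'."""
    targets = set()
    for f in Path(kernels_dir).glob("*.hsaco"):
        m = re.search(r"gfx[0-9a-f]+", f.name)
        if m:
            targets.add(m.group(0))
    return sorted(targets)

# Pointed at site-packages/torch/lib/rocblas/library, this should
# return the gfx908/gfx90a/gfx942/gfx10xx/gfx12xx set above -- and
# no gfx90c.
```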
sigh.
What’s going on here?
AMD first started upstreaming gfx90c commits to LLVM back in 2021, but they have never shipped gfx90c kernels in ROCm itself. The linked GitHub issue includes a community workaround: forcing ROCm to spoof a gfx900.
So what’s a gfx900? It’s the Radeon Instinct MI25, built on Vega 10 (GCN 5.0) with 16 GB of HBM2 and 64 CUs. Close enough. My laptop does indeed have a 300-watt HPC iGPU in it.
To summarise, the options discovered so far for running PyTorch on the ThinkPad are:
Spoof gfx900 on ROCm 6.4 (wait, does ROCm 6.4 even include gfx900 kernels..?)
Try the ‘PyTorch-recommended’ ROCm 7.1 wheels on top of system ROCm 6.4.4, since PyTorch bundles its own userspace ROCm libs anyway
Use the Dockerised PyTorch images packaged by AMD
Switch to Ubuntu/RHEL
Compile PyTorch from source
Spoof an Nvidia GPU, thereby defeating the purpose
Use cpu-torch (never)
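For the record, the spoof in option 1 hinges on the HSA_OVERRIDE_GFX_VERSION environment variable, which the ROCm runtime reads at initialisation. A minimal sketch of applying it from Python (not yet tested on this machine):

```python
import os

# ROCm reads HSA_OVERRIDE_GFX_VERSION when the runtime initialises,
# so it must be set before anything imports torch. gfx900 corresponds
# to the version string '9.0.0'.
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "9.0.0"

# With the override in place, a subsequent `import torch` should see
# the gfx90c iGPU as a gfx900 device.
```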
I’ll be focusing on gfx900 spoofing experiments for post #3. See you in the next one!
Glossary
amdgpu: The open-source Linux kernel driver for AMD Radeon GPUs, providing graphics and compute support.
amd-kfd (Kernel Fusion Driver): AMD’s kernel component enabling heterogeneous CPU–GPU compute under ROCm.
BLAS (Basic Linear Algebra Subprograms): A standard API for efficient vector and matrix operations used in scientific computing.
CUDA (Compute Unified Device Architecture): Nvidia’s proprietary GPU programming platform and runtime.
DKMS (Dynamic Kernel Module Support): A system that automatically rebuilds third-party kernel modules when the Linux kernel updates.
DNF: A package manager used by Fedora and related Linux distributions.
gfx*: AMD GPU architecture identifiers used by LLVM/ROCm to target specific hardware generations (e.g., gfx900, gfx90a).
HIP (Heterogeneous-Compute Interface for Portability): AMD’s C++ runtime and API for writing GPU-accelerated programs, conceptually similar to CUDA.
hipcc: The compiler driver used to build HIP-based GPU applications.
LLVM (Low Level Virtual Machine): A modular compiler infrastructure used to build toolchains for CPUs and GPUs.
MIOpen: AMD’s GPU-accelerated deep learning primitives library.
RCCL (Radeon Collective Communications Library): AMD’s library for high-performance multi-GPU communication.
rocBLAS: AMD’s GPU-accelerated implementation of the BLAS standard.
UMA (Unified Memory Architecture): A system design where the CPU and GPU share the same physical memory.
XNACK: A GPU memory feature that enables retrying faulted memory accesses, improving reliability in certain configurations.