First Kernel
The smallest useful pyptx workflow is:
- define a kernel
- allocate registers
- emit a few PTX instructions
- inspect the generated PTX
Example
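from pyptx import kernel, reg, ptx
from pyptx.types import u32


@kernel(arch="sm_90a")
def tiny():
    # Stage the thread index (%tid.x) into a real PTX register.
    tid = reg.from_(ptx.special.tid.x(), u32)
    # Register arithmetic is emitted directly into the trace.
    out = tid + 4
    ptx.inst.mov.u32(tid, out)
    # The kernel ends exactly here.
    ptx.ret()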
Then inspect it:
The emitted PTX is small and unsurprising:
What To Notice
- reg.from_(...) is just a convenient way to stage a value into a real PTX register
- tid + 4 emits arithmetic directly into the trace
- ptx.ret() ends the kernel exactly where you write it
This is the core authoring style of pyptx: structured Python, explicit PTX model.
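To make "emits arithmetic directly into the trace" concrete, here is a toy model of the idea in plain Python. It does not import pyptx and is not how pyptx is implemented; the `Trace` and `Reg` classes, the `%r0`-style register names, and the instruction strings are all assumptions made up for this sketch. The point is only that an overloaded `+` can record an instruction instead of computing a value.

```python
from itertools import count


class Trace:
    """Toy instruction recorder (hypothetical; not pyptx's implementation)."""

    def __init__(self):
        self.lines = []
        self._ids = count()

    def new_reg(self):
        # Hand out fresh register names: %r0, %r1, ...
        return Reg(self, f"%r{next(self._ids)}")

    def emit(self, text):
        self.lines.append(text)


class Reg:
    """Toy register handle that remembers which trace it belongs to."""

    def __init__(self, trace, name):
        self.trace = trace
        self.name = name

    def __add__(self, imm):
        # Nothing is computed in Python: the addition is recorded as an
        # instruction, and a handle to the destination register is returned.
        dst = self.trace.new_reg()
        self.trace.emit(f"add.u32 {dst.name}, {self.name}, {imm};")
        return dst


trace = Trace()
tid = trace.new_reg()
trace.emit(f"mov.u32 {tid.name}, %tid.x;")  # stage the thread index
out = tid + 4                               # recorded, not evaluated
trace.emit("ret;")                          # ends the toy kernel here

print("\n".join(trace.lines))
```

In this sketch the recorded lines appear in the same order as the Python statements that produced them, which is the property the list above describes for ptx.ret().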
The Three Namespaces
Most kernels are some mix of:
- reg: allocate state and use simple register-level arithmetic
- smem: allocate shared-memory regions and barriers
- ptx: emit instructions, control flow, and low-level GPU operations
If you understand those three namespaces, you understand the shape of the library.
Next Step
After this page, the usual next reads are:
- PTX Namespace to understand the main DSL surface
- JAX Runtime or Torch Runtime to see how kernels are launched