pyptx.smem¶
This page is generated from source docstrings and public symbols.
Shared-memory allocation, addressing, and barrier objects.
This module covers the shared-memory side of handwritten kernels:
smem.allocallocates shared-memory regionssmem.wgmma_tileallocates canonical GMMA/WGMMA operand layoutssmem.mbarrierallocates mbarrier arrays in shared memorysmem.base/smem.load/smem.storeprovide common address and access helpers
Typical usage:
from pyptx import smem
from pyptx.types import bf16
sA = smem.alloc(bf16, (STAGES, BM, BK), swizzle="128B")
bar_full = smem.mbarrier(STAGES)
The design here is deliberately pragmatic: shared-memory regions are described just enough for PTX emission, and some allocations carry extra metadata for higher-level helpers such as WGMMA descriptor synthesis.
Public API¶
SharedAllocSharedSliceMbarrierArrayMbarrierRefbaseloadstoreallocwgmma_tilealloc_with_layoutmbarrierapply_swizzle
SharedAlloc¶
- Kind:
class
class SharedAlloc(name: 'str', dtype: 'PtxType', shape: 'tuple[int, ...]', swizzle: 'str | None', gmma_layout: "'object | None'" = None, gmma_major: 'str | None' = None, byte_offset: 'int' = 0) -> 'None'
Handle to a shared memory allocation.
Indexing with a stage index returns a SharedSlice representing an offset into the allocation, suitable for passing to ptx.cp.async instructions.
gmma_layout is set (non-None) when this alloc was produced by
smem.wgmma_tile — it carries the GmmaLayout needed to
auto-build a wgmma descriptor. The gmma_major field is a string
"K" or "MN" matching the operand orientation.
Members¶
byte_offset¶
-
Kind:
attribute -
Value:
<member 'byte_offset' of 'SharedAlloc' objects>
No docstring yet.
dtype¶
-
Kind:
attribute -
Value:
<member 'dtype' of 'SharedAlloc' objects>
No docstring yet.
gmma_layout¶
-
Kind:
attribute -
Value:
<member 'gmma_layout' of 'SharedAlloc' objects>
No docstring yet.
gmma_major¶
-
Kind:
attribute -
Value:
<member 'gmma_major' of 'SharedAlloc' objects>
No docstring yet.
name¶
-
Kind:
attribute -
Value:
<member 'name' of 'SharedAlloc' objects>
No docstring yet.
shape¶
-
Kind:
attribute -
Value:
<member 'shape' of 'SharedAlloc' objects>
No docstring yet.
swizzle¶
-
Kind:
attribute -
Value:
<member 'swizzle' of 'SharedAlloc' objects>
No docstring yet.
SharedSlice¶
- Kind:
class
A stage-indexed slice of a shared allocation.
Members¶
name¶
- Kind:
property
Underlying shared-memory symbol name for this slice.
alloc¶
-
Kind:
attribute -
Value:
<member 'alloc' of 'SharedSlice' objects>
No docstring yet.
stage¶
-
Kind:
attribute -
Value:
<member 'stage' of 'SharedSlice' objects>
No docstring yet.
MbarrierArray¶
- Kind:
class
Array of mbarrier objects in shared memory.
Indexing returns an MbarrierRef for use in ptx.mbarrier.* calls.
byte_offset is the byte offset within dynamic SMEM (when
force_dynamic_smem is active). Each individual mbarrier is 8
bytes, so MbarrierRef for index i lives at
byte_offset + i * 8.
Members¶
byte_offset¶
-
Kind:
attribute -
Value:
<member 'byte_offset' of 'MbarrierArray' objects>
No docstring yet.
count¶
-
Kind:
attribute -
Value:
<member 'count' of 'MbarrierArray' objects>
No docstring yet.
name¶
-
Kind:
attribute -
Value:
<member 'name' of 'MbarrierArray' objects>
No docstring yet.
MbarrierRef¶
- Kind:
class
Reference to a single mbarrier object.
byte_offset is the byte offset of this specific mbarrier
within dynamic SMEM: array.byte_offset + idx * 8. When the
array was allocated in dynamic SMEM mode (name == "dyn_smem"),
instruction emitters use this offset for addressing.
Members¶
name¶
- Kind:
property
Underlying shared-memory symbol name for this mbarrier array.
byte_offset¶
- Kind:
property
Byte offset of this mbarrier within dynamic SMEM.
array¶
-
Kind:
attribute -
Value:
<member 'array' of 'MbarrierRef' objects>
No docstring yet.
idx¶
-
Kind:
attribute -
Value:
<member 'idx' of 'MbarrierRef' objects>
No docstring yet.
base¶
- Kind:
function
Return a u32 register holding the base address of extern shared memory.
load¶
- Kind:
function
Emit ld.shared.{dtype} and return the loaded register.
store¶
- Kind:
function
Emit st.shared.{dtype}.
alloc¶
- Kind:
function
alloc(dtype: 'PtxType', shape: 'tuple[int, ...] | int', swizzle: 'str | None' = None, align: 'int | None' = None, name: 'str | None' = None) -> 'SharedAlloc'
Allocate shared memory.
Emits: .shared [.align N] .b8 name[bytes];
Args: dtype: Element type (e.g. bf16, f32). shape: Shape as tuple (e.g. (STAGES, BM, BK)) or flat int. swizzle: Swizzle mode string (e.g. '128B'). Metadata only for now. align: Byte alignment. Defaults to 128. name: Variable name. Auto-generated if not given.
Returns: SharedAlloc handle for use in ptx.cp.async and ptx.stmatrix calls.
wgmma_tile¶
- Kind:
function
wgmma_tile(dtype: 'PtxType', shape: 'tuple[int, int]', major: 'str' = 'K', *, align: 'int | None' = None, name: 'str | None' = None) -> 'SharedAlloc'
Allocate a shared-memory tile in the canonical GMMA layout for a wgmma operand.
The user just says "this is a K-major A of shape (M, K)" and pyptx
picks the right swizzle/alignment/layout-metadata automatically.
The returned SharedAlloc carries a gmma_layout attribute
so downstream code (ptx.wgmma.mma_async, ptx.wgmma.auto_descriptor)
can derive the 64-bit descriptor without the user touching LBO,
SBO, or swizzle mode.
Args:
dtype: element type (bf16, f16, tf32, f32).
shape: (M, K) for an A operand when major="K", or
(K, N) for a B operand when major="MN".
major: "K" (row-major MxK for A / col-major KxN for B) or
"MN" (col-major MxK for A / row-major KxN for B).
Returns:
A SharedAlloc with .gmma_layout set.
alloc_with_layout¶
- Kind:
function
alloc_with_layout(dtype: 'PtxType', shape: 'tuple[int, ...] | int', swizzle: 'str | None' = None, align: 'int | None' = None, name: 'str | None' = None, *, gmma_layout: "'object | None'" = None, gmma_major: 'str | None' = None) -> 'SharedAlloc'
Internal: allocate SMEM and attach GMMA layout metadata.
Same as alloc but threads the gmma_layout / gmma_major fields
through to the returned SharedAlloc. Most users should call
wgmma_tile or alloc, not this directly.
mbarrier¶
- Kind:
function
Allocate an array of mbarrier objects in shared memory.
In static mode (default): emits .shared .b64 name[count];
In dynamic mode (force_dynamic_smem): no VarDecl is emitted;
the mbarrier lives at dyn_smem + byte_offset and the name is
set to "dyn_smem" so address helpers emit offset-based
references.
Args: count: Number of mbarrier objects. name: Variable name. Auto-generated if not given.
Returns: MbarrierArray handle for use in ptx.mbarrier.* calls.
apply_swizzle¶
- Kind:
function
Apply GMMA swizzle to a logical byte offset, returning the physical offset.
swizzle is "32B", "64B", "128B", or None/"NONE"
(identity). Emits 3 ALU instructions for non-trivial swizzles.