Skip to content

pyptx.smem

This page is generated from source docstrings and public symbols.

Shared-memory allocation, addressing, and barrier objects.

This module covers the shared-memory side of handwritten kernels:

  • smem.alloc allocates shared-memory regions
  • smem.wgmma_tile allocates canonical GMMA/WGMMA operand layouts
  • smem.mbarrier allocates mbarrier arrays in shared memory
  • smem.base / smem.load / smem.store provide common address and access helpers

Typical usage:

from pyptx import smem
from pyptx.types import bf16

sA = smem.alloc(bf16, (STAGES, BM, BK), swizzle="128B")
bar_full = smem.mbarrier(STAGES)

The design here is deliberately pragmatic: shared-memory regions are described just enough for PTX emission, and some allocations carry extra metadata for higher-level helpers such as WGMMA descriptor synthesis.

Public API

SharedAlloc

  • Kind: class
class SharedAlloc(name: 'str', dtype: 'PtxType', shape: 'tuple[int, ...]', swizzle: 'str | None', gmma_layout: "'object | None'" = None, gmma_major: 'str | None' = None, byte_offset: 'int' = 0) -> 'None'

Handle to a shared memory allocation.

Indexing with a stage index returns a SharedSlice representing an offset into the allocation, suitable for passing to ptx.cp.async instructions.

gmma_layout is set (non-None) when this alloc was produced by smem.wgmma_tile — it carries the GmmaLayout needed to auto-build a wgmma descriptor. The gmma_major field is a string "K" or "MN" matching the operand orientation.

Members

byte_offset

  • Kind: attribute

  • Value: <member 'byte_offset' of 'SharedAlloc' objects>

No docstring yet.

dtype

  • Kind: attribute

  • Value: <member 'dtype' of 'SharedAlloc' objects>

No docstring yet.

gmma_layout

  • Kind: attribute

  • Value: <member 'gmma_layout' of 'SharedAlloc' objects>

No docstring yet.

gmma_major

  • Kind: attribute

  • Value: <member 'gmma_major' of 'SharedAlloc' objects>

No docstring yet.

name

  • Kind: attribute

  • Value: <member 'name' of 'SharedAlloc' objects>

No docstring yet.

shape

  • Kind: attribute

  • Value: <member 'shape' of 'SharedAlloc' objects>

No docstring yet.

swizzle

  • Kind: attribute

  • Value: <member 'swizzle' of 'SharedAlloc' objects>

No docstring yet.

SharedSlice

  • Kind: class
class SharedSlice(alloc: 'SharedAlloc', stage: 'int') -> 'None'

A stage-indexed slice of a shared allocation.

Members

name

  • Kind: property

Underlying shared-memory symbol name for this slice.

alloc

  • Kind: attribute

  • Value: <member 'alloc' of 'SharedSlice' objects>

No docstring yet.

stage

  • Kind: attribute

  • Value: <member 'stage' of 'SharedSlice' objects>

No docstring yet.

MbarrierArray

  • Kind: class
class MbarrierArray(name: 'str', count: 'int', byte_offset: 'int' = 0) -> 'None'

Array of mbarrier objects in shared memory.

Indexing returns an MbarrierRef for use in ptx.mbarrier.* calls.

byte_offset is the byte offset within dynamic SMEM (when force_dynamic_smem is active). Each individual mbarrier is 8 bytes, so MbarrierRef for index i lives at byte_offset + i * 8.

Members

byte_offset

  • Kind: attribute

  • Value: <member 'byte_offset' of 'MbarrierArray' objects>

No docstring yet.

count

  • Kind: attribute

  • Value: <member 'count' of 'MbarrierArray' objects>

No docstring yet.

name

  • Kind: attribute

  • Value: <member 'name' of 'MbarrierArray' objects>

No docstring yet.

MbarrierRef

  • Kind: class
class MbarrierRef(array: 'MbarrierArray', idx: 'int') -> 'None'

Reference to a single mbarrier object.

byte_offset is the byte offset of this specific mbarrier within dynamic SMEM: array.byte_offset + idx * 8. When the array was allocated in dynamic SMEM mode (name == "dyn_smem"), instruction emitters use this offset for addressing.

Members

name

  • Kind: property

Underlying shared-memory symbol name for this mbarrier array.

byte_offset

  • Kind: property

Byte offset of this mbarrier within dynamic SMEM.

array

  • Kind: attribute

  • Value: <member 'array' of 'MbarrierRef' objects>

No docstring yet.

idx

  • Kind: attribute

  • Value: <member 'idx' of 'MbarrierRef' objects>

No docstring yet.

base

  • Kind: function
base(name: 'str | None' = None)

Return a u32 register holding the base address of extern shared memory.

load

  • Kind: function
load(dtype: 'PtxType', address)

Emit ld.shared.{dtype} and return the loaded register.

store

  • Kind: function
store(dtype: 'PtxType', address, value) -> 'None'

Emit st.shared.{dtype}.

alloc

  • Kind: function
alloc(dtype: 'PtxType', shape: 'tuple[int, ...] | int', swizzle: 'str | None' = None, align: 'int | None' = None, name: 'str | None' = None) -> 'SharedAlloc'

Allocate shared memory.

Emits: .shared [.align N] .b8 name[bytes];

Args: dtype: Element type (e.g. bf16, f32). shape: Shape as tuple (e.g. (STAGES, BM, BK)) or flat int. swizzle: Swizzle mode string (e.g. '128B'). Metadata only for now. align: Byte alignment. Defaults to 128. name: Variable name. Auto-generated if not given.

Returns: SharedAlloc handle for use in ptx.cp.async and ptx.stmatrix calls.

wgmma_tile

  • Kind: function
wgmma_tile(dtype: 'PtxType', shape: 'tuple[int, int]', major: 'str' = 'K', *, align: 'int | None' = None, name: 'str | None' = None) -> 'SharedAlloc'

Allocate a shared-memory tile in the canonical GMMA layout for a wgmma operand.

The user just says "this is a K-major A of shape (M, K)" and pyptx picks the right swizzle/alignment/layout-metadata automatically. The returned SharedAlloc carries a gmma_layout attribute so downstream code (ptx.wgmma.mma_async, ptx.wgmma.auto_descriptor) can derive the 64-bit descriptor without the user touching LBO, SBO, or swizzle mode.

Args: dtype: element type (bf16, f16, tf32, f32). shape: (M, K) for an A operand when major="K", or (K, N) for a B operand when major="MN". major: "K" (row-major MxK for A / col-major KxN for B) or "MN" (col-major MxK for A / row-major KxN for B).

Returns: A SharedAlloc with .gmma_layout set.

alloc_with_layout

  • Kind: function
alloc_with_layout(dtype: 'PtxType', shape: 'tuple[int, ...] | int', swizzle: 'str | None' = None, align: 'int | None' = None, name: 'str | None' = None, *, gmma_layout: "'object | None'" = None, gmma_major: 'str | None' = None) -> 'SharedAlloc'

Internal: allocate SMEM and attach GMMA layout metadata.

Same as alloc but threads the gmma_layout / gmma_major fields through to the returned SharedAlloc. Most users should call wgmma_tile or alloc, not this directly.

mbarrier

  • Kind: function
mbarrier(count: 'int', name: 'str | None' = None) -> 'MbarrierArray'

Allocate an array of mbarrier objects in shared memory.

In static mode (default): emits .shared .b64 name[count]; In dynamic mode (force_dynamic_smem): no VarDecl is emitted; the mbarrier lives at dyn_smem + byte_offset and the name is set to "dyn_smem" so address helpers emit offset-based references.

Args: count: Number of mbarrier objects. name: Variable name. Auto-generated if not given.

Returns: MbarrierArray handle for use in ptx.mbarrier.* calls.

apply_swizzle

  • Kind: function
apply_swizzle(logical_offset: "'Reg'", swizzle: 'str | None') -> "'Reg'"

Apply GMMA swizzle to a logical byte offset, returning the physical offset.

swizzle is "32B", "64B", "128B", or None/"NONE" (identity). Emits 3 ALU instructions for non-trivial swizzles.