Specialized ops #7322

Merged
youknowone merged 12 commits into RustPython:main from youknowone:specialization2
Mar 4, 2026

Conversation

@youknowone
Member

@youknowone youknowone commented Mar 3, 2026

Summary by CodeRabbit

  • Performance & Stability

    • Faster instruction execution via expanded adaptive specialization and improved inline caches.
    • Better thread-safety and stronger memory ordering for reliable concurrent execution and cache visibility.
    • Cache invalidation timing adjusted to reduce stale specializations.
  • Refactor

    • Centralized specialization/deoptimization lifecycle for consistent runtime behavior.
    • JIT cache moved to mutex-guarded storage for safer concurrent updates.
  • Behavior

    • Slot lookup now distinguishes native vs Python-level results for more accurate dispatch.
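
The three-way lookup outcome in the last bullet can be sketched roughly as follows. Only the names `SlotLookupResult`, `NativeSlot`, `PythonMethod`, `NotFound`, and `lookup_slot_in_mro` come from the walkthrough for this PR; `TypeStub`, `Attr`, `UnaryFn`, and the function signature are hypothetical stand-ins, so treat this as an illustration of the idea rather than the code in `crates/vm/src/types/slot.rs`.

```rust
use std::collections::HashMap;

/// Hypothetical native slot signature used only for this sketch.
type UnaryFn = fn(i64) -> i64;

/// Outcome of searching the MRO for a slot (variant names from the PR walkthrough).
#[derive(Debug)]
enum SlotLookupResult<T> {
    /// A native implementation was found and can be installed directly.
    NativeSlot(T),
    /// A Python-level method was found; the caller should install a wrapper.
    PythonMethod,
    /// Nothing was found; the caller falls back to MRO inheritance.
    NotFound,
}

/// Hypothetical stand-in for a class attribute entry.
enum Attr {
    Native(UnaryFn),
    Python,
}

/// Hypothetical stand-in for a type object: just a name-to-attribute table.
struct TypeStub {
    attrs: HashMap<&'static str, Attr>,
}

/// Walk the MRO front to back and report the first match, if any.
fn lookup_slot_in_mro(mro: &[TypeStub], name: &str) -> SlotLookupResult<UnaryFn> {
    for ty in mro {
        match ty.attrs.get(name) {
            Some(Attr::Native(f)) => return SlotLookupResult::NativeSlot(*f),
            Some(Attr::Python) => return SlotLookupResult::PythonMethod,
            None => continue,
        }
    }
    SlotLookupResult::NotFound
}
```

Returning an enum instead of an `Option` lets a caller tell "a Python override shadows the native slot" apart from "nothing defined anywhere", which a two-state result would conflate.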

@coderabbitai
Contributor

coderabbitai Bot commented Mar 3, 2026

📝 Walkthrough

This PR makes bytecode slots atomic for concurrent specialization, centralizes adaptive inline-caching and deoptimization flows, moves JIT cache storage to a mutex-protected optional, strengthens several atomic ordering choices across types/dicts, and distinguishes Python-level methods from native slots in slot lookup/update logic.
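
To illustrate why a mutex-protected optional is a better fit than a write-once cell for the JIT cache, here is a minimal sketch. It assumes `std::sync::Mutex` as a stand-in for `PyMutex`, and `Function`, `CompiledCode`, `jit`, and `set_code` are hypothetical simplifications, not the actual definitions in `crates/vm/src/builtins/function.rs`.

```rust
use std::sync::Mutex;

/// Hypothetical placeholder for a compiled artifact.
#[derive(Clone, Debug, PartialEq)]
struct CompiledCode(String);

/// Hypothetical, heavily simplified function object.
struct Function {
    code: Mutex<String>,
    /// A write-once cell could never be cleared; an Option behind a mutex can.
    jitted_code: Mutex<Option<CompiledCode>>,
}

impl Function {
    fn new(code: &str) -> Self {
        Function {
            code: Mutex::new(code.to_owned()),
            jitted_code: Mutex::new(None),
        }
    }

    /// Compile on first use, then reuse the cached result.
    fn jit(&self) -> CompiledCode {
        // Both methods take the locks in the same order (code, then cache).
        let code = self.code.lock().unwrap();
        let mut cache = self.jitted_code.lock().unwrap();
        cache
            .get_or_insert_with(|| CompiledCode(format!("compiled:{code}")))
            .clone()
    }

    /// Replacing the code object drops the now-stale compiled version.
    fn set_code(&self, new_code: &str) {
        let mut code = self.code.lock().unwrap();
        let mut cache = self.jitted_code.lock().unwrap();
        *code = new_code.to_owned();
        *cache = None;
    }
}
```

The point of the sketch is the `*cache = None` line: invalidation on a code change needs a slot that can be reset, which a once-initialized cell does not offer.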

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Atomic Bytecode Operations**<br>`crates/compiler-core/src/bytecode.rs` | Replaced non-atomic in-place opcode/arg/cache/adaptive-counter mutability with per-slot atomics (`AtomicU8`/`AtomicU16`). Added `compare_exchange_op`, `read_op`, `read_arg`, `read_cache_u16`/`write_cache_u16`, and adaptive counter read/write methods; updated serialization/quicken to use atomic reads; `unsafe impl Sync for CodeUnits`. |
| **Adaptive Inline Caching & Deoptimization**<br>`crates/vm/src/frame.rs` | Large refactor centralizing specialization lifecycle: added `adaptive()`, `commit_specialization()`, `deoptimize()`/`deoptimize_at()`, unified adaptive counters and commit paths, many specialized handlers (binary int/float, `LoadAttr` variants, `Call` optimizations, `ForIter` variants, `ToBool`, `CompareOp`, `StoreSubscr`, `ContainsOp`, etc.), and consolidated guards/version checks. |
| **JIT Cache Storage**<br>`crates/vm/src/builtins/function.rs` | Replaced `OnceCell<CompiledCode>` with `PyMutex<Option<CompiledCode>>` for `jitted_code`. Reworked all accesses to lock-based reads/writes and added cache invalidation on `__code__` setter and in `__jit__`. |
| **Slot Lookup Result Distinction**<br>`crates/vm/src/types/slot.rs` | Introduced `SlotLookupResult<T>` (`NativeSlot`, `PythonMethod`, `NotFound`) and updated `lookup_slot_in_mro()` and all slot update sites/macros to branch on these outcomes, ensuring Python-level methods yield wrappers and MRO inheritance when not found. |
| **Atomic Ordering Adjustments**<br>`crates/vm/src/dict_inner.rs`, `crates/vm/src/builtins/type.rs` | Strengthened atomic orderings: dict version loads use `Acquire` and bumps use `Release`; `PyType::modified()` now uses `SeqCst`. `SetAttr` now invalidates type version before mutation. |
| **Stack Analysis Minor Change**<br>`crates/vm/src/builtins/frame.rs` | Stack analysis now calls `deoptimize()` on the resolved base opcode (after `to_base()`), ensuring de-specialization is applied during stack effect analysis. |
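
As a rough picture of what per-slot atomics look like, the sketch below stores each opcode and argument in its own `AtomicU8`. The method names `read_op`, `read_arg`, and `compare_exchange_op` come from the summary above, but their signatures, the `CodeUnit` layout, and the opcode constants are assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Hypothetical opcode numbers, for illustration only.
const BINARY_OP: u8 = 1;
const BINARY_OP_ADD_INT: u8 = 2;

/// One instruction slot: opcode and argument are separately atomic.
struct CodeUnit {
    op: AtomicU8,
    arg: AtomicU8,
}

/// Hypothetical simplified container; atomics make shared access safe
/// without any `unsafe` in this sketch.
struct CodeUnits(Vec<CodeUnit>);

impl CodeUnits {
    /// Acquire pairs with the Release in `compare_exchange_op`, so a thread
    /// that sees a specialized opcode also sees the cache writes before it.
    fn read_op(&self, index: usize) -> u8 {
        self.0[index].op.load(Ordering::Acquire)
    }

    fn read_arg(&self, index: usize) -> u8 {
        self.0[index].arg.load(Ordering::Relaxed)
    }

    /// Install `new` only if the slot still holds `expected`.
    /// Returns false when another thread changed the opcode first.
    fn compare_exchange_op(&self, index: usize, expected: u8, new: u8) -> bool {
        self.0[index]
            .op
            .compare_exchange(expected, new, Ordering::Release, Ordering::Relaxed)
            .is_ok()
    }
}
```

With a compare-and-swap, two threads racing to specialize the same instruction cannot both win: the loser sees a failed exchange and can simply re-read the opcode.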

Sequence Diagram(s)

sequenceDiagram
    participant Frame as ExecutingFrame
    participant CodeUnits as Code Storage
    participant Adaptive as Adaptive Engine
    participant Specialized as Specialized Handler
    participant Base as Base Handler

    Frame->>CodeUnits: read_op(index) (Acquire)
    CodeUnits-->>Frame: opcode
    Frame->>Adaptive: check adaptive counter & candidate (Relaxed)
    Adaptive->>Adaptive: increment counter (Relaxed)
    Adaptive->>Specialized: attempt specialization
    alt Success
        Specialized->>CodeUnits: compare_exchange_op(index, expected, new) (Release)
        CodeUnits-->>Specialized: CAS success
        Specialized-->>Frame: execute specialized path
    else Fail
        Specialized->>CodeUnits: write base opcode (Release)
        CodeUnits-->>Base: base opcode written
        Specialized-->>Frame: fall back to base handler
    end
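
The flow in this diagram can be approximated with a single instruction site, as sketched below. The warm-up threshold, the opcode constants, and the `Site` and `Value` types are all hypothetical; `adaptive` and `deoptimize` borrow their names from the PR summary but are not claimed to match the real functions in `crates/vm/src/frame.rs`.

```rust
use std::sync::atomic::{AtomicU16, AtomicU8, Ordering};

// Hypothetical opcode numbers and warm-up threshold.
const BINARY_OP: u8 = 0;
const BINARY_OP_ADD_INT: u8 = 1;
const WARMUP: u16 = 2;

#[derive(Clone, Copy, Debug)]
enum Value {
    Int(i64),
    Float(f64),
}

/// One instruction site: its current opcode plus an adaptive counter.
struct Site {
    op: AtomicU8,
    counter: AtomicU16,
}

fn as_f64(v: Value) -> f64 {
    match v {
        Value::Int(x) => x as f64,
        Value::Float(x) => x,
    }
}

/// Base handler: works for every operand combination.
fn generic_add(a: Value, b: Value) -> Value {
    match (a, b) {
        (Value::Int(x), Value::Int(y)) => Value::Int(x + y),
        _ => Value::Float(as_f64(a) + as_f64(b)),
    }
}

impl Site {
    fn new() -> Self {
        Site { op: AtomicU8::new(BINARY_OP), counter: AtomicU16::new(WARMUP) }
    }

    fn execute(&self, a: Value, b: Value) -> Value {
        match self.op.load(Ordering::Acquire) {
            BINARY_OP_ADD_INT => match (a, b) {
                // Guard holds: take the specialized fast path.
                (Value::Int(x), Value::Int(y)) => Value::Int(x + y),
                // Guard failed: restore the base opcode and fall back.
                _ => {
                    self.deoptimize();
                    generic_add(a, b)
                }
            },
            _ => {
                self.adaptive(a, b);
                generic_add(a, b)
            }
        }
    }

    /// Count down; once warm, try to install a specialized opcode.
    fn adaptive(&self, a: Value, b: Value) {
        let remaining = self.counter.load(Ordering::Relaxed);
        if remaining > 0 {
            self.counter.store(remaining - 1, Ordering::Relaxed);
        } else if matches!((a, b), (Value::Int(_), Value::Int(_))) {
            let _ = self.op.compare_exchange(
                BINARY_OP, BINARY_OP_ADD_INT, Ordering::Release, Ordering::Relaxed);
        } else {
            // Not specializable right now: back off before retrying.
            self.counter.store(WARMUP, Ordering::Relaxed);
        }
    }

    fn deoptimize(&self) {
        self.op.store(BINARY_OP, Ordering::Release);
        self.counter.store(WARMUP, Ordering::Relaxed);
    }
}
```

In this sketch the counter uses `Relaxed` because a lost update only delays specialization, while the opcode itself uses `Acquire`/`Release` because it decides which handler runs.
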
sequenceDiagram
    participant Setter as PyType SetAttr
    participant Version as Version Tag
    participant Readers as Concurrent Readers

    Setter->>Version: modified() (SeqCst store)
    Version->>Readers: readers load version (Acquire)
    Setter->>Setter: perform attribute insert/remove
    Setter-->>Readers: changes ordered via version operations
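
The "invalidate first, then mutate" ordering shown here can be sketched with a version tag guarding a cached attribute. This is a simplified model under stated assumptions: `TypeStub`, `InlineCache`, `fill_cache`, and `cached_lookup` are hypothetical, and using `0` to mean "no valid version" is an assumption of this sketch rather than something the PR text states. Only `modified()` and the `SeqCst`/`Acquire` orderings are taken from the description.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Mutex;

/// Global source of fresh version tags (hypothetical; 0 is reserved).
static NEXT_VERSION: AtomicU32 = AtomicU32::new(1);

/// Hypothetical stand-in for a type object.
struct TypeStub {
    version: AtomicU32,
    attrs: Mutex<HashMap<String, i64>>,
}

/// What a specialized instruction would remember about a lookup.
struct InlineCache {
    version: u32,
    value: i64,
}

impl TypeStub {
    fn new() -> Self {
        TypeStub { version: AtomicU32::new(0), attrs: Mutex::new(HashMap::new()) }
    }

    /// Drop the current version tag so existing caches stop matching.
    fn modified(&self) {
        self.version.store(0, Ordering::SeqCst);
    }

    /// Invalidate before mutating, so no reader can pair the old tag
    /// with the new attribute table.
    fn set_attr(&self, name: &str, value: i64) {
        self.modified();
        self.attrs.lock().unwrap().insert(name.to_owned(), value);
    }
}

/// Slow path: assign a version if needed and record the looked-up value.
fn fill_cache(ty: &TypeStub, name: &str) -> Option<InlineCache> {
    let mut version = ty.version.load(Ordering::Acquire);
    if version == 0 {
        version = NEXT_VERSION.fetch_add(1, Ordering::Relaxed);
        ty.version.store(version, Ordering::SeqCst);
    }
    let value = *ty.attrs.lock().unwrap().get(name)?;
    Some(InlineCache { version, value })
}

/// Fast path: the cached value is usable only while the tag still matches.
fn cached_lookup(ty: &TypeStub, cache: &InlineCache) -> Option<i64> {
    if cache.version != 0 && ty.version.load(Ordering::Acquire) == cache.version {
        Some(cache.value)
    } else {
        None
    }
}
```

If the mutation came first, a reader could observe the new attribute table while the old tag was still live and trust a stale cache entry; invalidating up front closes that window in this model.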

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • ShaharNaveh

🐰 Hops excitedly
Atoms guard the opcode lane,
Slots now tell if Python reign—
Caches nudge their version flag,
Specializers leap, then sag.
Hooray—deopt, commit, and play!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title 'Specialized ops' is vague and generic, using a non-descriptive term that fails to convey meaningful information about the specific changes in this substantial multi-file refactoring. | Consider a more descriptive title that captures the main objectives, such as 'Implement adaptive specialization framework for bytecode operations' or 'Refactor to atomic-based synchronization with specialized opcode paths'. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |