INDEX
    Explanations

    The neuron fires on any token containing “wheel” (e.g. wheel, wheels, wheelchair), effectively detecting mentions of wheel-related terms.
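    As a rough illustration, the substring rule in this explanation can be written as a small predicate and applied to the positive-logit tokens listed on this page. This is only a sketch: `mentions_wheel` is a hypothetical helper, not part of any tool, and the tokens are output-logit entries rather than recorded activations (the page lists none), so it illustrates the rule and does not test the neuron.

    ```python
    def mentions_wheel(token: str) -> bool:
        """Hypothetical helper: case-insensitive substring match on 'wheel'."""
        return "wheel" in token.lower()


    # Tokens copied from the positive-logits list on this page (leading spaces kept).
    positive_tokens = [
        " wheel", " Wheel", "Wheel", "wheel", " Wheeler",
        " wheels", " Wheels", "heel", " wheelchair", "EL",
    ]

    # Split the listed tokens by whether they satisfy the substring rule.
    matches = [t for t in positive_tokens if mentions_wheel(t)]
    non_matches = [t for t in positive_tokens if not mentions_wheel(t)]

    print("contain 'wheel':", matches)
    print("do not contain 'wheel':", non_matches)
    ```

    Under this rule, two of the listed tokens ("heel" and "EL") are fragments that do not contain the full string "wheel".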

    NEGATIVE LOGITS
    "']);↵"      -0.07
    "097"        -0.07
    "_supp"      -0.07
    "cstdio"     -0.06
    " publik"    -0.06
    "spa"        -0.06
    " Tart"      -0.06
    "_study"     -0.06
    " osp"       -0.06
    " smb"       -0.06
    POSITIVE LOGITS
    " wheel"         0.15
    " Wheel"         0.14
    "Wheel"          0.14
    "wheel"          0.13
    " Wheeler"       0.12
    " wheels"        0.12
    " Wheels"        0.09
    "heel"           0.09
    " wheelchair"    0.09
    "EL"             0.09
    Activation Density: 0.010%

    No Known Activations