INDEX
    Explanations
    NEGATIVE LOGITS
    .har               -0.08
     activations       -0.07
    [token missing]    -0.07
    [token missing]    -0.07
     launcher          -0.07
     cod               -0.07
    (Base              -0.07
     pid               -0.07
     SAX               -0.07
    .as                -0.07
    POSITIVE LOGITS
     Pistol             0.08
    .stopPropagation    0.08
     кнопк              0.07
    [token missing]     0.07
    [token missing]     0.07
    [token missing]     0.07
    🧔                  0.07
    𐤔                  0.07
     использ            0.07
    [token missing]     0.06
    Activation Density 0.008%

    No Known Activations