    Negative Logits
    Token            Logit
    " qui"           -0.07
    "(q"             -0.07
    " gentle"        -0.06
    "未必"           -0.06
    " Knee"          -0.06
    " rinse"         -0.06
    "ements"         -0.06
    " respectful"    -0.06
    "("              -0.06
    " halted"        -0.06
    Positive Logits
    Token                   Logit
    (token not rendered)    0.07
    " 위치"                 0.07
    " materiał"             0.07
    (token not rendered)    0.07
    "ayah"                  0.07
    " @_"                   0.06
    "_jump"                 0.06
    "Forg"                  0.06
    "🎢"                    0.06
    " Boehner"              0.06
    Act Density 0.048%

    No Known Activations