INDEX
    Explanations
    Negative Logits
    acam  -0.07
     ptr  -0.07
    vell  -0.06
     worse  -0.06
     uncomfort  -0.06
    tu  -0.06
     slipped  -0.06
    _DESCRIPTION  -0.06
    діл  -0.06
     Hoff  -0.06
    Positive Logits
     Rage  0.07
    ่ว  0.07
    0.07
    _rg  0.06
     کرده  0.06
     CIS  0.06
     خانو  0.06
    	en  0.06
    CLUSION  0.06
    จะม  0.06
    Act Density 0.008%

    No Known Activations