INDEX
    Explanations
    Negative Logits
    " huge"          -0.07
    " Joh"           -0.07
    " Central"       -0.07
    "推介"           -0.07
    " Such"          -0.07
    " Aviv"          -0.07
    " escalated"     -0.07
    "xAA"            -0.06
                     -0.06
    " rad"           -0.06
    Positive Logits
    "_declaration"   0.09
    "Ingredient"     0.08
    " mim"           0.07
    " rebellion"     0.06
    " بواسطة"        0.06
    " Orbit"         0.06
    "posición"       0.06
    "çu"             0.06
    "Linear"         0.06
    "-len"           0.06
    Act Density 0.011%

    No Known Activations