    Negative Logits
    ould        -0.08
    NEG         -0.08
     auf        -0.07
    pent        -0.07
     истор      -0.07
    каж         -0.07
                -0.06
    avl         -0.06
     cock       -0.06
    pet         -0.06
    Positive Logits
    _SERVICE    0.06
     result     0.06
     Helper     0.06
    /change     0.06
     하고       0.06
     باشید      0.06
    -G          0.06
     Performs   0.06
     سپس        0.06
     Currently  0.06
    Activation Density: 0.009%

    No Known Activations