INDEX
    NEGATIVE LOGITS
    Token             Logit
    " I'll"           -0.07
    " Foo"            -0.07
    " frais"          -0.07
    "iverr"           -0.07
    " ´"              -0.07
    " val"            -0.07
    " convenience"    -0.07
    " brightness"     -0.07
    " inputs"         -0.07
    " fashions"       -0.07
    POSITIVE LOGITS
    Token             Logit
    "监察"            0.10
    "は禁止"          0.10
    " derog"          0.09
    " prohibited"     0.09
    " banned"         0.09
    " запрещ"         0.09
    "Forbidden"       0.08
    " verboten"       0.08
    " опас"           0.08
    " rejecting"      0.08
    Activation Density: 0.045%

    No Known Activations