    Negative Logits
     WIB        -0.10
     IFR        -0.09
    lisle       -0.09
    igraphy     -0.09
     להג        -0.09
     Pflege     -0.09
     sinc       -0.09
     ITV        -0.08
    ativer      -0.08
     Dansk      -0.08
    Positive Logits
    oku         0.22
    оку         0.15
    okus        0.13
    okemon      0.13
    OK          0.13
    ohan        0.13
    ok          0.12
     ninja      0.12
    Pokemon     0.11
     superhero  0.11
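The two lists above rank tokens by how strongly this feature pushes their logits down or up. Dashboards of this kind commonly derive such lists by projecting the feature's decoder direction onto each token's unembedding vector and keeping the k most negative and k most positive results. That method is an assumption here; the page does not state it. The sketch below uses plain Python lists and a tiny two-dimensional toy vocabulary. `logit_effects`, `top_and_bottom`, and all vectors are hypothetical illustrations, not real model weights.

```python
from typing import Dict, List, Tuple


# Hypothetical helper: assumes a feature's logit effect on a token is the dot
# product of the feature's decoder direction with that token's unembedding vector.
def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec)) for tok, vec in unembed.items()}


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    bottom = ranked[:k]         # most negative first
    top = ranked[::-1][:k]      # most positive first
    return top, bottom


# Toy 2-d vectors chosen only so the effects echo a few values from the lists above.
decoder_dir = [1.0, 0.0]
unembed = {
    "oku": [0.22, 0.5],
    "OK": [0.13, -1.0],
    " WIB": [-0.10, 0.3],
    " Dansk": [-0.08, 0.0],
}
effects = logit_effects(decoder_dir, unembed)
top, bottom = top_and_bottom(effects, k=2)
print("positive:", top)     # positive: [('oku', 0.22), ('OK', 0.13)]
print("negative:", bottom)  # negative: [(' WIB', -0.1), (' Dansk', -0.08)]
```

Note that the leading space in tokens such as `" WIB"` is part of the token itself, which is why the lists distinguish `" WIB"` from a bare `"WIB"`.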
    Act Density 0.004%
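Act Density reports how often the feature fires. The sketch below assumes it is the percentage of sampled token positions where the feature's activation is strictly positive. The exact threshold and sample behind this page's figure are not stated. The function name `activation_density` and the toy activation data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: "firing" means activation > threshold, with threshold 0 by default.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Toy data: the feature fires on 4 of 100,000 token positions.
acts = [0.0] * 100_000
for pos in (10, 2_000, 50_000, 99_999):
    acts[pos] = 1.5
density = activation_density(acts)
print(f"Act Density {density:.3f}%")  # Act Density 0.004%
```

Under this reading, a density of 0.004% means the feature fires on roughly 4 of every 100,000 tokens.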

    No Known Activations