    Negative Logits
    Token            Logit
    " bully"         -0.08
    " prima"         -0.08
    " خیال"          -0.08
    " laten"         -0.08
    " веществ"       -0.08
    " Invisalign"    -0.08
    " kanker"        -0.08
    " australian"    -0.08
    " psoriasis"     -0.07
    " artr"          -0.07
    Positive Logits
    Token            Logit
    "Lua"            0.10
    ".lua"           0.09
    " Lua"           0.08
    ".Nil"           0.08
    "pad"            0.08
    " lua"           0.08
    "J"              0.08
    "(lua"           0.08
    " jailbreak"     0.08
    "技巧"           0.08
    Activation Density: 0.005%

    No Known Activations