INDEX
    Explanations
    NEGATIVE LOGITS
    "are"         1.37
    "いない"      1.13
    "ot"          1.10
    "ج"           1.10
    "1"           1.08
    " evit"       1.05
    "違う"        1.03
    "int"         1.02
    "ignores"     1.02
    " bashing"    1.02
    POSITIVE LOGITS
    " کدام"        1.12
    "отя"          1.05
    "лла"          1.04
    "таў"          1.03
    " Люди"        1.02
    " therapeut"   1.01
    " Ủy"          0.99
    " σημ"         0.97
    "icación"      0.95
    "gruppe"       0.94
    Act Density 0.001%

    No Known Activations