INDEX
    Explanations
    NEGATIVE LOGITS
    Token               Value
    "t"                 1.42
    "↵↵"                1.13
    " "                 1.13
    "'"                 1.10
    "d"                 1.00
    "\"                 1.00
    (token not shown)   0.93
    " joka"             0.91
    "dan"               0.89
    " as"               0.86
    POSITIVE LOGITS
    Token               Value
    "ین"                1.45
    "on"                1.34
    (token not shown)   1.33
    "К"                 1.23
    "ج"                 1.23
    "ای"                1.21
    "up"                1.19
    "ویز"               1.18
    (token not shown)   1.18
    "میم"               1.17
    Activation Density 0.053%

    No Known Activations