    NEGATIVE LOGITS
    Token    Value
             1.18
    "ла"     1.13
    "на"     1.00
    "י"      1.00
    "ar"     0.91
    "ו"      0.90
    "ع"      0.89
    "م"      0.87
    "u"      0.80
    "in"     0.79
    POSITIVE LOGITS
    Token    Value
    " "      1.41
    " a"     0.86
    "Х"      0.85
             0.85
             0.84
    "Очень"  0.79
    "rta"    0.78
    "Лю"     0.75
    "."      0.75
    "್ಟ"     0.73
    Act Density 0.000%

    No Known Activations