    Explanations

    agreement or exclamation

    Negative Logits
    Token            Value
    "م"              1.43
    (blank)          1.38
    "ре"             1.35
    (blank)          1.30
    "пи"             1.23
    "وم"             1.23
    "ри"             1.19
    "ну"             1.16
    "력이"            1.16
    "C"              1.16

    Positive Logits
    Token            Value
    "."              1.88
    "in"             1.65
    "("              1.09
    " fundament"     1.07
    "inę"            1.06
    "不喜欢"          1.05
    "inah"           1.05
    " stride"        1.00
    "ashed"          0.98
    "inni"           0.96
    Act Density 0.230%

    No Known Activations