    Explanations

    Closing parenthesis ")" followed by a newline or separator

    Negative Logits
    Token              Logit
    "на"               1.73
    "و"                1.69
    "ان"               1.52
    (token missing)    1.52
    "en"               1.44
    "ار"               1.26
    "وها"              1.24
    "та"               1.21
    "is"               1.20
    "k"                1.17
    Positive Logits
    Token              Logit
    (token missing)    0.97
    "ς"                0.94
    " mengatur"        0.87
    " kalangan"        0.87
    " mẽ"              0.86
    (token missing)    0.81
    " Selle"           0.80
    " kelamin"         0.80
    " pełni"           0.78
    " integrantes"     0.77
    Activation Density: 0.601%

    No Known Activations