INDEX
    Explanations
    NEGATIVE LOGITS
    ни    1.63
    ва    1.44
    ви    1.43
    Waar    1.43
     fillet    1.42
    سى    1.41
     marito    1.41
    ला    1.40
    ul    1.38
    ل    1.32
    POSITIVE LOGITS
    و    1.86
    ה    1.76
    اد    1.45
    ات    1.41
     adware    1.40
     Clubhouse    1.34
    ية    1.33
     indag    1.33
    น้อง    1.32
     Luffy    1.30
    Activation Density: 0.294%

    No Known Activations