INDEX
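    Negative Logits
    Token             Value
    "ب"               1.56
    "r"               1.54
    "مي"              1.49
    "ли"              1.37
    "ל"               1.36
    "و"               1.35
    ".!"              1.22
    "."               1.21
    "ar"              1.17
    "l"               1.17

    Positive Logits
    Token             Value
    "得上"            1.45
    "━━"              1.38
    " arguably"       1.30
    (not rendered)    1.30
    "始终"            1.24
    (not rendered)    1.24
    "য়ের"             1.23
    " besten"         1.23
    (not rendered)    1.22
    "ющим"            1.21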
    Act Density 0.068%

    No Known Activations