INDEX
    Explanations
    NEGATIVE LOGITS
     defam
    2.64
    ना
    2.58
    2.55
    ש
    2.48
    ные
    2.44
     скорее
    2.41
    𝐼
    2.40
    ंत्रिक
    2.40
    ब्दिक
    2.38
     день
    2.38
    POSITIVE LOGITS
     allá
    3.36
     importantly
    3.30
     sinned
    2.93
    2.79
    ي
    2.78
    OVER
    2.74
    kannya
    2.68
    camb
    2.55
    y
    2.53
     lanjut
    2.52
    Act Density 0.392%

    No Known Activations