INDEX
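    Negative Logits
    Token          Value
    "s"            1.23
    "ו"            1.18
    "2"            1.17
    "ك"            1.17
    "ل"            1.15
    (not shown)    1.15
    "ro"           1.07
    "ra"           1.05
    "с"            1.05
    "ל"            1.02

    Positive Logits
    Token          Value
    (not shown)    0.99
    (not shown)    0.94
    " a"           0.92
    "্ড"           0.91
    " Victoria"    0.89
    (not shown)    0.86
    "Kr"           0.85
    "Victoria"     0.83
    " Đi"          0.83
    " Agustus"     0.82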
    Act Density 0.002%

    No Known Activations