    Negative Logits
    " as"        1.37
    "س"          1.01
    " it"        1.00
    "ään"        0.89
    ".#"         0.89
    " I"         0.87
    "rt"         0.86
    " aswell"    0.82
    " are"       0.79
    ":"          0.78
    Positive Logits
    "ו"      1.19
    "ي"      1.13
    "ומי"    1.05
    "م"      0.96
    "ת"      0.95
    "К"      0.93
    "n"      0.91
    "са"     0.89
    "D"      0.89
    "י"      0.87
    Act Density 0.488%

    No Known Activations