INDEX
    NEGATIVE LOGITS
     من
    1.41
     في
    1.37
     وال
    1.29
     и
    1.28
     на
    1.28
     ل
    1.27
     ו
    1.26
     لل
    1.24
    1.23
     وم
    1.23
    POSITIVE LOGITS
     own
    1.01
     Own
    0.69
     eigenen
    0.66
     latter
    0.64
     OWN
    0.64
    konen
    0.63
    SELF
    0.62
     talent
    0.61
     ſ
    0.60
    优秀
    0.59
    Act Density 0.000%

    No Known Activations