INDEX
    Explanations
    Negative Logits
    Token       Logit
    "4"         1.57
    "6"         1.57
    "5"         1.52
    "ok"        1.50
    "0"         1.50
    "ر"         1.50
    "2"         1.48
    "1"         1.46
    "3"         1.46
    "8"         1.46
    POSITIVE LOGITS
    Token       Logit
    "🠀"         1.31
    "s"         1.12
    "zeitig"    1.10
    " ardu"     1.10
    " ادار"     1.09
    " Deshalb"  1.09
    "我国"      1.07
    " Artinya"  1.06
    "用於"      1.06
    "その"      1.05
    Activation Density: 0.489%

    No Known Activations