    NEGATIVE LOGITS
    ،       0.52
            0.47
    nas     0.46
    nian    0.46
    ra      0.46
    '       0.45
    li      0.44
    \       0.43
    ma      0.43
    ot      0.43
    POSITIVE LOGITS
    G       0.57
    P       0.53
    Y       0.52
    이지만  0.51
     or     0.49
     fakat  0.49
    یک      0.47
    C       0.47
    ى       0.47
    л       0.46
    Activation density: 1.504%

    No Known Activations