INDEX
    Explanations

    master, hands, secretly insecure

    Negative Logits
        " ב"        0.57
        " ל"        0.56
        " כ"        0.56
        " ח"        0.53
        " excret"   0.51
        " מ"        0.50
        " ס"        0.50
        (blank)     0.50
        (blank)     0.49
        " לא"       0.49

    Positive Logits
        "ът"        0.56
        "ят"        0.52
        "وست"       0.51
        "و"         0.50
        "وون"       0.47
        "ötz"       0.47
        (blank)     0.47
        "其他"      0.46
        "ώσεις"     0.46
        "питы"      0.44

    Act Density 0.000%

    No Known Activations