INDEX
    Explanations
    No Explanations Found
    NEGATIVE LOGITS
    л                 1.18
    (not rendered)    1.06
    (not rendered)    1.02
    (not rendered)    1.01
    ש                 1.00
    '                 0.99
    (not rendered)    0.98
    ני                0.93
    ב                 0.93
    ע                 0.92
    POSITIVE LOGITS
    a        0.93
    orems    0.82
    afe      0.74
    holm     0.73
    rack     0.72
    eem      0.71
    fords    0.70
    he       0.70
    ed       0.70
    packs    0.69
    Act Density 0.000%

    No Known Activations