    Explanations
    No Explanations Found
    Negative Logits
    ian     1.08
    elin    0.97
    .       0.96
    ard     0.91
    isch    0.91
    -       0.90
    ate     0.88
    ier     0.88
    eng     0.86
    ene     0.83
    Positive Logits
    то         1.63
    ל          1.49
    ו          1.41
    ل          1.41
    to         1.39
    و          1.38
    (blank)    1.30
    на         1.29
    (blank)    1.29
    in         1.26
    Act Density 0.000%

    No Known Activations