INDEX
    Explanations

    left, right, 2, red, black

    Negative Logits
    (no token text)    0.58
    (no token text)    0.54
    ד                  0.54
    ัล                 0.54
    (no token text)    0.50
    ה                  0.49
    ל                  0.48
    (no token text)    0.47
    ężczy              0.47
    קי                 0.47
    Positive Logits
     also         0.52
     provides     0.48
     from         0.48
     only         0.47
     with         0.46
     inoltre      0.46
     serves       0.46
     increases    0.45
     home         0.45
     Class        0.45
    Act Density 0.016%

    No Known Activations