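    Negative Logits
    (token not shown)    0.84
    "ה"                  0.84
    "ב"                  0.79
    "ची"                 0.74
    "נ"                  0.73
    "ת"                  0.72
    (token not shown)    0.71
    (token not shown)    0.66
    (token not shown)    0.66
    (token not shown)    0.65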
    Positive Logits
    " lips"              1.13
    (token not shown)    0.93
    " Lips"              0.88
    "Lips"               0.85
    "kaart"              0.79
    "lips"               0.77
    "👄"                 0.77
    (token not shown)    0.75
    " labios"            0.72
    " a"                 0.71
    Activation Density 0.005%

    No Known Activations