INDEX
    Negative Logits
    " painful"      0.51
    " at"           0.50
    " $("           0.50
    " fearful"      0.49
    " for"          0.48
    " Benutz"       0.46
    " von"          0.46
    " $"            0.46
    " Werte"        0.45
    " Anwendung"    0.45
    Positive Logits
    "ون"            0.64
    (token not rendered)    0.47
    "들이"          0.46
    "ي"             0.45
    "א"             0.45
    "و"             0.44
    (token not rendered)    0.44
    "عی"            0.43
    "ли"            0.43
    "ке"            0.43
    Activation Density: 0.027%

    No Known Activations