    Negative Logits
        " Cox"          -0.07
        "NH"            -0.07
        " Weiss"        -0.07
        (not rendered)  -0.07
        " ה"            -0.06
        " INF"          -0.06
        " cooling"      -0.06
        "xffffff"       -0.06
        " Kauf"         -0.06
        " Fallon"       -0.06
    Positive Logits
        "á"             0.13
        "é"             0.12
        "ó"             0.11
        "Á"             0.10
        "Ú"             0.09
        "í"             0.09
        (not rendered)  0.09
        "ُ"             0.09
        "ú"             0.09
        "Ó"             0.09
    Activation Density: 0.067%

    No Known Activations