    Explanations

    terms related to universality or collective experiences

    Negative Logits
    Token        Logit
    "//"         -0.60
    "Bauer"      -0.57
    "ه‌اند"       -0.56
    "ässä"       -0.55
    "/"          -0.55
    "zt"         -0.54
    "йом"        -0.53
    "ktır"       -0.53
    "DaoImpl"    -0.52
    "Cone"       -0.52

    Positive Logits
    Token             Logit
    "every"           1.64
    " every"          1.61
    " EVERY"          1.60
    "EVERY"           1.59
    " Every"          1.53
    "Every"           1.46
    " Ogni"           1.21
    " Everywhere"     1.11
    " Jede"           1.07
    " Jedes"          1.07

    Activation Density: 0.049%

    No Known Activations