INDEX
    Explanations

    phrases emphasizing repetition or consistency

    Negative Logits
    "//"            -0.64
    "Bauer"         -0.61
    "zt"            -0.60
    "йом"           -0.57
    "Cone"          -0.56
    "ه‌اند"          -0.56
    [token missing] -0.55
    "时候"           -0.55
    " Magdalene"    -0.55
    "ässä"          -0.54
    Positive Logits
    "every"     1.69
    " every"    1.64
    " EVERY"    1.64
    "EVERY"     1.63
    " Every"    1.55
    "Every"     1.52
    " Ogni"     1.24
    " Jedes"    1.15
    " Jede"     1.10
    " Elke"     1.10
    Activation Density: 0.106%

    No Known Activations