INDEX
    Explanations

    phrases indicating ongoing actions or events

    Negative Logits
    "ernet"    -0.20
    "azor"     -0.16
    "idi"      -0.15
    "pone"     -0.15
    "606"      -0.15
    "udio"     -0.14
    "udi"      -0.14
    "raz"      -0.14
    "inte"     -0.14
    " æķ"      -0.13
    Positive Logits
    " happening"    0.24
    " wrong"        0.24
    " happen"       0.23
    " bump"         0.20
    "wrong"         0.19
    " happened"     0.18
    " happens"      0.17
    " Wrong"        0.17
    " uns"          0.17
    " going"        0.17
    Activation Density: 0.014%

    No Known Activations