INDEX
    Explanations

    expressions of surprise or exclamation

    Negative Logits
    Token          Logit
    "pis"          -0.15
    "615"          -0.14
    "amba"         -0.14
    " elephant"    -0.14
    "seau"         -0.14
    "ofilm"        -0.14
    "ewise"        -0.13
    "hle"          -0.13
    "ackage"       -0.13
    "ót"           -0.13
    Positive Logits
    Token          Logit
    "unar"         0.18
    "iggins"       0.16
    "gree"         0.15
    "kov"          0.15
    " Tempo"       0.15
    "ови"          0.15
    "竣"           0.14
    "ignon"        0.14
    "ован"         0.14
    "254"          0.13
    Activation Density: 0.075%

    No Known Activations