INDEX
    Explanations

    expressions of surprise or emphasis

    Negative Logits
    Token        Logit
    "eniable"    -0.17
    "ilon"       -0.15
    " Apprec"    -0.15
    "ÄĽÅ¾"       -0.15
    "idious"     -0.15
    "enty"       -0.15
    "ounder"     -0.14
    "oor"        -0.14
    "ãn"         -0.14
    "jem"        -0.14
    Positive Logits
    Token        Logit
    " else"      0.20
    " more"      0.20
    " a"         0.20
    " do"        0.19
    " could"     0.18
    " better"    0.18
    " timing"    0.17
    " an"        0.17
    "sis"        0.17
    " did"       0.16
    Activation Density: 0.043%

    No Known Activations