INDEX
    Explanations

    words related to exclamations or emphatic expressions

    expressions of excitement or emphasis

    Negative Logits
    apt       -0.71
     maj      -0.67
     pessim   -0.65
    raph      -0.65
     76       -0.63
    nec       -0.62
     Aur      -0.61
    aer       -0.61
     curs     -0.60
     mull     -0.60
    Positive Logits
    !'        1.24
    !.        1.16
    !,        1.13
    !'"       1.09
    !:        1.08
    !         1.05
    !/        1.01
    !]        0.98
    !?        0.98
    !".       0.95
    Activation Density: 0.211%

    No Known Activations