INDEX
    Explanations

    exclamations suggesting surprise or disbelief

    expressions of surprise or strong emotion

    Negative Logits
     certain          -0.70
     generally        -0.64
     partly           -0.63
     general          -0.61
     broadly          -0.61
     similar          -0.60
     grav             -0.60
     various          -0.60
     predominantly    -0.58
     principally      -0.58
    Positive Logits
    ?!                2.56
    !!                2.45
    !!!               2.38
    !!!!              2.27
    !!!!!             2.22
    !",               2.16
    !".               2.13
    !                 2.12
    !),               2.08
    !).               2.05
    Activation Density 0.032%

    No Known Activations
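    As a rough illustration of what the activation density figure means, the sketch below assumes density is the fraction of token positions at which the feature's activation is above zero. The threshold, the function name `activation_density`, and the sample data are hypothetical choices for illustration, not taken from this dashboard. Under that assumption, 0.032% corresponds to about 32 active positions per 100,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with activation > 0;
    the threshold parameter is a hypothetical knob for illustration.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 32 active positions scattered over 100,000 tokens.
acts = [0.0] * 100_000
for i in range(32):
    acts[i * 3000] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.032%
```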