INDEX
    Explanations

    exclamations or expressions of emotion

    expressions of congratulations and reassurance

    Negative Logits
    artney       -0.82
    cious        -0.72
    itialized    -0.68
    ritic        -0.66
    pend         -0.66
    inosaur      -0.62
    vey          -0.61
    eatured      -0.61
    dinand       -0.59
    fund         -0.59

    Positive Logits
    !            1.07
    !,           1.05
    !:           1.00
    !.           0.98
    !]           0.91
    !),          0.87
    !).          0.87
    !!           0.87
    !)           0.85
    !'           0.84
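One common way to produce positive and negative logit lists like these is to project the feature's decoder direction through the model's unembedding matrix, then keep the highest- and lowest-scoring tokens. The sketch below assumes that setup. `logit_effects`, `top_logits`, and the toy two-dimensional vectors are hypothetical illustrations, not this dashboard's actual code or weights.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the (assumed) decoder direction with each token's
    unembedding vector, i.e. the direct push the feature gives that token."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_logits(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return (k lowest-scoring tokens, k highest-scoring tokens)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive


# Toy example: hypothetical 2-d decoder direction and unembedding vectors.
decoder_dir = [1.0, 0.5]
unembed = {
    "!": [1.0, 0.2],
    "!!": [0.8, 0.1],
    "cious": [-0.7, 0.0],
    "fund": [-0.5, -0.2],
    "the": [0.0, 0.1],
}
effects = logit_effects(decoder_dir, unembed)
negative, positive = top_logits(effects, k=2)

for tok, val in positive:
    print(f"{tok:<8}{val:+.2f}")
for tok, val in negative:
    print(f"{tok:<8}{val:+.2f}")
```

Sorting once and slicing both ends keeps the two lists consistent with each other. A real vocabulary would call for a partial sort (top-k) instead of a full sort.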
    Activation Density: 0.149%
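Activation density is the fraction of token positions on which the feature fires. The minimal sketch below assumes that "fires" means a strictly positive activation. The `threshold` parameter is an assumption, and `activation_density` is a hypothetical helper. The synthetic 100,000-token input is chosen only so that the result matches the 0.149% shown here. It is not real activation data.

```python
from typing import Iterable


def activation_density(activations: Iterable[float],
                       threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumes a feature counts as active only when strictly above the threshold.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return active / total


# Synthetic data: 149 active positions out of 100,000.
acts = [0.0] * 99_851 + [0.7] * 149
density = activation_density(acts)
print(f"Activation density: {density:.3%}")
```

A very low density like this one means the feature is sparse: it is silent on almost every token.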

    No Known Activations