INDEX
    Explanations

    expressions of surprise or disbelief

    emotional reactions or expressions of surprise and realization

    Negative Logits
    unal
    -0.83
    ullivan
    -0.82
    ciplinary
    -0.74
     occasion
    -0.66
     Flavoring
    -0.66
    aband
    -0.65
     Cosponsors
    -0.63
    cephal
    -0.60
    minist
    -0.60
    ioned
    -0.60
    Positive Logits
    ?".
    0.96
    '"
    0.91
    Hey
    0.90
    '."
    0.85
    ?'"
    0.83
    .'"
    0.83
     hey
    0.83
    hey
    0.81
    .")
    0.80
    …."
    0.79
    Act Density 0.160%

    No Known Activations