INDEX
    Explanations

    exclamations and expressions of excitement in the text

    Negative Logits
    gdx
    -0.82
    ede
    -0.82
    aure
    -0.76
    ation
    -0.75
     Rés
    -0.75
     Rump
    -0.74
    "):
    
    -0.73
    [`
    -0.69
     Carter
    -0.69
    böz
    -0.67
    Positive Logits
    %!
    1.78
    ?!?
    1.73
    ?!?!
    1.64
     !
    1.55
    !
    1.55
    !!!!!!
    1.53
    !!!!!!!
    1.52
    !!!!!!!!!!
    1.44
    ?!
    1.43
    !"
    1.42
    Act Density 0.091%

    No Known Activations