    Explanations

    expressions of surprise or exclamation

    Negative Logits
    Token           Logit
    " myſelf"       -1.09
    " himſelf"      -0.92
    " themſelves"   -0.88
    " houſe"        -0.88
    " itſelf"       -0.86
    " raiſ"         -0.83
    "himself"       -0.79
    " Houſe"        -0.76
    " Efq"          -0.76
    " perſon"       -0.74

    Positive Logits
    Token           Logit
    " Oh"           1.29
    "Oh"            1.18
    " oh"           1.10
    "oh"            0.99
    "Ohh"           0.98
    " OH"           0.98
    "Ohhh"          0.94
    "Oooh"          0.90
    "Ohhhh"         0.89
    "toh"           0.85

    Activation Density: 0.044%

    No Known Activations