INDEX
    Explanations

    expressions of surprise or realization

    Negative Logits
    " surla"       -0.47
    "iële"         -0.45
    "Amit"         -0.39
    " Amic"        -0.39
    " Abhishek"    -0.39
    " Landmark"    -0.38
    " estekak"     -0.37
    "oise"         -0.37
    "erdere"       -0.37
    "kében"        -0.36
    Positive Logits
    "Oh"       1.26
    " Oh"      1.20
    " oh"      1.04
    "oh"       1.02
    "Ohhh"     0.82
    "Oooh"     0.81
    "Ohh"      0.80
    "Ohhhh"    0.79
    "Ooh"      0.75
    " OH"      0.74
    Activation Density: 0.007%

    No Known Activations