INDEX
    Explanations

    instances of surprise or unexpected events

    Negative Logits
    ".scalablytyped"  -0.19
    "ajs"             -0.17
    "une"             -0.16
    "-ci"             -0.15
    "isters"          -0.14
    "stre"            -0.14
    "ICLES"           -0.14
    "serter"          -0.14
    "ishops"          -0.14
    "gie"             -0.13
    Positive Logits
    "ingly"       0.32
    "ably"        0.23
    " surprise"   0.20
    "ously"       0.18
    " Surprise"   0.18
    " surpr"      0.18
    "IPA"         0.16
    " surprised"  0.16
    "ylon"        0.16
    "/errors"     0.15
    Activation Density 0.051%

    No Known Activations