    Explanations

    words and phrases related to surprise or unexpectedness

    Negative Logits
    "mav"               -0.15
    ".scalablytyped"    -0.15
    "मर"                -0.15
    "istrovství"        -0.15
    "une"               -0.15
    "casts"             -0.14
    "_ary"              -0.14
    "ajs"               -0.14
    "nd"                -0.14
    "кас"               -0.13
    Positive Logits
    "ingly"          0.35
    " surprise"      0.25
    " surpr"         0.22
    " Surprise"      0.21
    " surprised"     0.20
    " surprises"     0.19
    "ively"          0.19
    "ably"           0.18
    " unexpected"    0.18
    "oeff"           0.17
    Activation Density: 0.040%

    No Known Activations