INDEX
    Explanations

    instances of surprise or astonishment in the text

    Negative Logits
     Vig            -0.17
    ationship       -0.17
    olec            -0.15
    å¿į             -0.14
     Dra            -0.14
    KT              -0.14
    ctor            -0.13
    odial           -0.13
    ifice           -0.13
    holes           -0.13
    Positive Logits
    _COMPAT         0.16
    asma            0.16
    lys             0.15
    143             0.14
     ActionTypes    0.14
    hq              0.14
    Miller          0.14
    머               0.14
    unk             0.13
     Schmidt        0.13
    Act Density 0.007%

    No Known Activations