INDEX
    Explanations

    phrases indicating surprise or disbelief

    expressions of disbelief or the need for assistance

    Negative Logits
    rongh     -0.74
    rane      -0.69
    owa       -0.67
    heading   -0.62
    ighth     -0.62
    azel      -0.60
    isk       -0.58
    bush      -0.57
    eport     -0.57
    chwitz    -0.57
    Positive Logits
     anymore   0.65
     ├         0.65
     Louie     0.65
     Tex       0.62
     Surprise  0.60
    aughs      0.60
    uitous     0.60
     Presents  0.59
     Vaugh     0.59
    sth        0.58
    Act Density 0.083%

    No Known Activations