    Explanations

    phrases indicating surprise or disbelief

    phrases that express a lack of awareness or understanding

    Negative Logits
    Token           Logit
    "rend"          -0.80
    "ugal"          -0.72
    "cipl"          -0.70
    "unity"         -0.66
    "eous"          -0.66
    " srfAttach"    -0.65
    "pora"          -0.64
    "only"          -0.64
    "ror"           -0.62
    "illed"         -0.61
    Positive Logits
    Token           Logit
    " remotely"     1.33
    " bothering"    0.99
    " bothered"     0.94
    " vaguely"      0.93
    " halfway"      0.90
    " bother"       0.90
    " hint"         0.85
    " faintly"      0.84
    " close"        0.84
    " scratch"      0.83
    Activation Density: 0.067%

    No Known Activations