    Explanations

    phrases indicating a lack of awareness or being disconnected from reality

    Negative Logits
    jid         -0.17
    embro       -0.17
    anki        -0.17
    amber       -0.16
    ems         -0.15
    áž          -0.15
    ávÄĽ        -0.15
    _gem        -0.14
     маÑģÑĤ     -0.14
    adge        -0.14

    Positive Logits
    enna        0.16
    itta        0.15
    ights       0.15
    eness       0.14
     err        0.14
     Hath       0.14
    ató         0.14
    fol         0.14
    iyat        0.14
    467         0.14

    Activation Density 0.105%

    No Known Activations