    Explanations

    expressions of negativity and hopelessness

    Negative Logits
    Token         Logit
    "(strpos"     -0.17
    "oog"         -0.17
    " desires"    -0.15
    "Äħż"         -0.15
    "otton"       -0.15
    "Це"          -0.14
    "çĹ"          -0.14
    "odable"      -0.14
    "eru"         -0.14
    " desire"     -0.14
    Positive Logits
    Token            Logit
    " negative"      0.35
    " negativity"    0.34
    " Negative"      0.34
    " pessim"        0.33
    "Negative"       0.31
    "essim"          0.31
    "negative"       0.31
    " glo"           0.30
    "-negative"      0.30
    "_negative"      0.29
    Act Density 0.254%

    No Known Activations