INDEX
    Explanations

    phrases indicating a sense of negativity or discontent

    Negative Logits
    Token        Logit
    "ullo"       -0.19
    "angen"      -0.17
    "ninger"     -0.16
    "uide"       -0.15
    "nev"        -0.15
    "äge"        -0.15
    "OUS"        -0.15
    "aris"       -0.15
    " richer"    -0.14
    "可是"       -0.14
    Positive Logits
    Token        Logit
    " much"      0.20
    " diss"      0.19
    "ething"     0.18
    " great"     0.17
    "much"       0.16
    " ragaz"     0.16
    " Much"      0.16
    " anymore"   0.15
    "Much"       0.15
    "ley"        0.15
    Activation Density: 0.028%

    No Known Activations