INDEX
    Explanations

    negative words related to criticism or disapproval

    negative descriptors or phrases related to unfavorable qualities

    Negative Logits
    ĸļ             -0.96
    raltar         -0.77
    ensional       -0.77
    earchers       -0.75
    ittees         -0.75
    theless        -0.75
    htaking        -0.74
    conservancy    -0.73
    eston          -0.73
    xual           -0.72
    Positive Logits
    dies           1.09
    dest           1.08
    die            1.04
    ger            0.96
    gered          0.93
    GES            0.88
    ged            0.86
    ges            0.86
     karma         0.86
     luck          0.85
    Activation Density: 0.029%

    No Known Activations