    Explanations

    phrases expressing approval or positive sentiment

    positive expressions of achievement and well-being

    Negative Logits
    "urst"       -0.77
    "umed"       -0.75
    "¶ħ"         -0.74
    "ume"        -0.74
    "uming"      -0.73
    "uria"       -0.72
    "olina"      -0.71
    "ipers"      -0.71
    "agram"      -0.71
    "cessive"    -0.69
    Positive Logits
    " comrade"          0.79
    " tid"              0.72
    "reen"              0.72
    "clus"              0.70
    " folks"            0.70
    " inconvenience"    0.69
    " somebody"         0.65
    " avoids"           0.65
    "noon"              0.64
    " sunshine"         0.64
    Act Density 0.293%

    No Known Activations