INDEX
    Explanations

    adjectives or phrases expressing surprise or disappointment

    expressions of regret or concern about societal issues

    Negative Logits
    Token        Logit
    " exting"    -0.75
    "qus"        -0.74
    "iHUD"       -0.71
    "jong"       -0.70
    "semble"     -0.70
    " pione"     -0.69
    "ivalent"    -0.69
    "pleting"    -0.69
    "ignty"      -0.68
    "edom"       -0.68
    Positive Logits
    Token          Logit
    " they"        0.89
    " nobody"      0.86
    " we"          0.82
    " why"         0.81
    " THEY"        0.77
    " that"        0.77
    " everyone"    0.73
    "adays"        0.70
    " people"      0.70
    " he"          0.69
    Activation Density: 0.144%

    No Known Activations