INDEX
    Explanations

    expressions related to concern or indifference towards societal issues

    Negative Logits
    ンジ      -0.82
    UES       -0.77
    ======    -0.69
    kered     -0.68
    onite     -0.64
    oute      -0.64
    ッド      -0.62
    CHA       -0.62
    KEN       -0.60
    ト        -0.58
    Positive Logits
    taker     1.34
    lessly    1.25
    lessness  1.15
    giving    0.98
    ening     0.97
    fully     0.93
    taking    0.89
    der       0.85
    ful       0.85
    eners     0.80
    Activation Density: 0.603%

    No Known Activations