INDEX
    Explanations

    adjectives or phrases emphasizing fairness or honesty

    phrases emphasizing clarity, honesty, and fairness

    Negative Logits
    Token            Logit
    " merit"         -0.70
    "berra"          -0.66
    "Bur"            -0.65
    "ajo"            -0.64
    "Ger"            -0.63
    "Han"            -0.61
    "aleb"           -0.61
    "Bel"            -0.60
    "Comb"           -0.60
    " overflow"      -0.59
    Positive Logits
    Token                   Logit
    "externalActionCode"    0.80
    " WATCHED"              0.73
    " Yourself"             0.71
    "quished"               0.71
    "ioned"                 0.68
    "uzz"                   0.66
    " Pryor"                0.65
    ".--"                   0.64
    "onomic"                0.63
    " someday"              0.63
    Activation Density: 0.197%

    No Known Activations