INDEX
    Explanations

    words related to positivity or benefit

    terms related to positive outcomes or advantages

    Negative Logits
    Token        Logit
    " Wolves"    -0.71
    "Hun"        -0.68
    "Rush"       -0.68
    "buck"       -0.67
    "pper"       -0.66
    "Bur"        -0.65
    "á"          -0.65
    " Fever"     -0.64
    " Barcl"     -0.63
    "hani"       -0.63
    Positive Logits
    Token           Logit
    " beneficial"   0.90
    "icial"         0.88
    "iciary"        0.85
    "rative"        0.78
    " destro"       0.78
    " synerg"       0.75
    " agre"         0.74
    "tarian"        0.74
    "chwitz"        0.73
    "ritional"      0.72
    Activation Density: 0.010%

    No Known Activations