INDEX
    Explanations

    terms related to altruism and cooperative behaviors

    Negative Logits
     forget     -1.93
    wise        -1.77
     minds      -1.70
    assadors    -1.66
    '?"         -1.64
    obbsee      -1.63
    ters        -1.48
     notice     -1.46
    orers       -1.46
    )){         -1.45
    Positive Logits
    enstein     1.99
    xton        1.98
     cellar     1.47
    ści         1.47
     billing    1.42
    ford        1.42
    ppo         1.41
    ende        1.40
    pee         1.39
    ‟           1.38
    Activation Density: 0.018%

    No Known Activations
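This page does not say how its logit lists or its density figure were computed. In many sparse-autoencoder dashboards, a feature's logit effect on a token is the dot product of the feature's decoder direction with that token's column of the model's unembedding matrix, and activation density is the fraction of token positions where the feature's activation is positive. Treat both as assumptions here. The pure-Python sketch below illustrates them on made-up toy numbers; `top_logit_effects`, `activation_density`, and every variable name are hypothetical.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k):
    """Project a (hypothetical) feature decoder direction onto the unembedding.

    decoder_dir: list of d_model floats for the feature's decoder direction
    unembed: d_model x vocab_size matrix, given as a list of rows
    vocab: list of token strings, one per unembedding column
    Returns (k most negative, k most positive) lists of (token, score) pairs.
    """
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(len(vocab))
    ]
    # Sort ascending by score; the reversed list gives the most positive first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    return ranked[:k], ranked[::-1][:k]


def activation_density(activations):
    """Fraction of token positions where the feature fires (activation > 0)."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0) / len(activations)


# Toy example with invented numbers: d_model = 2, vocabulary of 4 tokens.
vocab = ["a", "b", "c", "d"]
unembed = [
    [1.0, -2.0, 0.5, 0.0],
    [0.0, 1.0, 1.0, -1.0],
]
decoder_dir = [1.0, 1.0]
neg, pos = top_logit_effects(decoder_dir, unembed, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
# A density of 0.018% would correspond to a fraction of 0.00018.
print("density:", activation_density([0.0, 0.3, 0.0, 0.0]))
```

Sorting once and slicing from both ends keeps the sketch short. A real vocabulary has tens of thousands of tokens, so an actual dashboard would more likely use a vectorized matrix product and a partial top-k selection.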