    Explanations

    terms related to socio-political discussions around altruism, group behavior, and societal dynamics

    Negative Logits
    Token         Logit
    " WATCHED"    -0.77
    "Merit"       -0.74
    "PRESS"       -0.74
    "unts"        -0.73
    "Charg"       -0.72
    "LY"          -0.71
    "DIT"         -0.67
    "Dialogue"    -0.65
    "Dur"         -0.65
    "zilla"       -0.65
    Positive Logits
    Token         Logit
    " sprang"     0.82
    " derive"     0.79
    " eman"       0.79
    " derives"    0.76
    " flows"      0.75
    " flowed"     0.73
    " arises"     0.72
    " sprung"     0.71
    " spawned"    0.70
    " springs"    0.69
    Activation Density: 0.027%

    No Known Activations