Explanations
terms related to socio-political discussions around altruism, group behavior, and societal dynamics
Negative Logits
WATCHED     -0.77
Merit       -0.74
PRESS       -0.74
unts        -0.73
Charg       -0.72
LY          -0.71
DIT         -0.67
Dialogue    -0.65
Dur         -0.65
zilla       -0.65
Positive Logits
sprang      0.82
derive      0.79
eman        0.79
derives     0.76
flows       0.75
flowed      0.73
arises      0.72
sprung      0.71
spawned     0.70
springs     0.69
Activations Density 0.027%