INDEX
Explanations
concepts related to societal organization and control
Negative Logits
Token     Logit
759       -0.16
umba      -0.16
aft       -0.15
amarin    -0.15
ros       -0.15
awner     -0.15
orget     -0.15
ERO       -0.15
resco     -0.14
ervers    -0.14
POSITIVE LOGITS
Token       Logit
Tactics     0.25
tactics     0.24
methods     0.22
Methods     0.21
Means       0.20
sacrifices  0.19
sacrifice   0.19
sometimes   0.19
Means       0.19
Methods     0.18
Activations Density 0.208%