INDEX
Explanations
words related to politics, control, and authority
concepts related to political and social dynamics
Negative Logits
ymm        -0.56
arnaev     -0.55
xon        -0.53
atom       -0.52
ULTS       -0.52
onz        -0.52
qua        -0.51
ãĥ£        -0.51
cci        -0.51
ortium     -0.51
Positive Logits
itself      0.67
herself     0.60
himself     0.55
yourself    0.54
peripher    0.54
POV         0.54
altogether  0.53
motif       0.52
pedia       0.52
entails     0.51
Activations Density 1.094%