Explanation: concepts related to political theory
Negative Logits
eid        -0.18
ldb        -0.16
NCY        -0.15
_ENGINE    -0.15
AMPL       -0.15
ruba       -0.15
omentum    -0.14
/Dk        -0.14
rub        -0.14
omon       -0.14
Positive Logits
IR          0.33
IR          0.26
interstate  0.25
Morg        0.25
Walt        0.23
(IR         0.23
realism     0.22
actors      0.22
state       0.22
security    0.22
Activation density: 0.016%
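The two summary statistics on this page can be illustrated with a minimal sketch. It assumes that "activation density" means the percentage of tokens on which the feature's activation is strictly positive, and that the logit lists are simply the k smallest and k largest entries of the feature's per-token effect on the output logits. Both definitions are assumptions made for illustration, and the helper names `activation_density` and `top_and_bottom_tokens` are hypothetical, not part of any dashboard's API. A density of 0.016% corresponds to roughly 16 firing tokens per 100,000.

```python
from typing import Sequence


def activation_density(activations: Sequence[float]) -> float:
    """Percentage of tokens on which the feature fires.

    Assumption: "fires" means a strictly positive activation.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


def top_and_bottom_tokens(logit_effects: dict[str, float], k: int):
    """Return the k most negative and the k most positive (token, value) pairs."""
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Hypothetical activations: 16 firing tokens out of 100,000.
acts = [0.0] * 100_000
for i in range(16):
    acts[i * 6_000] = 1.0
print(f"{activation_density(acts):.3f}%")  # 0.016%

# A few values from the lists above, reused as toy input.
effects = {"eid": -0.18, "ldb": -0.16, "IR": 0.33, "interstate": 0.25, "realism": 0.22}
neg, pos = top_and_bottom_tokens(effects, k=2)
print(neg)  # [('eid', -0.18), ('ldb', -0.16)]
print(pos)  # [('IR', 0.33), ('interstate', 0.25)]
```

Because a dictionary keys on the token string, this toy input cannot hold both "IR" entries from the positive list; on the real page they are presumably distinct vocabulary items that render identically (for example, differing only in leading whitespace).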