Explanations
terms related to ideology
Negative Logits
ond      -0.18
anson    -0.17
rou      -0.16
abouts   -0.16
MOVED    -0.16
Lİ       -0.15
OND      -0.15
iger     -0.15
okud     -0.15
inand    -0.15
Positive Logits
pend         0.23
als          0.20
ology        0.19
ologies      0.18
yll          0.18
ologically   0.18
ally         0.16
PEND         0.16
ological     0.16
ALLY         0.16
Activations Density 0.009%