Explanations
phrases related to ideologies and their analysis
Negative Logits
ITUDE        -0.16
itude        -0.15
nic          -0.15
udi          -0.15
rooms        -0.15
rou          -0.14
nout         -0.14
ingu         -0.14
Duffy        -0.14
복           -0.14
Positive Logits
pend          0.24
ologically    0.23
ally          0.22
ogram         0.22
als           0.21
ation         0.21
ologue        0.21
ational       0.20
ALLY          0.20
ograms        0.20
Activation Density: 0.005%