Explanations
references to "ide" or "ideological" terms
terms related to ideologies
Negative Logits
enegger   -0.98
iona      -0.92
ichick    -0.87
ufact     -0.76
é¾        -0.71
thening   -0.71
wagen     -0.71
cffff     -0.69
itton     -0.69
lishing   -0.68
Positive Logits
gger      0.97
maid      0.90
ously     0.88
lli       0.88
ll        0.86
lla       0.79
llo       0.78
creen     0.76
hare      0.73
Dhabi     0.72
Activations Density: 0.031%