Explanations
phrases related to ideology and political beliefs
terms related to various forms of national identity or nationalism
Negative Logits
conom            -0.78
の魔             -0.76
astern           -0.74
---------------  -0.74
ortium           -0.72
Boss             -0.72
Brand            -0.69
lings            -0.68
flix             -0.66
Snake            -0.65
Positive Logits
ized             0.97
ally             0.92
ities            0.92
ism              0.88
ity              0.87
ational          0.86
ists             0.84
ist              0.84
izational        0.83
ité              0.81
Activations Density 0.015%