INDEX
Explanations
references to citizenship and national identity
NEGATIVE LOGITS
Token        Logit
mui          -0.54
cfm          -0.54
שוליים       -0.49
Ald          -0.48
flops        -0.48
neux         -0.48
istration    -0.47
pses         -0.46
hib          -0.46
RM           -0.46
POSITIVE LOGITS
Token        Logit
patriotic    1.20
patriotism   1.14
patriot      1.12
nationality  0.99
nationalism  0.98
patriots     0.97
nation       0.93
flags        0.91
patrio       0.91
riotic       0.89
Activations Density: 0.323%