Explanations
phrases related to nationalism or national pride
Negative Logits
Token       Logit
wei         -0.76
Tanz        -0.73
isu         -0.72
Morse       -0.71
Sawyer      -0.70
chnology    -0.69
pta         -0.69
avez        -0.66
perature    -0.65
umb         -0.64
Positive Logits
Token       Logit
ously       1.16
edly        1.01
eous        1.01
doms        0.94
ous         0.93
fully       0.92
mares       0.88
antly       0.85
ly          0.81
acies       0.79
Activations Density: 0.453%