INDEX
Explanations
references to national identity or patriotism
mentions of the word "country."
Negative Logits
Visual        -0.75
TAG           -0.73
guiActiveUn   -0.70
NB            -0.70
urations      -0.70
INTER         -0.69
amination     -0.67
sbm           -0.67
wcsstore      -0.66
Vor           -0.66
Positive Logits
wide          1.16
ICAN          0.98
icans         0.89
ican          0.85
men           0.85
oslov         0.80
lance         0.76
Fathers       0.75
keye          0.75
icz           0.73
Activations Density: 0.062%