INDEX
Explanations
proper nouns related to national identities or landmarks
Negative Logits
ullivan    -0.77
enger      -0.74
areth      -0.71
Scand      -0.69
eva        -0.69
redd       -0.66
hou        -0.66
rox        -0.65
bane       -0.65
ered       -0.65
Positive Logits
Geographic  1.08
ity         1.01
ities       0.91
unity       0.85
ITY         0.83
Monument    0.80
Rifle       0.80
istic       0.79
ism         0.79
icol        0.78
Activations Density: 0.021%