Explanations
references to geographical locations, with a strong preference for Europe
references to Europe
Negative Logits
oru         -0.78
ymm         -0.77
anan        -0.77
yright      -0.75
atchewan    -0.72
LESS        -0.69
aron        -0.68
yrights     -0.67
ledged      -0.66
anamo       -0.66
Positive Logits
Union       0.90
Parliament  0.89
countries   0.82
continent   0.78
nations     0.78
capitals    0.75
Continent   0.74
Galile      0.73
Countries   0.72
ffen        0.69
Activations Density: 0.042%