INDEX
Explanations
mentions of geographical locations and adjectives related to nationality
terms related to nationality or geographical origin
Negative Logits
idden          -0.72
atown          -0.71
fo             -0.69
okin           -0.69
ushima         -0.69
usters         -0.63
imag           -0.63
odes           -0.62
omed           -0.62
calibration    -0.61
Positive Logits
Liam           0.82
Joey           0.80
Shaun          0.77
Jeremiah       0.77
Damon          0.75
Colin          0.74
Ezra           0.74
winger         0.74
Reggie         0.73
Ezekiel        0.73
Activations Density 0.109%
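
A density figure like this is commonly read as the share of sampled tokens on which the feature fires. The source does not say how the value was computed, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of tokens with a nonzero
    (above-threshold) activation; the source does not confirm this.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 109 active tokens out of 100,000 sampled tokens.
sample = [1.5] * 109 + [0.0] * (100_000 - 109)
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.109% would mean the feature is active on roughly one token in every 900.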