INDEX
Explanations
mentions of specific cultural and national affiliations
Negative Logits
isons     -0.15
pts       -0.14
asia      -0.14
itu       -0.14
016       -0.14
Gram      -0.13
losures   -0.13
iku       -0.13
ulty      -0.13
irts      -0.13
Positive Logits
wegian    0.35
apanese   0.34
namese    0.34
inese     0.33
uvian     0.33
sonian    0.32
anian     0.30
onian     0.29
ussian    0.28
orean     0.28
Activations Density: 0.202%