INDEX
Explanations
countries or nationalities
nationalities or citizenships of individuals
Negative Logits
merce   -0.84
izons   -0.84
steps   -0.82
days    -0.81
uers    -0.77
itals   -0.77
pots    -0.77
iffs    -0.76
adiq    -0.76
nyder   -0.76
Positive Logits
slang         0.95
artist        0.93
philosopher   0.93
indie         0.92
comedian      0.91
poet          0.91
novelist      0.88
filmmaker     0.87
Renaissance   0.86
folklore      0.86
Activations Density: 0.141%