Explanations
references to geographic locations or hometowns
Negative Logits
Token     Logit
allest    -0.17
ofi       -0.15
uraa      -0.15
Depths    -0.15
linik     -0.14
otten     -0.14
wand      -0.14
paged     -0.14
ettel     -0.14
airo      -0.14
Positive Logits
Token     Logit
native    0.49
home      0.45
hometown  0.43
native    0.39
homeland  0.38
adopted   0.38
birth     0.37
adopt     0.34
-native   0.34
stom      0.33
Activations Density 0.142%