Explanations
references to home, homeland, and hometown connected to individuals
Negative Logits
Depths      -0.16
allest      -0.15
ofi         -0.15
uraa        -0.15
jej         -0.14
airo        -0.14
minster     -0.14
@student    -0.14
wand        -0.14
ettel       -0.14
Positive Logits
home        0.54
native      0.53
hometown    0.45
native      0.41
homeland    0.40
adopted     0.38
birth       0.38
-native     0.36
home        0.35
adopt       0.35
Activations Density: 0.129%