INDEX
Explanations
the word "stranger" in various contexts
Negative Logits
rity       -0.86
iox        -0.83
vez        -0.78
rates      -0.73
prus       -0.73
reon       -0.72
odium      -0.72
urer       -0.70
pleting    -0.70
orie       -0.69
Positive Logits
liness     0.95
flung      0.80
ishly      0.79
worldly    0.78
hood       0.75
whom       0.73
grop       0.71
lihood     0.69
bitten     0.69
stranger   0.69
Activations Density: 0.041%