Explanations
references to interactions with strangers, especially in a helpful or considerate context
occurrences of the word "stranger" in various contexts
Negative Logits
rity      -0.85
rates     -0.82
erb       -0.73
iox       -0.72
REE       -0.71
rix       -0.71
inion     -0.70
vez       -0.70
odium     -0.68
prus      -0.68
Positive Logits
liness      0.91
ishly       0.79
hood        0.78
grop        0.78
flung       0.76
strangers   0.75
bitten      0.74
stranger    0.73
worldly     0.71
whom        0.68
Activations Density 0.022%