INDEX
Explanations
mentions of interactions with strangers
references to strangers in various contexts
Negative Logits
Token     Logit
rity      -0.86
iox       -0.84
rates     -0.79
prus      -0.75
vez       -0.73
orie      -0.70
iano      -0.70
inion     -0.70
ramid     -0.69
ris       -0.68
Positive Logits
Token        Logit
strangers    0.80
grop         0.79
stranger     0.78
bitten       0.77
flung        0.77
liness       0.75
whom         0.75
worldly      0.73
ishly        0.72
acquainted   0.70
Activations Density: 0.023%