Explanations
references to strangers and their interactions in various contexts
NEGATIVE LOGITS
Token    Logit
rates    -0.87
umen     -0.78
alist    -0.77
iox      -0.75
arist    -0.74
prus     -0.73
rity     -0.72
rix      -0.71
vez      -0.70
rate     -0.70
POSITIVE LOGITS
Token        Logit
liness       0.86
whom         0.86
flung        0.84
grop         0.82
hood         0.82
who          0.82
alike        0.78
strangers    0.76
stranger     0.74
ishly        0.73
Activation density: 0.008%