INDEX
Explanations
the name "Sus" or "Suz" alongside some activations associated with different aspects
references to the name "Susan" and variations thereof
Negative Logits
覚醒        -0.84
overhead    -0.76
learning    -0.74
OPLE        -0.74
living      -0.73
sorting     -0.69
machinery   -0.69
hetti       -0.68
anwhile     -0.68
erous       -0.68
Positive Logits
annah       1.22
pect        1.12
pected      1.03
Sus         1.00
pects       0.99
pic         0.95
pecting     0.94
pir         0.90
itiz        0.90
Sus         0.84
Activations Density 0.007%
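
The density figure can be read as the share of token positions on which the feature fires. The sketch below is a minimal illustration under that assumption (activation strictly above a threshold of zero counts as firing); the function name `activation_density` and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means (number of firing tokens) / (total tokens).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: the feature fires on 7 of 100,000 token positions.
acts = [0.0] * 99_993 + [1.5] * 7
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.007%
```

With these made-up numbers the feature is active on roughly one token in fourteen thousand, which is the order of sparsity a 0.007% density implies.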