INDEX
Explanations
references to dialogue and dialog-related structures in text
Negative Logits
MENT     -0.16
ment     -0.16
slee     -0.15
bound    -0.15
ugg      -0.15
aby      -0.15
idge     -0.14
ぞ       -0.14
igans    -0.14
cher     -0.14
Positive Logits
ues       0.30
/dialog   0.20
UES       0.19
uese      0.18
atical    0.16
(Dialog   0.16
gable     0.15
あった    0.15
UE        0.15
相手      0.15
Activations Density 0.013%