INDEX
Explanations
dialogues including quotation marks
direct speech or quotes in the text
Negative Logits
characterized   -0.74
conclud         -0.73
upstream        -0.72
downstream      -0.72
valued          -0.71
frontline       -0.70
footing         -0.70
ticket          -0.69
¥ŀ              -0.69
pole            -0.68
Positive Logits
Oh              1.18
Yeah            1.16
wcsstore        1.11
Yes             1.07
Fuck            1.06
Hmm             1.04
YES             1.04
Sorry           1.02
Huh             1.02
Damn            1.02
Activations Density 0.094%
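
The density figure above is commonly read as the fraction of token positions on which the feature fires at all. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the tool that produced this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions with a
    non-zero (above-threshold) activation. The source page does not
    spell out its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: one active position out of 10,000 tokens.
sample = [0.0] * 9999 + [2.5]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.094% would correspond to the feature firing on roughly 94 of every 100,000 tokens.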