INDEX
Explanations
rhetorical questions and conversational language
Negative Logits
uce    -0.15
agen   -0.14
زی     -0.14
culus  -0.14
テル   -0.14
merit  -0.14
obook  -0.14
imm    -0.14
erg    -0.14
loy    -0.13
Positive Logits
yeah     0.18
WELL     0.17
Well     0.15
Yeah     0.15
/tos     0.15
chances  0.15
bien     0.14
ラス     0.14
shima    0.14
well     0.14
Activations Density 0.083%