Explanations
affirmations or confirmations in conversations
Negative Logits
oles      -0.17
dorf      -0.16
Walters   -0.16
undi      -0.15
anya      -0.14
oad       -0.14
ittel     -0.14
dig       -0.14
vang      -0.14
539       -0.14
Positive Logits
inde      0.17
agal      0.17
indeed    0.16
YES       0.16
iesel     0.16
Indeed    0.15
mpar      0.14
icari     0.14
Indeed    0.14
yes       0.14
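The logit lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such lists are commonly derived, assuming (not confirmed by this page) that each score is the dot product of the feature's decoder direction with the model's unembedding column for that token, ignoring any final layer norm. The function name `logit_effects` and the toy inputs are hypothetical.

```python
from typing import List, Sequence, Tuple

def logit_effects(
    decoder_dir: Sequence[float],        # assumed: feature's decoder vector, length d_model
    unembed: Sequence[Sequence[float]],  # assumed: unembedding matrix, shape [d_model][vocab_size]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-k negative, top-k positive) (token, score) pairs."""
    d_model = len(decoder_dir)
    vocab_size = len(vocab)
    # score[v] = sum_i decoder_dir[i] * unembed[i][v]: direct effect on token v's logit
    scores = [
        sum(decoder_dir[i] * unembed[i][v] for i in range(d_model))
        for v in range(vocab_size)
    ]
    # Sort token indices from most negative to most positive score
    order = sorted(range(vocab_size), key=lambda v: scores[v])
    negative = [(vocab[v], scores[v]) for v in order[:k]]
    positive = [(vocab[v], scores[v]) for v in reversed(order[-k:])]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary
neg, pos = logit_effects(
    [1.0, -0.5],
    [[0.2, 0.1, -0.1, -0.2], [0.0, -0.1, 0.1, 0.0]],
    ["yes", "indeed", "dorf", "oles"],
    k=2,
)
print([(t, round(s, 2)) for t, s in neg])  # [('oles', -0.2), ('dorf', -0.15)]
print([(t, round(s, 2)) for t, s in pos])  # [('yes', 0.2), ('indeed', 0.15)]
```

Because the score depends only on the decoder weights and the unembedding, lists like these describe the feature's direct effect on the output, independent of any particular input text.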
Activation Density: 0.111%
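Activation density is the share of token positions on which the feature fires; 0.111% works out to roughly one active token in every 900. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 (the dashboard's exact threshold is not stated here). The function name `activation_density` is hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    # Count positions where the feature is considered active
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# One active position out of 1,000 tokens
print(activation_density([0.0] * 999 + [2.3]))  # 0.1
```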