INDEX
Explanations
affirmative responses or confirmations in conversation
Negative Logits
istr      -0.15
inand     -0.15
andbox    -0.15
oron      -0.15
.Align    -0.14
éļİ       -0.14
allon     -0.14
élé       -0.14
767       -0.14
dden      -0.13
Positive Logits
ascus               0.18
å»                  0.16
li                  0.15
li                  0.14
.li                 0.14
ëįķ                 0.14
Caval               0.14
endure              0.13
ossal               0.13
CustomAttributes    0.13
Activations Density 0.400%