INDEX
Explanations
dialogue and expressions of agreement or acknowledgment
Negative Logits
je        -0.15
owler     -0.15
reason    -0.14
buzz      -0.14
ream      -0.14
Ŀ         -0.14
ins       -0.13
бот       -0.13
cush      -0.13
Snape     -0.13
Positive Logits
reply     0.24
Reply     0.21
replies   0.20
reply     0.20
egg       0.19
.reply    0.19
Reply     0.18
ша        0.18
replied   0.18
_reply    0.17
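Dashboards of this kind typically build both logit lists by projecting the feature's decoder direction onto the model's unembedding matrix, then reading off the largest and smallest scores. The sketch below assumes that approach. `decoder_dir`, `unembed`, `vocab`, `logit_effects`, and `top_and_bottom` are hypothetical toy stand-ins, not the real model weights or any library API.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Score each vocabulary token by the dot product of the feature's
    decoder direction (length d_model) with that token's unembedding column.
    `unembed` is a d_model x vocab_size list of rows (hypothetical toy data)."""
    scores = []
    for j, token in enumerate(vocab):
        score = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((token, score))
    return scores


def top_and_bottom(scores, k):
    """Return the k most positive and the k most negative (token, score) pairs."""
    positive = sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]
    negative = sorted(scores, key=lambda pair: pair[1])[:k]
    return positive, negative


# Hypothetical toy example: d_model = 2, vocabulary of 4 tokens.
vocab = ["reply", "replied", "je", "Snape"]
unembed = [
    [1.0, 0.5, -0.5, 0.0],
    [0.0, 0.5, 0.0, -1.0],
]
decoder_dir = [0.2, 0.1]

positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
print(positive)
print(negative)
```

With real weights, `vocab` would be the tokenizer's full vocabulary and `k` would be 10, matching the length of each list above.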
Activation Density: 0.563%
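Activation density is usually reported as the percentage of sampled token positions on which the feature fires, meaning its activation is above zero. The sketch below assumes that definition. `activation_density` is a hypothetical helper, and the activation values are made up for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`
    (assumed definition: a feature 'fires' when its activation is > 0)."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Made-up activations over 8 tokens; the feature fires on 2 of them.
acts = [0.0, 0.0, 3.1, 0.0, 0.0, 0.0, 1.4, 0.0]
print(f"{activation_density(acts):.3f}%")  # 25.000%
```

Under this definition, a density of 0.563% means the feature fires on roughly 1 token in 178 (1 / 0.00563 ≈ 177.6).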