INDEX
Explanations
Special conversation/formatting tokens and metadata markers (role tags such as "user"/"assistant", plus header and end-of-text markers).
Negative Logits
_("                        -0.07
stereotype                 -0.07
fp                         -0.07
цу                         -0.06
Вона                       -0.06
awah                       -0.06
Key                        -0.06
.Α                         -0.06
swirl                      -0.06
fino                       -0.06
Positive Logits
_LEAVE                     0.07
_BUS                       0.07
.userInteractionEnabled    0.06
applaud                    0.06
ynchronize                 0.06
guarding                   0.06
DOCTYPE                    0.06
~↵↵                        0.06
Chim                       0.06
Mant                       0.06
Activations Density 0.028%