INDEX
Explanations
emojis and exclamations
markers of heightened expressiveness and chat turn boundaries, such as exclamatory punctuation, emojis, and end-of-turn tokens.
Negative Logits
M
0.68
ни
0.67
ும்
0.64
V
0.61
ﺮ
0.61
не
0.61
ला
0.59
εια
0.58
T
0.58
B
0.57
Positive Logits
👋
0.57
The
0.54
😉
0.53
é
0.53
😀
0.52
You
0.50
↵
0.50
0.50
Ве
0.48
🙌
0.48
Activations Density 0.462%