INDEX
Explanations
assistant
Tokens that mark structured conversational format elements, such as speaker roles (user, assistant) and response channel selections (analysis, final, commentary).
Negative Logits
Token   Logit
PV      -0.12
RM      -0.12
AE      -0.11
FV      -0.11
PV      -0.11
NM      -0.11
RM      -0.11
BM      -0.11
Merc    -0.11
NM      -0.10
Positive Logits
Token           Logit
<|channel|>     0.41
<|message|>     0.32
<|constrain|>   0.28
<|start|>       0.18
婷婷            0.16
�               0.16
<|end|>         0.16
<|call|>        0.15
琪琪            0.15
ASUS            0.15
Activations Density 0.591%