Explanations
Tokens that occur in instruction or task-setting prompts (imperative or role directives), i.e., the words a user uses to tell the model what to do.
Negative Logits
Token       Logit
Identity    -0.06
Date        -0.06
Qui         -0.06
515         -0.06
Uploader    -0.06
문의        -0.06
772         -0.06
전          -0.06
ificant     -0.06
bindung     -0.06
Positive Logits
Token       Logit
hint        0.07
/basic      0.07
*z          0.07
_START      0.06
_stdio      0.06
_SELECTOR   0.06
-notch      0.06
allies      0.06
_LABEL      0.06
ῆ           0.06
Activations Density: 0.107%