Explanations
regarding or concerning
Detects tokens that mark the start of the assistant's responses, i.e., words frequently used at the beginning of assistant utterances.
Negative Logits
把它    0.38
ಣಿ    0.34
intuitive    0.33
quint    0.33
这是    0.33
ന്    0.32
ভেবে    0.32
𝖑    0.32
ultimate    0.32
bằng    0.31
Positive Logits
Regarding    1.94
Regarding    1.91
regarding    1.88
regarding    1.68
Concerning    1.63
至于    1.61
Concerning    1.59
concernant    1.49
щодо    1.44
至於    1.44
Activations Density: 0.010%