INDEX
Explanations
Instances where the assistant self-identifies or gives a disclaimer about being an AI (the model's "As an AI ..." style preface).
Negative Logits
ystone    -0.07
ung       -0.06
otre      -0.06
مق        -0.06
距        -0.06
vals      -0.06
Γεω       -0.06
repairs   -0.06
bilg      -0.06
飞        -0.06
Positive Logits
(cv           0.06
Eth           0.06
\Service      0.06
_references   0.06
Eth           0.06
*(-           0.06
�             0.06
本            0.06
/:            0.06
ธ             0.06
Activations Density 0.062%