INDEX
Explanations
Facebook software/AI usage
Instances where the assistant refers to itself as an AI (e.g., "As an AI language model").
Negative Logits
sten          -0.07
_swap         -0.06
люч           -0.06
иц            -0.06
etty          -0.06
userid        -0.06
protestors    -0.06
demonstrates  -0.06
зации         -0.06
Username      -0.06
Positive Logits
arbit              0.07
нівер              0.07
获得               0.07
универ             0.07
Turning            0.07
odon               0.06
.Word              0.06
(token not shown)  0.06
etchup             0.06
//////////         0.06
Activation Density: 0.018%