Explanations
This neuron fires on the closing quote-and-bracket sequence (“]”) that ends the user's “your answer” placeholder in toxic-speech instructions.
Negative Logits
програми    -0.06
Musical     -0.06
Informe     -0.06
_gift       -0.06
десят       -0.06
upbeat      -0.06
another     -0.06
atar        -0.06
علت         -0.06
ường        -0.06
Positive Logits
ольз        0.08
.toJson     0.06
]]          0.06
lose        0.06
千          0.06
okes        0.06
mA          0.06
.Left       0.06
FOREIGN     0.06
.left       0.06
Activations Density: 0.000%