Explanations
This neuron specifically detects the token “Answer” (i.e. it activates on the label indicating the start of an answer).
Negative Logits
الق	-0.06
songs	-0.06
Logic	-0.06
.NoArgsConstructor	-0.06
vouchers	-0.06
sóc	-0.06
eslint	-0.06
nhờ	-0.06
стра	-0.06
záznam	-0.06
Positive Logits
网站	0.06
ップ	0.06
)?;↵↵	0.06
_def	0.06
∏	0.06
Tooltip	0.06
воду	0.06
enhanced	0.06
ucose	0.06
AND	0.06
Activations Density 0.008%
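
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes density means the percentage of token positions where the neuron's activation exceeds a threshold; the page does not state its exact definition. The function name, the threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds threshold.

    Assumption: "density" is the share of firing positions; the source page
    does not define it, so treat this as illustrative only.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: the neuron fires on 8 of 100,000 token positions.
sample = [0.0] * 99_992 + [1.5] * 8
print(f"{activation_density(sample):.3f}%")  # prints 0.008%
```

A density this low would be consistent with a neuron that fires only on one specific token, such as the "Answer" label described in the explanation.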