INDEX
Explanations
The neuron fires on direct second-person guidance phrases, especially the sequence “you are.”
New Auto-Interp
Negative Logits
Brigade
-0.07
feudal
-0.07
japan
-0.07
الم
-0.07
+z
-0.07
remove
-0.07
dön
-0.07
z
-0.06
Dealer
-0.06
Panic
-0.06
POSITIVE LOGITS
の上
0.06
všem
0.06
.Exception
0.06
güzel
0.06
обо
0.06
°}
0.06
�
0.06
onPress
0.06
všichni
0.06
geçen
0.06
Activations Density 0.020%