INDEX
Explanations
The neuron fires on placeholder tokens that stand in for character names (e.g. “NAME_1”, “NAME_2”).
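To make the pattern concrete, here is a minimal sketch that picks out such placeholders in raw text. It assumes the placeholders always have the form NAME_ followed by digits, as in the two examples above; the regex and the helper name find_placeholders are illustrative and not part of the dashboard.

```python
import re

# Assumption: placeholders look like NAME_<digits>, as in "NAME_1" and "NAME_2".
PLACEHOLDER = re.compile(r"\bNAME_\d+\b")


def find_placeholders(text: str) -> list[str]:
    """Return every NAME_<n> placeholder in text, in order of appearance."""
    return PLACEHOLDER.findall(text)


sample = "NAME_1 waved at NAME_2, and NAME_1 smiled."
print(find_placeholders(sample))
```

This only matches the surface string; how a tokenizer splits NAME_1 into tokens, and on which of those tokens the neuron activates, is not something this page states.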
Negative Logits
mpz               -0.06
plat              -0.06
(Cs               -0.06
reflux            -0.06
paypal            -0.06
untu              -0.06
productService    -0.06
icmp              -0.06
Balt              -0.06
labelText         -0.06
Positive Logits
сих               0.07
口                0.06
surrounds         0.06
hundred           0.06
waiting           0.06
ünde              0.06
-related          0.06
character         0.06
}".               0.06
yeah              0.06
Activations Density 0.019%
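The page does not define the density figure. A common reading, assumed here, is the fraction of sampled tokens on which the neuron's activation is above zero. The sketch below computes that fraction on made-up data; the function name, the threshold parameter, and the sample values are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds threshold.

    Assumption: "density" means the share of active tokens;
    the source does not spell this out.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: 10,000 tokens, of which 2 are active (not taken from the dashboard).
acts = [0.0] * 9_998 + [1.3, 0.7]
print(f"{activation_density(acts):.3%}")
```

Under this reading, a density of 0.019% would mean the neuron is active on roughly 19 of every 100,000 tokens.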