INDEX
Explanations
This neuron activates on negation tokens (e.g., “not,” “aren’t,” “isn’t”).
Negative Logits
Token                 Logit
ायत                   -0.07
Зап                   -0.07
ській                 -0.06
NotificationCenter    -0.06
BorderSide            -0.06
disponibles           -0.06
Ấn                    -0.06
居                    -0.06
_TWO                  -0.06
iji                   -0.06
Positive Logits
Token                 Logit
Transformer           0.07
delet                 0.07
_internal             0.07
kinson                0.07
prohibits             0.06
framework             0.06
_HAVE                 0.06
â                     0.06
quat                  0.06
latent                0.06
Activations Density 0.010%
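
A density figure like the one above is commonly read as the fraction of sampled tokens on which the neuron fires. The sketch below illustrates that reading only. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical, and the dashboard's actual definition may differ (for example, it may use a different threshold or sample).

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the neuron is active. This is an illustrative definition, not one
    confirmed by the dashboard.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the neuron fires on 1 token out of 10,000.
sample = [0.0] * 9_999 + [1.3]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.010% corresponds to roughly one active token per 10,000 sampled, which fits a neuron that responds to a narrow class of tokens.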