Explanations
This neuron activates on mentions of integer variable declarations, especially the variable “x.”
Negative Logits
аров        -0.07
sterdam     -0.07
Pets        -0.06
oll         -0.06
screamed    -0.06
llum        -0.06
ventory     -0.06
rief        -0.06
musician    -0.06
alsex       -0.06
Positive Logits
věc         0.07
wicht       0.07
것이다      0.07
↵ ↵         0.07
↵           0.07
↵           0.07
(blank)     0.07
↵           0.06
пат         0.06
(blank)     0.06
Activations Density 0.080%