Explanations
The neuron activates on words and phrases indicating that code is working correctly, e.g. “works,” “fine,” “perfectly,” and “working.”
Negative Logits
необходимости  -0.06
استاندارد  -0.06
механіз  -0.06
подум  -0.06
دارم  -0.06
вспом  -0.06
孩子  -0.06
CONSTANT  -0.06
成  -0.06
berra  -0.06
Positive Logits
-CN  0.07
Claude  0.07
Incontri  0.07
_basic  0.07
lında  0.06
fill  0.06
fills  0.06
Range  0.06
stdint  0.06
.HttpSession  0.06
Activation Density: 0.008%
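A density of 0.008% means the neuron fires on roughly 8 of every 100,000 token positions. As a minimal sketch, assuming density is the percentage of positions whose activation exceeds zero (the page does not state the threshold), it could be computed as follows; `activation_density` is a hypothetical helper.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation is above threshold.

    Assumption: "active" means strictly greater than the threshold (default 0).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * active / total


# 8 active positions out of 100,000 gives the density reported above.
acts = [1.0] * 8 + [0.0] * 99_992
print(f"{activation_density(acts):.3f}%")  # 0.008%
```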