INDEX
Explanations
implementation
The neuron specifically fires on occurrences of the word “implementation” in code comments and documentation.
Negative Logits
دوست    -0.06
sky    -0.06
Fox    -0.06
_read    -0.06
電視    -0.06
العربية    -0.06
اسر    -0.06
fluid    -0.06
Races    -0.06
rebels    -0.06
Positive Logits
lineup    0.08
makta    0.07
程度    0.07
졌    0.07
ladık    0.07
ním    0.07
Harding    0.06
kont    0.06
出    0.06
reconc    0.06
Activations Density: 0.008%