Explanations
The neuron primarily activates on negative modal constructions (especially “can’t” and similar prohibitions) that indicate the user is unable, or not permitted, to do something.
Negative Logits
Orange                 -0.07
آباد                   -0.07
uParam                 -0.07
силу                   -0.07
syscall                -0.07
Trouble                -0.07
OutOfRangeException    -0.07
チュ                   -0.06
creator                -0.06
آسی                    -0.06
Positive Logits
Loans                  0.07
conditional            0.06
(first                 0.06
359                    0.06
кам                    0.06
cock                   0.06
rowave                 0.06
็ต                     0.06
enders                 0.06
Α                      0.06
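The logit values above are plausibly the neuron's direct effect on output tokens: its output weight vector projected onto each token's unembedding vector. The page does not say how they were computed, so the following is only a minimal sketch under that assumption. It ignores the final LayerNorm, and `w_out`, `w_u`, `logit_effects`, and `top_logits` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical helpers: assumes the listed logits are the neuron's direct
# logit effect, i.e. its output weights dotted with each token's unembedding
# vector. The final LayerNorm is ignored for simplicity.


def logit_effects(w_out: List[float], w_u: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the neuron's output direction with every token's unembedding vector."""
    return {tok: sum(a * b for a, b in zip(w_out, vec)) for tok, vec in w_u.items()}


def top_logits(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive token effects."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most suppressed tokens first
    positive = ranked[::-1][:k]  # most promoted tokens first
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional model with a three-token vocabulary.
    w_out = [1.0, -0.5]
    w_u = {"a": [0.1, 0.2], "b": [0.3, -0.1], "c": [-0.2, 0.0]}
    neg, pos = top_logits(logit_effects(w_out, w_u), k=1)
    print(neg, pos)
```

With real model weights and `k=10`, this would produce two ten-entry lists shaped like the tables above.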
Activation density: 0.094%
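Activation density is presumably the fraction of sampled token positions on which the neuron fires. At 0.094%, that would be roughly 1 in 1,064 tokens (1 / 0.00094 ≈ 1,064). The sketch below makes a further assumption: "fires" means an activation strictly above a threshold, which defaults here to 0. That threshold is a hypothetical choice, and the dashboard's actual cutoff is not stated.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of positions whose activation exceeds `threshold` (hypothetical default of 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    sample = [0.0, 0.5, -0.1, 2.0]
    print(f"{activation_density(sample):.3%}")  # prints "50.000%"
```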