Explanations
The neuron chiefly activates on words describing physical police actions (e.g. patting down, handcuffing, slamming, punching).
Negative Logits
humili          -0.07
ocrisy          -0.07
土              -0.06
eca             -0.06
.AttributeSet   -0.06
пол             -0.06
될              -0.06
882             -0.06
hetto           -0.06
intervene       -0.06
Positive Logits
Blast           0.07
=list           0.07
']=             0.07
creating        0.07
снова           0.06
-validate       0.06
VENTORY         0.06
_; ↵            0.06
Printing        0.06
ेहर             0.06
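Lists like the two above usually come from projecting a neuron's output weights onto the vocabulary through the unembedding matrix and keeping the most negative and most positive entries. The sketch below illustrates that idea only; it assumes a plain `w_out_row @ W_U` projection that ignores any final LayerNorm, and the function name `top_logit_effects` and its parameters are hypothetical, not this dashboard's actual code.

```python
import numpy as np

def top_logit_effects(w_out_row, W_U, vocab, k=10):
    """Rank tokens by the neuron's direct effect on their logits (sketch).

    w_out_row: shape (d_model,), the neuron's output weight vector (assumed input).
    W_U: shape (d_model, vocab_size), the unembedding matrix (assumed input).
    vocab: list of token strings, length vocab_size.
    """
    # Direct path: how much one unit of neuron activation moves each token's logit.
    logits = w_out_row @ W_U
    order = np.argsort(logits)  # ascending order of effect
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
neg, pos = top_logit_effects(
    np.array([1.0, 0.0]),
    np.array([[0.5, -0.2, 0.1, 0.0], [0.0, 0.0, 0.0, 0.0]]),
    ["a", "b", "c", "d"],
    k=2,
)
print(neg)  # [('b', -0.2), ('d', 0.0)]
print(pos)  # [('a', 0.5), ('c', 0.1)]
```

Because these scores measure only the direct path to the output, they can look unrelated to the activating contexts, as the mixed-language and code tokens above do.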
Activation Density: 0.007%
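A density of 0.007% means the neuron fires on about 7 of every 100,000 tokens. The sketch below computes density as the fraction of tokens whose activation exceeds a threshold; the threshold of 0 and the helper name `activation_density` are assumptions for illustration, since the page does not state how "active" is defined.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which the activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(activations)
    return float((acts > threshold).mean())

# 7 active tokens out of 100,000 gives the density reported above.
acts = np.zeros(100_000)
acts[:7] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.007%
```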