Explanations
security
This neuron primarily detects mentions of security or protective personnel and measures.
Negative Logits
외국    -0.07
义    -0.06
artisan    -0.06
yaygın    -0.06
pz    -0.06
نرم    -0.06
увався    -0.06
glBind    -0.06
ún    -0.06
Toyota    -0.06
Positive Logits
nav    0.07
SECURITY    0.07
narcotics    0.07
security    0.06
?'    0.06
Ashley    0.06
csi    0.06
shrink    0.06
体育    0.06
<Comment    0.06
Activations Density: 0.028%