Explanations
The neuron fires on words that denote exerting control or giving commands.
Negative Logits
Token     Logit
bet       -0.07
(news     -0.07
�         -0.06
noting    -0.06
Winners   -0.06
tặng      -0.06
BODY      -0.06
gı        -0.06
ه         -0.06
lunch     -0.06
Positive Logits
Token            Logit
overlapping      0.07
िसक              0.06
ござ             0.06
státu            0.06
牙               0.06
Compilation      0.06
.setStyleSheet   0.06
hierarchy        0.06
findById         0.06
ุรก              0.06
Activations Density: 0.012%
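
To give a sense of scale for the density figure, here is a minimal sketch of how such a number could be computed. It assumes that "density" means the fraction of sampled tokens on which the neuron's activation is above zero; the page does not define the term, so treat this as an assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (number of active tokens) / (total tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, 12 of which activate the neuron.
acts = [0.0] * 100_000
for i in range(12):
    acts[i * 1000] = 1.5  # arbitrary positive activation value

density = activation_density(acts)
print(f"{density * 100:.3f}%")
```

Under this reading, a density of 0.012% corresponds to roughly 12 active tokens per 100,000 sampled, which indicates a very sparse neuron.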