Explanations
This neuron consistently activates on mentions of someone’s death or murder.
Negative Logits
.textBox            -0.07
通知                -0.07
Jeho                -0.06
(token not shown)   -0.06
calloc              -0.06
orch                -0.06
змож                -0.06
söz                 -0.06
xlim                -0.06
宣                  -0.06
Positive Logits
ับต        0.07
ynchron     0.06
#           0.06
tensor      0.06
、↵         0.06
uper        0.06
AMD         0.06
Mix         0.06
forum       0.06
(with       0.06
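The two lists above rank vocabulary tokens by how strongly this neuron pushes their output logits down or up. The page does not say how the values were computed. One common approach, assumed here, is direct logit attribution: project the neuron's output weight vector through the model's unembedding matrix and keep the k most negative and k most positive entries. This sketch ignores the final LayerNorm and all downstream layers; `top_logit_effects`, `w_out`, and `W_U` are hypothetical names, and the toy data is illustrative only.

```python
import numpy as np


def top_logit_effects(w_out, W_U, vocab, k=10):
    """Rank tokens by a neuron's direct effect on the output logits.

    Assumption: the effect is w_out @ W_U (output weights projected through
    the unembedding), ignoring the final LayerNorm and downstream layers.

    w_out: (d_model,) output weights of one MLP neuron (hypothetical name)
    W_U:   (d_model, vocab_size) unembedding matrix (hypothetical name)
    vocab: list of token strings, indexed like the columns of W_U
    """
    effects = w_out @ W_U  # one logit contribution per vocabulary token
    # Stable sort so that exactly tied values keep a deterministic order.
    order = np.argsort(effects, kind="stable")
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: with W_U = identity, the effects equal w_out itself.
vocab = ["a", "b", "c", "d"]
W_U = np.eye(4)
w_out = np.array([0.5, -0.2, 0.1, -0.7])
negative, positive = top_logit_effects(w_out, W_U, vocab, k=2)
print("Negative:", negative)  # [('d', -0.7), ('b', -0.2)]
print("Positive:", positive)  # [('a', 0.5), ('c', 0.1)]
```

The displayed values are rounded to two decimals, so entries that appear tied at -0.06 or 0.06 may differ in their underlying values; the stable sort only matters for exact ties.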
Activation density: 0.036%
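The density figure is a simple ratio. Assuming it is the fraction of token positions where the neuron's activation is above zero (the page does not state the threshold), 0.036% corresponds to about 36 active positions per 100,000 tokens. A minimal sketch with a hypothetical `activation_density` helper:

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption, not taken from the page.
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))


# Toy data: 36 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:36] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.036%
```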