INDEX
Explanations
This neuron activates on occurrences of the word “private.”
Negative Logits
Eck            -0.06
^[             -0.06
Beck           -0.06
Cox            -0.06
Transformer    -0.06
hỗ             -0.06
Thorn          -0.06
Egg            -0.06
Bow            -0.06
Oh             -0.06
Positive Logits
private        0.15
Private        0.14
private        0.14
Private        0.12
private        0.12
PRIVATE        0.10
-private       0.09
(private       0.09
privately      0.08
.Private       0.08
Activations Density 0.026%
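
As a rough illustration of what a density figure like the one above could represent, the sketch below computes the percentage of tokens on which a neuron's activation exceeds a threshold. This is an assumption about the metric, not a definition taken from this page: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron
    fires at all. The page itself does not define the metric.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 10,000 tokens, of which 3 activate the neuron.
sample = [0.0] * 9997 + [1.2, 0.4, 2.5]
print(f"{activation_density(sample):.3f}%")  # prints "0.030%"
```

Under this reading, a density of 0.026% would mean the neuron fires on roughly 26 out of every 100,000 tokens, which fits a neuron that responds to a single word family.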