Explanations
punctuation
The neuron activates on first-person, self-referential words (I, my, we, etc.) indicating personal commentary.
Negative Logits
Token            Logit
CLUDED           -0.07
Et               -0.07
Mutation         -0.06
imulation        -0.06
Entertainment    -0.06
mans             -0.06
319              -0.06
)::              -0.06
종               -0.06
Nat              -0.06
Positive Logits
Token            Logit
ивают            0.07
Might            0.07
.LayoutStyle     0.06
,本              0.06
би               0.06
طلا              0.06
.Java            0.06
элем             0.06
代码             0.06
СО               0.05
Activations Density 0.099%
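
As a rough illustration of what a density figure like the one above measures, the sketch below computes the fraction of tokens on which a neuron's activation exceeds a threshold. The function name `activation_density`, the zero threshold, and the sample data are all assumptions made for illustration; the dashboard's actual computation may differ.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 2 active tokens out of 2,000.
sample = [0.0] * 1998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low means the neuron is silent on nearly every token, which fits a neuron that fires only on a narrow class of words.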