INDEX
Explanations
specific
The neuron activates on words related to personalization or individual-specific context.
Negative Logits
millennium     -0.07
_dimension     -0.07
.Focused       -0.07
.TextEdit      -0.07
sand           -0.07
ρθ             -0.06
ød             -0.06
PointerType    -0.06
.sys           -0.06
specular       -0.06
Positive Logits
Ratio          0.07
โ              0.07
милли          0.07
philosoph      0.06
caract         0.06
itness         0.06
raries         0.06
lesser         0.06
Pil            0.06
://"           0.06
Activations Density 0.016%