Explanations
This neuron activates on social media user mentions (the “@username” tokens).
Negative Logits
Token      Logit
XV         -0.07
י          -0.07
.Note      -0.06
์น         -0.06
acest      -0.06
trap       -0.06
Shader     -0.06
uilt       -0.06
失败       -0.06
barrier    -0.06
Positive Logits
Token      Logit
//         0.07
settles    0.07
):-        0.07
أك         0.06
pedia      0.06
.vx        0.06
.@         0.06
@          0.06
Müz        0.06
.angular   0.06
Activations Density 0.005%
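
The density figure can be read as the share of sampled tokens on which the neuron fires at all. The sketch below illustrates that reading under stated assumptions: the page does not define how the value is computed, so the "activation above zero" criterion, the function name `activation_density`, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of sampled tokens with a
    nonzero activation. The source page does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 1 active token out of 20,000 (not real dashboard data).
sample = [0.0] * 19_999 + [3.2]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.005% corresponds to roughly one firing token in every 20,000, which fits a neuron tied to a narrow pattern such as "@username" mentions.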