Explanations
This neuron activates on the definite article “the.”
Negative Logits
lập       -0.07
zier      -0.07
_UNIT     -0.07
queries   -0.06
-AA       -0.06
ندية      -0.06
arna      -0.06
.f        -0.06
aging     -0.06
า         -0.06
Positive Logits
wave       0.06
unsett     0.06
transient  0.06
(){↵↵      0.06
सरक        0.06
insanely   0.06
acerb      0.06
Prompt     0.06
osti       0.06
راه        0.05
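Lists like the two above are commonly produced by projecting a neuron's output weights through the model's unembedding matrix and ranking the resulting per-token logit effects. The sketch below shows that ranking step under stated assumptions: the names `w_out` (the neuron's output weight vector), `W_U` (the unembedding matrix) and `vocab` are hypothetical inputs, and this is not necessarily how this particular dashboard computes its values.

```python
import numpy as np

def top_logit_effects(w_out, W_U, vocab, k=10):
    """Rank tokens by how strongly the neuron's output direction moves their logits.

    Assumed (hypothetical) inputs:
      w_out: shape (d_model,), the neuron's output weight vector
      W_U:   shape (d_model, d_vocab), the unembedding matrix
      vocab: list of d_vocab token strings
    """
    # Direct effect of one unit of neuron activation on every token's logit.
    logit_effect = w_out @ W_U  # shape (d_vocab,)
    order = np.argsort(logit_effect)  # ascending: most negative first
    negative = [(vocab[i], float(logit_effect[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effect[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
w_out = np.array([1.0, 0.0])
W_U = np.array([[0.5, -0.2, 0.1],
                [0.3, 0.3, 0.3]])
neg, pos = top_logit_effects(w_out, W_U, ["a", "b", "c"], k=1)
print(neg, pos)  # [('b', -0.2)] [('a', 0.5)]
```

Small magnitudes such as ±0.06 in the lists above would correspond to a weak direct effect on any single token.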
Activation Density: 0.003%
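An activation density is usually read as the percentage of token positions in a sample on which the neuron fires. The sketch below computes that figure under two assumptions that the page does not state: "fires" means an activation strictly above a `threshold` of 0, and the function name `activation_density` is hypothetical. At 0.003%, roughly 3 positions in every 100,000 would activate.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of positions whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold)) * 100.0

# 3 active positions out of 100,000 gives a density of about 0.003%.
acts = np.zeros(100_000)
acts[[10, 5_000, 99_999]] = 1.0
print(activation_density(acts))
```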