Explanations
The neuron selectively fires on occurrences of the token “way.”
Negative Logits
Token        Logit
字符         -0.07
abyss        -0.07
κύ           -0.07
yum          -0.07
itals        -0.06
Adoles       -0.06
suggests     -0.06
하는데       -0.06
WWE          -0.06
Significant  -0.06
Positive Logits
Token        Logit
way          0.08
_Form        0.07
اخت          0.07
_written     0.07
�a           0.06
Sty          0.06
andatory     0.06
_OVERFLOW    0.06
Dy           0.06
editary      0.06
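A minimal sketch of how token/logit lists like the two above could be produced, assuming they measure the neuron's direct effect on the output logits: the neuron's output weight vector (here the hypothetical `w_out`) is projected onto an unembedding matrix (hypothetical `W_U`), and tokens are ranked by the result. The function name, argument names, and toy values are illustrative only, not the tool's actual method.

```python
import numpy as np

def top_logit_effects(w_out, W_U, vocab, k=10):
    """Rank vocabulary tokens by a neuron's direct effect on the logits.

    Assumption: w_out is the neuron's output weight vector, shape (d_model,),
    and W_U is an unembedding matrix, shape (d_model, d_vocab). Their product
    gives one logit contribution per vocabulary token.
    """
    effects = w_out @ W_U            # shape (d_vocab,)
    order = np.argsort(effects)      # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
w_out = np.array([1.0, 0.0])
W_U = np.array([[0.08, -0.07, 0.01],
                [0.50,  0.50, 0.50]])
neg, pos = top_logit_effects(w_out, W_U, ["way", "abyss", "the"], k=1)
print(neg, pos)
```

Under this reading, "way" topping the positive list means the neuron, when active, directly nudges the model toward predicting "way".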
Activation Density: 0.011%
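A minimal sketch of how an activation density figure could be computed, assuming it is the fraction of token positions on which the neuron's activation exceeds a threshold (0 here; the tool's actual threshold is not stated). A density of 0.011% corresponds to roughly 11 firing positions per 100,000 tokens, consistent with a neuron that fires only on a single token such as "way". The function name and toy data are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the neuron's activation exceeds threshold.

    Assumption: density is measured as a simple firing rate over a token sample.
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))

# Toy sample: 100,000 token positions, the neuron fires on 11 of them.
acts = np.zeros(100_000)
acts[:11] = 1.0
print(f"{activation_density(acts) * 100:.3f}%")
```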