Explanations
work and hard
This neuron responds to mentions of “hard work” (and similar motivational work-and-effort phrases).
Negative Logits
Token        Logit
ють          -0.07
Muslim       -0.06
editor       -0.06
indie        -0.06
=[↵          -0.06
melted       -0.06
egrated      -0.06
collapsed    -0.06
q            -0.06
يانة         -0.05
Positive Logits
Token        Logit
congrat      0.07
enever       0.07
. ↵↵         0.07
ــــ         0.07
Yatırım      0.06
abei         0.06
):↵↵         0.06
abox         0.06
dve          0.06
_dot         0.06
Activations Density: 0.015%