INDEX
Explanations
This neuron activates on words from the phrase “get back on track,” i.e. tokens like “get,” “back,” “on,” and “track.”
Negative Logits
Token       Logit
Num         -0.06
_HISTORY    -0.06
_car        -0.06
_agent      -0.06
pap         -0.06
SPE         -0.06
franch      -0.06
Zust        -0.06
ZD          -0.06
<small      -0.06
Positive Logits
Token       Logit
ịp          0.07
την         0.07
bilder      0.06
hely        0.06
entr        0.06
(aa         0.06
出版        0.06
edm         0.06
.addRow     0.06
touches     0.06
Activations Density 0.019%
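
The page does not define how the density figure is computed. One plausible reading, stated here as an assumption, is that it is the percentage of sampled tokens on which the neuron's activation is above zero. A minimal sketch under that assumption follows; the function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the neuron fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: 10,000 tokens, of which only 2 activate the neuron.
toy_activations = [0.0] * 9998 + [1.3, 0.7]
density = activation_density(toy_activations)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.019% would mean the neuron fires on roughly 19 of every 100,000 tokens, which fits a neuron tied to a single phrase.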