Explanations
This neuron detects tokens that start with “Inf” or “inf” (i.e. words beginning with “inf-”).
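As a rough illustration of the pattern described above, the check below tests whether a token begins with "inf". It is only a sketch: the function name `starts_with_inf` is hypothetical, the case-insensitive match is an assumption, and the stripped leading markers ("▁" and "Ġ", used by some tokenizers to mark a preceding space) are assumptions too, since this page does not say which tokenizer is involved.

```python
def starts_with_inf(token: str) -> bool:
    """Return True if the token begins with "inf", ignoring case.

    Assumption: leading whitespace and the space markers used by some
    tokenizers (U+2581 and U+0120) are not part of the word itself.
    """
    word = token.lstrip(" \t\n\u2581\u0120")
    return word.lower().startswith("inf")


# Hypothetical example tokens, not taken from the model's vocabulary.
examples = ["Inf", " information", "INF", "Game", "confine"]
matches = [tok for tok in examples if starts_with_inf(tok)]
```

A string test like this only approximates the explanation; the neuron's real behavior is defined by its activations, not by a prefix rule.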
Negative Logits
Token   Logit
Mage    -0.08
Zack    -0.08
buckle  -0.07
Gong    -0.07
Jame    -0.07
Game    -0.07
Game    -0.07
****************************************  -0.07
Kate    -0.07
wakeup  -0.07
Positive Logits
Token   Logit
inf     0.14
Inf     0.13
Inf     0.11
inf     0.11
ाध      0.08
FI      0.08
INF     0.08
_INF    0.08
isn     0.07
inks    0.07
Activations Density 0.012%
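The page does not define how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled tokens on which the neuron's activation is above zero, might look like this. The function name `activation_density`, the `threshold` parameter, and the sample numbers are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold.

    Assumption: "density" means the share of sampled tokens on which
    the neuron fires at all; the source page does not confirm this.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 12 active tokens out of 100,000.
sample = [1.0] * 12 + [0.0] * 99_988
density = activation_density(sample)
formatted = f"{density:.3%}"
```

Under that assumption, a density of 0.012% would mean the neuron fires on roughly 12 of every 100,000 tokens.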