Explanations
The neuron fires on occurrences of the word “variant.”
Negative Logits
Lee      -0.07
bio      -0.07
Wy       -0.07
kHz      -0.07
Bio      -0.07
hello    -0.07
Lee      -0.07
Gro      -0.06
ro       -0.06
xy       -0.06
Positive Logits
ant      0.13
variant  0.12
ANT      0.12
ulant    0.10
mutant   0.10
ent      0.09
ант      0.09
vant     0.09
nant     0.09
quent    0.09
Activations Density 0.047%