Explanations
This neuron mainly detects occurrences of the word “widget.”
Negative Logits
Token        Logit
Morse        -0.08
496          -0.07
-distance    -0.07
ore          -0.07
crossings    -0.07
eneral       -0.07
17           -0.07
18           -0.07
16           -0.07
över         -0.06
Positive Logits
Token        Logit
widget       0.09
Widget       0.08
Widget       0.07
Hayden       0.07
gadget       0.07
veled        0.07
widget       0.07
Featured     0.07
ода          0.07
ут           0.07
Activations Density: 0.005%
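To make the density figure concrete, here is a minimal sketch of how such a statistic is commonly computed. It assumes that "density" means the percentage of token positions on which the neuron's activation is strictly above a threshold of 0. Both that threshold and the function name `activation_density` are hypothetical and not taken from this page. Under that assumption, 0.005% works out to roughly 1 firing token in every 20,000.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the activation exceeds `threshold`.

    `activations` is a list of sequences, one list of per-token activations
    per sequence. The threshold of 0.0 is an assumption, not the site's rule.
    """
    # Flatten all sequences into a single list of per-token activations.
    flat = [a for seq in activations for a in seq]
    if not flat:
        return 0.0
    # Count the positions where the neuron is considered to be "firing".
    firing = sum(1 for a in flat if a > threshold)
    return 100.0 * firing / len(flat)


# Two toy sequences, 10 tokens total, with the neuron firing on 2 of them.
acts = [[0.0, 0.0, 1.2, 0.0], [0.0, 0.0, 0.0, 0.0, 3.4, 0.0]]
print(activation_density(acts))  # 20.0
```

A very sparse neuron like this one would return about 0.005 only if it fired on about one position per 20,000 tokens of sampled text.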