INDEX
Explanations
equals sign
This neuron never fires on any tokens, so it isn’t detecting any consistent pattern.
Negative Logits
sters
-0.07
tablet
-0.06
arrangement
-0.06
procedure
-0.06
Keywords
-0.06
obligated
-0.06
watts
-0.06
ύτε
-0.06
Monday
-0.06
Panc
-0.06
Positive Logits
fillColor
0.06
ENTIC
0.06
december
0.06
ennifer
0.06
{↵
0.06
republican
0.06
_CUDA
0.06
↵ ↵↵
0.06
ProgressBar
0.06
EGA
0.06
Activations Density 0.011%