Explanations
This neuron activates on the word “Taxonomy” (and its token pieces), typically where it appears as a label or category header introducing a taxonomic classification.
Negative Logits
going     -0.07
kker      -0.07
bakeca    -0.06
Mär       -0.06
द         -0.06
?>        -0.06
348       -0.06
<A        -0.06
drops     -0.06
ORN       -0.06
Positive Logits
****************    0.07
ButterKnife         0.07
formerly            0.06
accumulating        0.06
Tahoe               0.06
επί                 0.06
attach              0.06
vengeance           0.06
bullied             0.06
utenant             0.06
Activations Density 0.001%
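
As a rough illustration of how a density figure like the one above could be obtained, the sketch below assumes that activation density means the fraction of token positions on which the neuron's activation exceeds zero. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; none of them come from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumed definition: density = (active positions) / (all positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the neuron fires on 1 of 100,000 token positions.
sample = [0.0] * 99_999 + [3.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.001%"
```

Under this assumed definition, a density of 0.001% corresponds to roughly one active token per 100,000, which fits a neuron that responds to a single, fairly rare word.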