INDEX
Explanations
This neuron specifically detects occurrences of the word “deer.”
Negative Logits
-0.07
/py
-0.07
Wire
-0.07
網
-0.06
فناوری
-0.06
_SM
-0.06
THON
-0.06
مصرف
-0.06
&S
-0.06
-holder
-0.06
Positive Logits
deer
0.15
Deer
0.13
deer
0.09
trout
0.09
'('
0.08
susceptible
0.07
Trout
0.07
鹿
0.06
divider
0.06
interchange
0.06
Activations Density 0.006%
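As a rough illustration of how a density figure like the one above is usually read, the sketch below treats activation density as the fraction of token positions on which the neuron fires. This is an assumption about the metric, not something the page states; the function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the neuron is active.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 6 active positions out of 100,000 tokens.
toy = [0.0] * 99_994 + [1.2, 0.4, 2.5, 0.9, 3.1, 0.7]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.006%
```

Under this reading, a density of 0.006% means the neuron is active on roughly 6 of every 100,000 tokens, which fits a neuron tied to a single uncommon word.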