INDEX
Explanations
This neuron primarily activates on occurrences of the word “Sheriff.”
Negative Logits
Token          Logit
Nodes          -0.07
+A             -0.07
Great          -0.07
Nicola         -0.07
Intelligent    -0.07
nodes          -0.06
Elite          -0.06
plug           -0.06
simples        -0.06
zeal           -0.06
Positive Logits
Token          Logit
sheriff        0.12
Sheriff        0.11
sher           0.08
rhs            0.07
рис            0.07
guardian       0.07
if             0.07
lf             0.07
svp            0.06
193            0.06
Activations Density: 0.002%