INDEX
Explanations
This neuron detects occurrences of the word “insider” (and its variants) in the text.
Negative Logits
Garcia      -0.07
jose        -0.07
کمک         -0.07
Buildings   -0.07
giám        -0.07
empleado    -0.06
ERP         -0.06
rowning     -0.06
_attempts   -0.06
Linda       -0.06
Positive Logits
Insider     0.10
insider     0.09
insiders    0.09
outsider    0.07
.Hidden     0.06
Back        0.06
-sur        0.06
inject      0.06
hammer      0.06
如下        0.06
Activation Density: 0.002%