INDEX
Explanations
The neuron detects mentions of awards or honors and related superlative descriptions (e.g., “the award is the highest honor given…”).
Negative Logits
Token            Logit
Detector         -0.08
ollider          -0.07
وات              -0.07
ladatel          -0.06
нем              -0.06
둥               -0.06
pios             -0.06
contradictory    -0.06
ัว               -0.06
(reverse         -0.06
Positive Logits
Token            Logit
ATERIAL          0.07
cro              0.06
406              0.06
21               0.06
miss             0.06
GOOD             0.06
영화             0.06
weed             0.06
errs             0.06
MOM              0.06
Activations Density 0.024%
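
To make the density figure concrete, here is a minimal sketch of how such a value could be computed, assuming that "activations density" means the fraction of token positions on which the neuron's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen for illustration only; they are not taken from the page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a position counts as "active" when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: the neuron fires on 3 of 12,500 token positions.
sample = [0.0] * 12497 + [0.8, 1.3, 0.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.024%
```

Under this reading, a density of 0.024% means the neuron is active on roughly 1 in every 4,000 tokens, which fits a narrow feature such as mentions of awards or honors.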