Explanations
locations
This neuron fires on tokens describing a scene in which someone is seated (e.g. “bench,” “park”); that is, it detects when a character is sitting on a bench in a park.
Negative Logits
Mes         -0.06
(wait       -0.06
principles  -0.06
Wonder      -0.06
captivity   -0.06
alkal       -0.06
explo       -0.06
Agility     -0.06
creasing    -0.06
_pres       -0.06
Positive Logits
::_         0.07
_HW         0.06
しの        0.06
때          0.06
μει         0.06
stants      0.06
desks       0.06
> ↵ ↵ ↵     0.06
ندق         0.06
aside       0.06
Activations Density 0.048%