INDEX
Explanations
The neuron selectively activates on the “Lie” action tags (the “(Lie …)” markers) in the dialogue.
Negative Logits
Token        Logit
parsed       -0.08
_Internal    -0.07
Terrace      -0.07
clared       -0.07
Schwe        -0.07
Martin       -0.07
quel         -0.06
roomId       -0.06
715          -0.06
worked       -0.06
Positive Logits
Token        Logit
DataSource    0.06
_aff          0.06
lobbyists     0.06
RPG           0.06
ưỡng          0.06
Living        0.06
Initialize    0.06
δεν           0.06
ENSE          0.06
[data         0.06
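The two lists above rank vocabulary tokens by how strongly the neuron pushes their logits down or up. A minimal sketch of how such rankings are often produced, assuming (this page does not confirm it) that each score is the neuron's output weight vector dotted with the matching column of the model's unembedding matrix. The helpers `direct_logit_scores` and `top_and_bottom` and the toy weights are hypothetical.

```python
def direct_logit_scores(w_out, w_u):
    """Score every vocabulary token by projecting the neuron's output weights
    onto the unembedding matrix.

    Assumption: w_out is the neuron's output direction (length d_model) and
    w_u is the unembedding as d_model rows of vocab_size floats; both are
    hypothetical stand-ins, not weights from this page.
    """
    d_model = len(w_out)
    vocab_size = len(w_u[0])
    return [
        sum(w_out[i] * w_u[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores, vocab, k=10):
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # ascending order: most negative first
    positive = ranked[::-1][:k]    # descending order: most positive first
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary.
w_out = [1.0, -1.0]
w_u = [[0.5, 0.1, -0.2],
       [0.0, 0.3, 0.4]]
vocab = ["a", "b", "c"]
negative, positive = top_and_bottom(direct_logit_scores(w_out, w_u), vocab, k=1)
print("most negative:", negative)
print("most positive:", positive)
```

Sorting the whole vocabulary keeps the sketch short; for a real vocabulary of tens of thousands of tokens, `heapq.nsmallest` and `heapq.nlargest` would return the k extremes without a full sort.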
Activation Density: 0.004%
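An activation density of 0.004% means the neuron fires on roughly 4 of every 100,000 tokens, which fits the narrow "(Lie …)" trigger described above. The calculation can be sketched as follows, assuming density is the percentage of tokens whose activation is strictly above zero. The dashboard's actual threshold and token sample are not stated here, and `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with activation > threshold
    (0.0 by default); the dashboard may use a different cutoff or sample.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: 4 active tokens out of 100,000 gives 0.004%.
toy = [0.0] * 99_996 + [1.3] * 4
print(f"{activation_density(toy):.3f}%")  # prints 0.004%
```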