INDEX
Explanations
The neuron activates on words expressing emotional or physical closeness (e.g., “close,” “closeness”) that indicate an intimate bond between characters.
Negative Logits
free        -0.06
несп        -0.06
Tmp         -0.06
shore       -0.06
Haut        -0.06
cortisol    -0.06
aime        -0.05
"](         -0.05
insults     -0.05
deix        -0.05
Positive Logits
zew          0.07
ROT          0.07
STATIC       0.06
pens         0.06
imming       0.06
skys         0.06
Teams        0.06
학년         0.06
verge        0.06
HOME         0.06
Activations Density: 0.012%
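
The page does not say how the density figure is computed. As a rough sketch only, assuming it means the fraction of sampled token positions on which the neuron's activation is above zero, the calculation could look like the following. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of sampled tokens on which the neuron
    fires at all. The source page does not define the term.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the neuron fires on 3 of 25,000 token positions.
sample = [0.0] * 24997 + [1.3, 0.4, 2.1]
density = activation_density(sample)
print(f"{density:.3%}")  # formats the fraction as a percentage
```

Under this reading, a density of 0.012% corresponds to the neuron firing on roughly 12 out of every 100,000 tokens, which is consistent with a sparse neuron tied to a narrow set of words.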