Explanations
negative feelings
The neuron activates on language describing a consuming or destructive takeover: words that signal something's powerful, overwhelming impact (e.g., “toll,” “takes,” “take over,” “power”).
Negative Logits
_on                 -0.07
varios              -0.06
indx                -0.06
_xt                 -0.06
нам                 -0.06
x                   -0.06
confusion           -0.06
milk                -0.06
cri                 -0.06
(token not shown)   -0.06

Positive Logits
(token not shown)   0.07
Opaque              0.06
TODAY               0.06
STD                 0.06
اپ                  0.06
Dragging            0.06
entionPolicy        0.06
ジ                  0.06
_BOTTOM             0.06
progress            0.06
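The logit lists rank vocabulary tokens by how much the neuron pushes their output logits up (positive) or down (negative). A minimal sketch of one common way to get such lists follows. It assumes the values come from projecting the neuron's output weight vector onto the unembedding matrix (a direct logit effect). The dashboard's exact method, including any layer-norm folding or normalization, is not stated here. The names `w_out`, `w_u`, `logit_effects`, and `top_positive_negative` are hypothetical.

```python
from typing import List, Tuple

def logit_effects(w_out: List[float], w_u: List[List[float]],
                  vocab: List[str]) -> List[Tuple[str, float]]:
    """Return (token, effect) pairs with effect = w_out . W_U[:, token].

    Assumed shapes (hypothetical): w_out has length d_model, and w_u is
    d_model rows by vocab_size columns.
    """
    d_model = len(w_out)
    effects = []
    for t, token in enumerate(vocab):
        # Dot product of the neuron's output direction with token t's unembedding column.
        score = sum(w_out[i] * w_u[i][t] for i in range(d_model))
        effects.append((token, score))
    return effects

def top_positive_negative(effects: List[Tuple[str, float]], k: int):
    """Split effects into the k most negative and the k most positive tokens."""
    ranked = sorted(effects, key=lambda pair: pair[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive

# Toy example with a 2-dimensional residual stream and a 3-token vocabulary.
w_out = [1.0, -0.5]
w_u = [[0.1, 0.0, -0.2],
       [0.0, 0.2, 0.1]]
neg, pos = top_positive_negative(logit_effects(w_out, w_u, ["a", "b", "c"]), k=1)
print(neg, pos)
```

In the toy example, token "c" receives the most negative effect (about -0.25) and token "a" the most positive (about 0.1).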
Activation Density: 0.067%
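Activation density is the share of tokens on which the neuron fires. A value of 0.067% corresponds to roughly 67 of every 100,000 tokens. The sketch below shows one way to compute it. It assumes "active" means an activation strictly above a threshold of 0. The dashboard's actual threshold and token sample are not given, and `activation_density` is a hypothetical helper.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption, not a documented setting.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:  # count the token as "active" only above the threshold
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total

# 67 active tokens out of 100,000 gives a density of about 0.067%.
sample = [1.0] * 67 + [0.0] * (100_000 - 67)
print(activation_density(sample))
```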