Explanations
Deflection, diversion, or lies
The neuron activates on instances of “from” used in contrast or diversion expressions (e.g. “different from,” “divert attention from,” “distract from the fact that”).
Negative Logits
_Manager  -0.07
نا  -0.07
gons  -0.06
好像  -0.06
�  -0.06
iteral  -0.06
Israel  -0.06
Although  -0.06
ран  -0.06
rending  -0.06
Positive Logits
Adventure  0.07
prog  0.07
BANK  0.06
Accom  0.06
flight  0.06
Trip  0.06
HAPP  0.06
fcn  0.06
predictions  0.06
close  0.06
Activations Density: 0.017%
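
The page does not define how the density figure is computed. A minimal sketch, assuming it is the share of sampled token positions on which the neuron's activation is above zero: the function name `activation_density`, the `threshold` parameter, and the sample counts below are all hypothetical, chosen only to give a value of the same magnitude as the one reported.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the neuron fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample (not data from this page): 100,000 token positions,
# 17 of which have a positive activation.
sample = [0.0] * 99_983 + [0.5] * 17
print(f"{activation_density(sample):.3%}")  # prints 0.017%
```

Under this reading, a density of 0.017% would mean the neuron fires on roughly 17 of every 100,000 tokens, which fits a feature tied to a narrow use of a single word.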