INDEX
Explanations
code/quotes/conversation
The neuron responds to phrases about respecting others' beliefs.
Negative Logits
authentication    -0.07
discriminator     -0.07
_instructions     -0.07
Networking        -0.06
#                 -0.06
++↵               -0.06
.kr               -0.06
Flight            -0.06
sep               -0.06
stk               -0.06
Positive Logits
Incident    0.06
danh        0.06
Deadpool    0.06
شر          0.06
_LO         0.06
initely     0.06
.Last       0.06
kommun      0.06
NF          0.06
آ           0.06
Activations Density 0.001%