INDEX
Explanations
The neuron specifically detects the presence of the choice label “C.”
Phrases related to interpersonal relationships and social behaviors.
Negative Logits
values      -0.07
ndef        -0.06
.street     -0.06
partic      -0.06
valid       -0.06
..'         -0.06
survives    -0.06
struct      -0.06
524         -0.06
productos   -0.06
Positive Logits
擦           0.07
羊           0.06
Cunningham   0.06
categorical  0.06
Xuân         0.06
basket       0.06
проблемы     0.06
світ         0.06
інт          0.06
Sox          0.06
Activations Density 0.001%