Explanations
The neuron primarily detects occurrences of the word “comfortable.”
Negative Logits
ショ         -0.07
об           -0.07
ленно        -0.06
byli         -0.06
集中         -0.06
协议         -0.06
_mat         -0.06
decoder      -0.06
Stick        -0.06
tor          -0.06
Positive Logits
sucked        0.07
.concatenate  0.07
"](           0.07
.optimizer    0.06
extracting    0.06
lus           0.06
(blank)       0.06
Carolina      0.06
limp          0.06
funded        0.06
Activations Density: 0.007%
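
As a rough illustration of what a density figure like this can mean, the sketch below computes the fraction of sampled token positions on which a neuron fires. It assumes density is defined as "share of positions with activation above a threshold"; the page does not state its exact definition. The function name `activation_density`, the threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition for illustration only; the dashboard may compute
    its density figure differently.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 7 active positions out of 100,000 tokens.
sample = [0.0] * 99_993 + [1.2] * 7
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this assumed definition, a density this low is what one would expect from a neuron that fires on only one specific word.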