Explanations
The neuron is detecting the appearance of the word “euphemisms,” i.e. euphemistic language.
Negative Logits
Token                   Logit
PB                      -0.07
ціл                     -0.07
))↵                     -0.06
문의                    -0.06
thước                   -0.06
(unrendered token)      -0.06
igits                   -0.06
tbl                     -0.06
-match                  -0.06
.proto                  -0.06
Positive Logits
Token                   Logit
workplace               0.07
premium                 0.07
trading                 0.07
sanction                0.07
transformers            0.07
ngủ                     0.06
Ernest                  0.06
υκ                      0.06
preparations            0.06
appropriations          0.06
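Logit lists like these are usually read as the neuron's direct effect on the output vocabulary. The following is a minimal sketch of how such lists could be produced, under an assumption: the effect is the dot product of the neuron's output weight vector with each column of the unembedding matrix, with the final layer norm ignored. The names `w_out`, `W_U`, and `top_logit_effects` are hypothetical, the weights are made up, and the dashboard's actual computation may differ.

```python
import numpy as np


def top_logit_effects(w_out, W_U, vocab, k=10):
    """Rank vocabulary tokens by a neuron's (assumed) direct effect on the logits.

    Hypothetical setup: effect = w_out @ W_U, ignoring the final layer norm.
    w_out: (d_model,) output weights of the neuron
    W_U:   (d_model, vocab_size) unembedding matrix
    vocab: list of vocab_size token strings
    """
    effects = np.asarray(w_out) @ np.asarray(W_U)  # shape: (vocab_size,)
    order = np.argsort(effects)  # ascending, so most negative effects come first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example with made-up weights (d_model = 2, vocab_size = 4).
vocab = ["workplace", "PB", "premium", "tbl"]
w_out = np.array([1.0, 0.0])
W_U = np.array([[0.07, -0.07, 0.06, -0.06],
                [0.50, 0.50, 0.50, 0.50]])
positive, negative = top_logit_effects(w_out, W_U, vocab, k=2)
print("positive:", positive)
print("negative:", negative)
```

Sorting the full effect vector once and slicing both ends keeps the positive and negative lists consistent with each other; for a real vocabulary of tens of thousands of tokens, `np.argpartition` would avoid a full sort.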
Activation Density: 0.001%
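An activation density of 0.001% means the neuron fires on roughly 1 in 100,000 tokens, so it is very sparse. The following is a minimal sketch of how such a figure could be computed, assuming density is the fraction of token positions whose activation exceeds a threshold of 0.0. The threshold, the function name `activation_density`, and the toy activations are all hypothetical.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption; a dashboard may use another cutoff.
    """
    acts = np.asarray(activations, dtype=float)
    return float((acts > threshold).mean())


# Toy activations over 8 token positions (made-up data).
acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0]
density = activation_density(acts)
print(f"Activation density: {density:.3%}")  # prints "Activation density: 12.500%"
```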