Explanations
Disgusting texts
This neuron primarily detects strongly negative, disgust-evoking adjectives (e.g. “revolting,” “disgusting,” “gross”).
Negative Logits
Token         Logit
buy           -0.07
stud          -0.07
map           -0.06
head          -0.06
Fit           -0.06
spr           -0.06
fontWeight    -0.06
filtr         -0.06
.xxx          -0.06
вар           -0.06
Positive Logits
Token         Logit
exhilar       0.06
_<            0.06
DPI           0.06
年の          0.06
iverz         0.06
.Grid         0.06
/article      0.06
)"↵           0.06
UBLIC         0.06
corev         0.06
Activations Density: 0.132%