Explanations
describing degree of feeling
Negative Logits
stupid       0.56
stupidity    0.51
filth        0.49
outrageous   0.49
falsch       0.49
shitty       0.46
पागल         0.43
Stupid       0.42
错了         0.41
гря          0.41
Positive Logits
somewhat     1.66
Somewhat     1.31
quite        1.30
rather       1.28
bastante     1.23
довольно     1.23
agak         1.22
有點         1.16
piuttosto    1.15
좀           1.13
Activations Density: 0.073%