INDEX
Explanations
negative judgments or moral evaluations regarding actions and situations
New Auto-Interp
Negative Logits
Token                  Logit
warm                   -0.17
かわ (ãģĭãĤı)          -0.15
Warm                   -0.15
انو (اÙĨÙĪ)            -0.15
wine                   -0.15
loub                   -0.14
vrd                    -0.14
.GroupLayout           -0.14
omanip                 -0.14
abbr                   -0.14
Positive Logits
Token                  Logit
fully                  0.25
fulness                0.21
s                      0.19
itude                  0.18
wrong                  0.17
/error                 0.17
誤                     0.17
fu                     0.16
ful                    0.16
omers                  0.16
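The card does not say how the two logit lists were produced. A common approach for sparse-autoencoder features is to take the dot product of the feature's decoder direction with each token's unembedding vector, then report the k most positive and k most negative tokens (k = 10 here, matching the ten entries per list). The sketch below assumes that method. The names `logit_effects` and `top_and_bottom`, as well as the 2-d direction and 4-token vocabulary, are hypothetical toy values and do not reproduce the numbers above.

```python
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]

def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding vector.

    Assumption: this is how the card's logit lists were derived.
    """
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }

def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most positive and the k most negative tokens by logit effect."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative

# Toy usage: hypothetical 2-d decoder direction and a 4-token vocabulary.
direction = [1.0, -0.5]
unembed = {
    "wrong": [0.2, 0.0],
    "fully": [0.3, 0.1],
    "warm": [-0.1, 0.1],
    "wine": [0.0, 0.2],
}
effects = logit_effects(direction, unembed)
pos, neg = top_and_bottom(effects, k=2)
print([tok for tok, _ in pos])  # ['fully', 'wrong']
print([tok for tok, _ in neg])  # ['warm', 'wine']
```

In a real model the direction and unembedding have thousands of dimensions and tens of thousands of tokens. The ranking step is the same.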
Activation Density: 0.017%
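Activation density is usually read as the share of evaluated tokens on which the feature is active. The sketch below assumes "active" means an activation strictly above a threshold of 0. The card does not state its threshold or dataset, and `activation_density` is a hypothetical helper. At 0.017%, the feature would fire on about 17 of every 100,000 tokens.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: threshold 0.0, i.e. any positive activation counts as firing.
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)

# Toy example: the feature fires on 1 of 8 tokens.
print(activation_density([0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0]))  # 12.5

# 17 firings in 100,000 tokens gives roughly the card's 0.017%.
sparse = [0.0] * 99_983 + [1.0] * 17
print(round(activation_density(sparse), 3))  # 0.017
```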