INDEX
Explanations
negative descriptors related to severity and cruelty
Negative Logits
OrFail      -0.18
ONSE        -0.17
zier        -0.15
368         -0.15
ypse        -0.15
ový         -0.14
Skyl        -0.14
tí          -0.14
ossal       -0.14
OrCreate    -0.14
Positive Logits
vard        0.21
-hard       0.14
erner       0.14
Tough       0.14
dre         0.14
ned         0.14
-cookie     0.14
ening       0.13
.selector   0.13
sock        0.13
Activations Density 0.060%