INDEX
Explanations
negative descriptors related to cruelty and inhumanity
Negative Logits
Token     Logit
enu       -0.17
timid     -0.15
isible    -0.14
Trim      -0.14
modest    -0.14
790       -0.14
onu       -0.14
ksi       -0.14
Quiet     -0.14
innoc     -0.14
Positive Logits
Token          Logit
cold           0.45
cold           0.36
Cold           0.34
mean           0.34
call           0.33
Cold           0.33
colder         0.32
indifference   0.32
indifferent    0.32
insensitive    0.32
Activations Density: 0.499%