INDEX
Explanations
conjunctions 'or' and 'and' signaling contrasting or additive relationships
phrases indicating moral comparisons between good and bad
Negative Logits
ivari     -0.71
quit      -0.70
LEVEL     -0.68
veins     -0.68
igham     -0.66
brates    -0.66
ĸļ        -0.64
sbm       -0.64
igraph    -0.63
aturday   -0.63
Positive Logits
evil        1.13
bad         1.09
evil        1.08
brightest   1.00
Evil        0.98
bad         0.89
Evil        0.86
wrong       0.84
evils       0.83
BAD         0.83
Activations Density: 0.099%