INDEX
Explanations
expressions of dislike or hatred
Negative Logits
webElementGuid    -0.84
zzleHttp          -0.75
DoubleQuotes      -0.71
lende             -0.70
doInBackground    -0.68
nahilalakip       -0.68
ddelweddau        -0.67
URLException      -0.65
Viitteet          -0.65
Tracce            -0.65
Positive Logits
hate       2.99
Hate       2.56
Hate       2.50
hate       2.47
hates      2.40
HATE       2.38
hated      2.12
hating     2.10
hatred     2.07
hateful    1.76
Activations Density: 0.054%