Explanations
expressions of dislike or frustration towards jobs, styles, and social media interactions
expressing hate
Negative Logits
contentLoaded       -0.53
IsMutable           -0.51
InstrumentedTest    -0.45
kurtka              -0.44
profitably          -0.44
SaveChangesAsync    -0.44
$_"                 -0.43
verständlich        -0.43
RTEX                -0.42
Qaraldi             -0.41
Positive Logits
hated               0.68
HATE                0.62
hates               0.61
hate                0.60
dislike             0.59
disliked            0.58
unhappy             0.53
unpleasant          0.50
worst               0.50
annoying            0.50
Activations Density: 0.021%
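
The page does not say how the density figure is computed. A minimal sketch, assuming it means the percentage of sampled tokens on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and not taken from the source:

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens where the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 2 of 10,000 tokens.
sample = [0.0] * 9998 + [1.3, 0.7]
print(f"Activations Density: {activation_density(sample):.3f}%")
```

Under this reading, a density of 0.021% would correspond to the feature firing on roughly 2 out of every 10,000 tokens, which fits a feature that responds only to a narrow set of words such as "hate" and "dislike".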