Explanations
phrases indicating an opinion or assessment
Negative Logits
ESE         -0.83
iership     -0.78
ests        -0.74
BAT         -0.71
ULTS        -0.70
utherland   -0.69
eful        -0.69
uers        -0.69
HAEL        -0.68
ACA         -0.66
Positive Logits
messed       0.92
screwed      0.91
sucked       0.80
bum          0.80
neat         0.79
sucks        0.78
fucked       0.77
reminds      0.77
lame         0.77
creepy       0.75
Activations Density 0.058%