INDEX
Explanations
phrases indicating satisfaction, approval, or emphasis
intensifiers expressing strong positive sentiments
Negative Logits
antry        -0.89
ansas        -0.80
heid         -0.78
arthed       -0.67
Receiver     -0.67
lessly       -0.67
arian        -0.66
adelphia     -0.66
icipated     -0.66
anwhile      -0.65
Positive Logits
appreciated  0.77
liked        0.76
pissed       0.72
messed       0.72
freaking     0.72
darn         0.70
fucking      0.70
fuckin       0.70
nice         0.69
FTWARE       0.69
Activations Density 0.056%