INDEX
Explanations
exaggeratedly positive or negative sentiments
intensifiers or adverbs expressing strong opinions or feelings
Negative Logits
antry       -0.81
Previously  -0.68
icipated    -0.68
amide       -0.63
Altern      -0.63
iem         -0.62
Previously  -0.61
udic        -0.61
ttes        -0.60
artment     -0.59
Positive Logits
neat      0.90
nice      0.86
messed    0.86
shitty    0.83
cool      0.83
pissed    0.81
crappy    0.81
nice      0.77
freaking  0.76
bad       0.76
Activations Density 0.072%
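
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number is commonly computed, assuming it means the fraction of token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen for illustration only; they are not taken from the dashboard's own code or data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 100,000 token positions, 72 of which activate the feature.
example = [0.0] * 99_928 + [1.0] * 72
density = activation_density(example)
print(f"{density:.3%}")  # formats the fraction as a percentage
```

A density this low would mean the feature fires on roughly 7 tokens in every 10,000, which is typical of a sparse, fairly specific feature.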