INDEX
Explanations
adjectives describing intensity or quality
phrases expressing strong emotions or sentiments
Negative Logits
Token     Logit
ARM       -0.71
kick      -0.68
ertodd    -0.66
Lobby     -0.63
olate     -0.63
orial     -0.63
ntil      -0.61
achus     -0.61
EP        -0.61
Drum      -0.59
Positive Logits
Token         Logit
ties          0.76
ities         0.67
abundantly    0.65
atan          0.61
specificity   0.61
tons          0.60
things        0.60
thing         0.60
Takeru        0.59
minded        0.58
Activations Density: 0.052%