Explanations
phrases emphasizing intense qualities, such as 'such a' or 'of a'
expressions indicating strong emphasis or degree
Negative Logits
endi       -0.75
ids        -0.75
seys       -0.74
aired      -0.70
onds       -0.69
rones      -0.65
pta        -0.64
urb        -0.63
reys       -0.63
earance    -0.63
Positive Logits
fun              1.02
hassle           0.94
nightmare        0.87
fodder           0.83
gamble           0.83
annoyance        0.83
embarrassment    0.82
cule             0.82
nuisance         0.82
sleeper          0.78
Activations Density 0.069%