Explanations
intensifiers and modifiers that emphasize evaluations or descriptions
Negative Logits
verted      -0.18
ially       -0.16
ulumi       -0.16
entiful     -0.15
iguous      -0.14
üyle        -0.14
proposal    -0.14
uzzle       -0.14
åĴ          -0.14
ughty       -0.14

Positive Logits
sooner      0.20
fewer       0.20
glad        0.18
often       0.18
few         0.17
true        0.17
impressed   0.17
few         0.17
tempted     0.16
Few         0.16
Activations Density 0.157%
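
The page does not define the density figure. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical, chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires at all; the source page does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for 2000 token positions, 3 of them nonzero.
sample = [0.0] * 1997 + [0.8, 1.3, 0.2]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.157% would mean the feature fires on roughly 1 in 640 sampled tokens.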