INDEX
Explanations
phrases that express varying degrees of uncertainty or slightness
Negative Logits
opup        -0.16
little      -0.16
bardzo      -0.15
illos       -0.15
vraiment    -0.15
seemingly   -0.14
entirely    -0.14
totally     -0.14
very        -0.14
absolutely  -0.14
Positive Logits
/stdc       0.19
.ly         0.19
ingly       0.19
æħ          0.18
different   0.18
TOO         0.17
like        0.17
586         0.17
umen        0.16
different   0.16
Activations Density 0.050%