INDEX
Explanations
phrases indicating personal opinions and reflections
Negative Logits
need        -0.16
YTE         -0.15
Hopefully   -0.15
atte        -0.15
presume     -0.14
env         -0.14
roscope     -0.14
zas         -0.14
ught        -0.14
NCY         -0.14
Positive Logits
submit      0.21
personally  0.21
fail        0.19
smell       0.18
question    0.18
diag        0.18
submits     0.17
object      0.17
cr          0.17
challenge   0.17
Activations Density: 0.169%