INDEX
Explanations
phrases indicating comparison or contrast
Negative Logits
rose          -0.17
one           -0.14
řízení        -0.14
INET          -0.13
neau          -0.13
ernet         -0.13
eher          -0.13
noon          -0.13
Girlfriend    -0.13
PI            -0.13
Positive Logits
iating        0.21
iates         0.20
iator         0.20
aland         0.19
/div          0.18
between       0.18
iable         0.17
iability      0.17
iators        0.16
nowrap        0.15
Activations Density 0.070%