INDEX
Explanations
phrases related to exceeding expectations or going the extra mile
Negative Logits
Token     Logit
itty      -0.20
jt        -0.15
296       -0.14
ANGER     -0.14
Äįel      -0.14
got       -0.14
close     -0.14
nak       -0.13
ostel     -0.13
inding    -0.13
Positive Logits
Token     Logit
beyond    0.41
Beyond    0.33
Beyond    0.30
extra     0.29
eyond     0.28
EXTRA     0.27
above     0.26
-extra    0.23
above     0.23
Above     0.23
Activations Density: 0.042%