Explanations
expressions of certainty and understanding in discussions
Negative Logits
Token      Logit
uraa       -0.18
ENAME      -0.17
lero       -0.16
Bec        -0.15
orage      -0.15
ertext     -0.14
bearer     -0.14
_bridge    -0.14
ÃĸL        -0.14
vere       -0.14
Positive Logits
Token           Logit
knows           0.17
.experimental   0.17
urgeon          0.17
Permanent       0.16
çŁ¥éģĵ          0.15
mpr             0.15
ัà¸į            0.15
ahn             0.15
know            0.15
permanent       0.14
Activations Density 0.176%
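
Some entries in the logit lists (for example çŁ¥éģĵ) look like byte-level BPE token strings rather than readable text. The sketch below assumes, without confirmation from the source, that the tokenizer uses the GPT-2 style byte-to-unicode display mapping; under that assumption it recovers the underlying UTF-8 text. The helper names `bytes_to_unicode` and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the GPT-2 style table mapping each byte (0-255) to a printable character."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes are shifted to code points starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: display character -> original byte value.
_CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Decode a byte-level token string to text (assumes the GPT-2 mapping).

    Tokens containing characters outside the mapping are returned unchanged,
    and byte sequences that are not valid UTF-8 are shown with U+FFFD.
    """
    if any(ch not in _CHAR_TO_BYTE for ch in token):
        return token
    raw = bytes(_CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("çŁ¥éģĵ"))  # 知道
```

If the assumption holds, çŁ¥éģĵ decodes to 知道 (Chinese for "know"), which agrees with the other top positive tokens "knows" and "know".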