INDEX
Explanations
greetings and friendly salutations in conversations
Negative Logits
ingles            -0.13
------+------+    -0.13
aptic             -0.13
ire               -0.13
eor               -0.13
297               -0.13
ICO               -0.13
ccount            -0.13
zier              -0.13
cept              -0.13
Positive Logits
everyone          0.28
everybody         0.26
*,↵               0.24
,↵↵               0.23
Everyone          0.23
everyone          0.21
Everyone          0.19
ØĮ↵               0.19
again             0.18
Everybody         0.18
Activations Density 0.030%