INDEX
Explanations
words related to exaggeration and marketing themes
Negative Logits
を見る    -0.17
ozem      -0.16
…"↵↵      -0.14
…”↵↵      -0.13
::*       -0.13
())       -0.13
شاهد      -0.13
CMD       -0.13
oulos     -0.13
』（      -0.13
Positive Logits
anyone    0.37
FT        0.34
anybody   0.32
indeed    0.29
?         0.28
ft        0.27
alert     0.27
=         0.27
Anyone    0.26
=yes      0.24
Activations Density 0.343%