Explanations
expressions of humor or sarcasm
Negative Logits
Token    Logit
contr    -0.17
oken     -0.15
apult    -0.15
Slo      -0.15
odka     -0.14
clr      -0.14
ĮĢ       -0.14
anela    -0.14
нок      -0.14
Jeg      -0.13
Positive Logits
Token    Logit
æ£       0.17
here     0.17
otics    0.17
bitset   0.17
çĴĥ      0.16
_HERE    0.15
udder    0.15
*this    0.15
здесь    0.15
іль      0.14
Activations Density: 0.205%