INDEX
Explanations
phrases indicating action or obligation
Negative Logits
ELLOW       -0.16
ÙĨس         -0.15
305         -0.15
åī²         -0.14
mdp         -0.14
byn         -0.14
ummer       -0.14
aeper       -0.14
kans        -0.13
nok         -0.13
Positive Logits
adies       0.17
FTA         0.15
èĮ          0.15
å¼ĺ         0.14
_handlers   0.14
ubyte       0.14
Brothers    0.14
Hung        0.14
LLL         0.14
FETCH       0.14
Activations Density: 0.046%