INDEX
Explanations
phrases indicating the need for help or assistance in a task
Negative Logits
Token        Logit
themselves   -0.21
yourselves   -0.17
mey          -0.16
iska         -0.16
idual        -0.16
гов          -0.15
ーレ         -0.15
apore        -0.15
Cousins      -0.15
itself       -0.14
Positive Logits
Token        Logit
myself       0.28
буду         0.16
istiyorum    0.15
arken        0.14
strcasecmp   0.14
ga           0.14
opia         0.13
_ATTACK      0.13
dam          0.13
vind         0.13
Activations Density: 1.919%