INDEX
Explanations
the prefix "un-" indicating negation or reversal of meaning
Negative Logits
rome         -0.16
ऊ            -0.16
ंपर          -0.15
692          -0.15
éry          -0.14
falls        -0.14
unfinished   -0.14
rypted       -0.14
dech         -0.14
elho         -0.14
Positive Logits
Wind         0.19
Wind         0.19
wind         0.19
mask         0.18
winding      0.18
ear          0.18
wind         0.17
扎           0.17
hook         0.17
HOOK         0.16
Activation density: 0.022%