INDEX
Explanations
strings with a specific format including non-alphabetical characters
references to video games or gaming-related concepts
NEGATIVE LOGITS
Dek          -0.82
Advantage    -0.75
tsky         -0.71
undermin     -0.70
scill        -0.68
disadvant    -0.68
unlaw        -0.67
Oo           -0.66
Tokens       -0.65
Rewards      -0.64
POSITIVE LOGITS
manager      0.82
info         0.74
editor       0.73
ping         0.72
server       0.71
@            0.71
json         0.70
database     0.70
rw           0.70
global       0.69
Activations Density 0.478%