Explanations
code-related terms, particularly those involving numbers
self-referential pronouns
Negative Logits
add          -0.54
med          -0.54
alt          -0.53
zz           -0.52
ch           -0.51
v            -0.51
aux          -0.51
dat          -0.50
nier         -0.50
c            -0.50

Positive Logits
itſelf        0.96
myſelf        0.95
AndEndTag     0.94
raiſ          0.85
rrggbb        0.83
Monfieur      0.83
amaño         0.82
ſever         0.82
juſt          0.81
faſt          0.81
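Dashboards of this kind commonly derive the positive and negative logit lists by projecting the feature's decoder direction onto the unembedding matrix and keeping the k highest and k lowest tokens. The sketch below illustrates that convention on toy numbers. It is an assumption, not a description of how this particular dashboard computed its values (the dashboard may also rescale or normalize the scores), and the names `logit_effects` and `top_and_bottom` are hypothetical.

```python
from typing import Sequence


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> list:
    """Direct effect of one feature on every vocabulary logit.

    Assumed layout: unembed[d][v] is a d_model x vocab matrix, so the
    effect on token v is sum over d of decoder_dir[d] * unembed[d][v].
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[d] * unembed[d][v] for d in range(len(decoder_dir)))
        for v in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str], k: int):
    """Return (positive, negative) lists of (token, score), k entries each."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative score first
    positive = ranked[::-1][:k]    # most positive score first
    return positive, negative


# Toy example: d_model = 2, vocab of four hypothetical tokens.
decoder_dir = [1.0, 0.5]
unembed = [
    [0.25, -0.5, 1.0, 0.0],
    [0.5, 0.25, 0.0, -0.5],
]
vocab = ["a", "b", "c", "d"]

effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, vocab, k=2)
print(positive)  # [('c', 1.0), ('a', 0.5)]
print(negative)  # [('b', -0.375), ('d', -0.25)]
```

Under this reading, a large positive score for a token such as itſelf means the feature, when active, directly pushes the model toward predicting that token.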
Activation density: 3.562%
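Activation density is usually reported as the percentage of sampled tokens on which the feature's activation is above zero. The minimal sketch below assumes that definition (threshold 0) and a flat list of per-token activations. Both the threshold and the helper name `activation_density` are assumptions, not taken from this dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Eight tokens, two of which activate the feature.
acts = [0.0, 1.2, 0.0, 0.0, 3.4, 0.0, 0.0, 0.0]
print(activation_density(acts))  # 25.0
```

Under that definition, a density of 3.562% means the feature fired on roughly 36 of every 1,000 sampled tokens.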