Explanations
mathematical expressions and symbols
Negative Logits
Token       Logit
McDon       -0.16
apos        -0.15
kek         -0.15
linear      -0.14
Parr        -0.13
ilities     -0.13
rek         -0.13
rieved      -0.13
erring      -0.13
opt         -0.13
Positive Logits
Token       Logit
olis        0.19
Frontier    0.18
onis        0.16
omanip      0.16
283         0.15
.jackson    0.15
egas        0.15
idon        0.15
Decompiled  0.15
aris        0.14
Activations Density: 0.092%