Explanations
expressions of surprise or unexpected outcomes
Negative Logits
Token      Logit
otta       -0.14
ragment    -0.14
γγ         -0.13
ossa       -0.13
Fabric     -0.13
ertools    -0.13
UEST       -0.13
enos       -0.13
lems       -0.12
arda       -0.12
Positive Logits
Token      Logit
ekl         0.15
apus        0.15
Kauf        0.15
criptor     0.15
stype       0.15
ror         0.14
Guth        0.14
eko         0.14
bd          0.14
odesk       0.14
Activation Density: 0.102%