Explanations
discrepancies between stated intentions and actual outcomes in various contexts
Negative Logits
isks         -0.19
;element     -0.17
ibbon        -0.16
esini        -0.16
essim        -0.16
поба         -0.16
otton        -0.15
isque        -0.15
ottom        -0.15
HeaderCode   -0.14
Positive Logits
reality      0.26
Reality      0.20
Reality      0.19
realities    0.18
Bur          0.17
121          0.16
216          0.15
cheap        0.14
ety          0.14
realidad     0.14
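On feature dashboards of this kind, the logit values are usually computed as the feature's direct effect on the output: its decoder direction is multiplied by the model's unembedding matrix, and the tokens with the most negative and most positive results are listed. The sketch below assumes that definition. The function `top_logit_effects`, its parameters `decoder_vec` and `unembed`, and the toy numbers are hypothetical, and plain Python lists stand in for real weight tensors.

```python
def top_logit_effects(decoder_vec, unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, effect) pairs.

    Assumption: decoder_vec has length d_model and unembed is a
    d_model x vocab_size nested list (the model's unembedding matrix).
    """
    d_model = len(decoder_vec)
    vocab_size = len(vocab)
    # effects[j] = sum_i decoder_vec[i] * unembed[i][j]  (a vector-matrix product)
    effects = [
        sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
        for j in range(vocab_size)
    ]
    # Token indices sorted from most negative to most positive effect.
    order = sorted(range(vocab_size), key=lambda j: effects[j])
    negative = [(vocab[j], effects[j]) for j in order[:k]]
    positive = [(vocab[j], effects[j]) for j in reversed(order[-k:])]
    return negative, positive


# Toy usage with a 2-dimensional residual stream and a 3-token vocabulary.
neg, pos = top_logit_effects(
    [1.0, 0.0],
    [[0.26, -0.19, 0.14], [0.0, 0.0, 0.0]],
    ["reality", "isks", "cheap"],
    k=2,
)
print(neg)
print(pos)
```

Because the ranking reads directly from the weights, it shows which tokens the feature pushes up or down when it fires. It says nothing about the contexts in which the feature fires.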
Activations Density 0.145%
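Activation density is usually reported as the percentage of tokens in a sample on which the feature's activation is above zero. Under that reading, 0.145% means the feature fires on roughly 145 of every 100,000 tokens. The sketch below assumes that definition. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: `activations` holds one activation value per token.
    """
    if not activations:
        return 0.0  # avoid dividing by zero on an empty sample
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy usage: the feature fires on 2 of 8 tokens.
print(activation_density([0.0, 0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.0]))
```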