INDEX
Explanations
phrases indicating surprise or unexpected outcomes
Negative Logits
uc        -0.17
mos       -0.16
unc       -0.15
reu       -0.14
Rolled    -0.14
atie      -0.14
cooked    -0.14
Threads   -0.14
Wo        -0.13
пов       -0.13
Positive Logits
Rosenstein  0.18
finity      0.15
ãĥ©ãĤ¯      0.15
oppos       0.14
:Register   0.14
getState    0.14
ynch        0.14
canf        0.14
bane        0.14
antioxid    0.14
Activations Density 0.029%