INDEX
Explanations
phrases indicating correctness or suitability
Negative Logits
isoft    -0.16
Bald     -0.15
deen     -0.15
iliz     -0.14
oris     -0.14
Beled    -0.14
éal      -0.14
_TYP     -0.14
Bucc     -0.14
icap     -0.14
Positive Logits
ilk      0.17
arch     0.15
Seal     0.15
choice   0.14
kind     0.14
refin    0.14
lies     0.14
unsch    0.14
¦        0.14
lev      0.14
Activations Density 0.029%