Explanations
comparative phrases indicating improvement or quality
Negative Logits
antom      -0.15
bin        -0.15
Lon        -0.14
legates    -0.14
quets      -0.14
bin        -0.14
lie        -0.14
embers     -0.14
Ãľst       -0.14
957        -0.14
Positive Logits
pread      0.18
arda       0.16
ado        0.16
odic       0.15
Mellon     0.15
ODEV       0.14
anza       0.14
opers      0.14
ikel       0.14
zac        0.14
Activations Density 0.047%