INDEX
Explanations
comparative phrases emphasizing superiority or preference
Negative Logits
Token     Logit
827       -0.17
fty       -0.15
Ekon      -0.15
umen      -0.14
istrate   -0.14
urai      -0.14
ëĺIJíķľ    -0.14
ãĤ§       -0.14
lette     -0.13
strar     -0.13
Positive Logits
Token     Logit
udios     0.15
aja       0.15
udio      0.14
any       0.14
ever      0.14
\Common   0.14
Ĥ¹        0.14
å»        0.13
mos       0.13
.any      0.13
Activations Density: 0.069%