INDEX
Explanations
phrases indicating comparisons or contrasts
Negative Logits
Token          Logit
idth           -0.15
imal           -0.15
ital           -0.14
_ATTRIBUTES    -0.14
apse           -0.14
eric           -0.14
utin           -0.13
ivar           -0.13
ubah           -0.13
unicode        -0.13
Positive Logits
Token          Logit
aeda           0.17
etto           0.17
otts           0.16
eker           0.16
unto           0.15
Sac            0.15
ekten          0.15
sac            0.15
THR            0.14
unken          0.14
Activations Density 0.010%