INDEX
Explanations
occurrences of the prefix "uns," indicating negation or absence
Negative Logits
zin     -0.09
zan     -0.08
ervo    -0.08
etic    -0.08
SYNC    -0.07
baş     -0.07
zd      -0.07
hop     -0.07
hs      -0.07
aver    -0.07
Positive Logits
uns     0.08
Uns     0.08
d       0.08
utom    0.07
ar      0.07
y       0.06
ا       0.06
paring  0.06
air     0.06
ward    0.06
Activation Density: 0.006%