INDEX
Explanations
Occurrences of the prefix "un"
Negative Logits
dı       -0.17
aft      -0.15
crast    -0.15
camp     -0.15
inner    -0.15
оне      -0.14
d        -0.14
gang     -0.14
dress    -0.14
cord     -0.14
Positive Logits
iversal       0.23
tdown         0.22
iversit       0.22
iverse        0.21
iversity      0.21
ächst         0.21
ecessarily    0.20
erals         0.20
y             0.20
IVERS         0.19
Activations Density 0.070%