Explanations
Occurrences of the prefix "un" and related prefixes that indicate negation or absence.
Negative Logits
Token      Logit
cem        -0.17
cen        -0.17
engin      -0.15
entlich    -0.15
camp       -0.14
okin       -0.14
gate       -0.14
den        -0.14
ucha       -0.14
aft        -0.14

Positive Logits
Token      Logit
erals       0.23
iversit     0.22
ghi         0.20
iverse      0.20
iversal     0.20
erable      0.19
eral        0.19
iversity    0.19
nel         0.19
lap         0.19
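The two lists are the tokens whose logits this feature most raises and most lowers. The sketch below illustrates only the final ranking step, assuming the per-token logit effects are already available as a dict (seeded here with a few values from the lists above). How those effects are computed is not described on this page and is left out; `top_k_logits` is a hypothetical helper.

```python
import heapq

# A few logit effects copied from the lists above. In practice every
# vocabulary token would have an entry (assumption for illustration).
logit_effects = {
    "cem": -0.17, "cen": -0.17, "engin": -0.15, "entlich": -0.15,
    "erals": 0.23, "iversit": 0.22, "ghi": 0.20, "iverse": 0.20,
}


def top_k_logits(
    effects: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Hypothetical helper: return the k most positive and k most negative (token, effect) pairs."""
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    return positive, negative


pos, neg = top_k_logits(logit_effects, 3)
print(pos)
print(neg)
```

`heapq.nlargest` and `heapq.nsmallest` avoid sorting the whole vocabulary when only a handful of tokens is shown.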
Activation density: 0.082%
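Activation density is the share of tokens on which the feature fires, so 0.082% corresponds to roughly 82 of every 100,000 tokens. A minimal sketch, assuming a dense list of per-token activations and treating any value above zero as active (the threshold behind the figure above is not stated); `activation_density` and the example data are hypothetical.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 82 active tokens spread over 100,000 tokens.
acts = [0.0] * 100_000
for i in range(82):
    acts[i * 1000] = 1.5

print(f"{activation_density(acts):.3f}%")
```

With this data the sketch prints `0.082%`, matching the density reported for the feature.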