Explanations
words expressing a lack or absence, often associated with negativity or superfluousness
Negative Logits
서 (ìĦľ)     -0.20
zelf         -0.18
ization      -0.18
ity          -0.17
сь (ÑģÑĮ)    -0.17
_UNUSED      -0.16
ー (ãĥ¼)     -0.16
ISED         -0.16
avir         -0.16
ever         -0.15
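Three of the negative-logit tokens (ìĦľ, ÑģÑĮ, ãĥ¼) look garbled because, assuming the model uses a GPT-2-style byte-level BPE tokenizer, the vocabulary stores every raw byte as a printable stand-in character. The minimal sketch below rebuilds that byte-to-character table (following the `bytes_to_unicode` scheme from GPT-2's tokenizer) and inverts it. Under that assumption, ìĦľ decodes to Korean 서, ÑģÑĮ to Russian сь and ãĥ¼ to the Japanese long-vowel mark ー. The helper names `decode_token` and `UNICODE_TO_BYTE` are hypothetical.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild the GPT-2 byte -> printable-character table (byte-level BPE)."""
    # Bytes that are already printable map to themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every other byte is shifted to an unused code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Hypothetical helper: invert the table so each stand-in character maps back to its byte.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level BPE vocab string back into readable UTF-8 text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


for tok in ["ìĦľ", "ÑģÑĮ", "ãĥ¼", "zelf"]:
    print(repr(tok), "->", decode_token(tok))
```

Plain ASCII tokens such as `zelf` pass through unchanged, because printable ASCII bytes map to themselves in the table.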
Positive Logits
ness         0.42
nes          0.38
NESS         0.30
lessly       0.24
/un          0.23
ened         0.23
ening        0.22
ingly        0.21
es           0.21
wonder       0.21
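Dashboards of this kind commonly produce the negative and positive logit lists by projecting the feature's decoder direction through the model's unembedding matrix and keeping the lowest and highest entries. The sketch below assumes that this is how these lists were computed, ignores any final layer normalization, and uses hypothetical names (`top_logit_effects`, `w_dec_row`, `w_u`) with toy values rather than the real model weights.

```python
import numpy as np


def top_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Rank vocab tokens by a feature's direct effect on the output logits.

    w_dec_row: decoder direction of one feature, shape (d_model,)
    w_u:       unembedding matrix, shape (d_model, d_vocab)
    """
    effects = w_dec_row @ w_u              # shape (d_vocab,)
    order = np.argsort(effects)            # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: d_model = 4, six-token vocabulary.
vocab = ["a", "b", "c", "d", "e", "f"]
w_u = np.zeros((4, 6))
w_u[0] = [2.0, -1.0, 0.5, -3.0, 0.0, 1.0]
w_dec_row = np.array([1.0, 0.0, 0.0, 0.0])

negative, positive = top_logit_effects(w_dec_row, w_u, vocab, k=2)
print(negative)  # [('d', -3.0), ('b', -1.0)]
print(positive)  # [('a', 2.0), ('f', 1.0)]
```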
Activation Density 0.048%
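An activation density of 0.048% means the feature fires on roughly 4.8 of every 10,000 tokens, or about 1 token in 2,100. The minimal sketch below assumes that density is the fraction of sampled token positions where the feature's activation is above zero. The function name `activation_density` and its `threshold` parameter are hypothetical, and the activations are toy data.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(acts)
    return float((acts > threshold).mean())


# Toy data: the feature fires on 5 of 10,000 token positions.
acts = np.zeros(10_000)
acts[[3, 150, 2048, 7000, 9999]] = [1.2, 0.4, 2.5, 0.9, 0.1]

density = activation_density(acts)
print(f"{density:.3%}")  # 0.050%
```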