Explanations
words related to the concept of 'not' or negation
Negative Logits
Token         Logit
es            -0.18
hart          -0.18
o             -0.16
yor           -0.15
eri           -0.15
dre           -0.15
itra          -0.14
MeasureSpec   -0.14
alf           -0.14
amo           -0.14
Positive Logits
Token         Logit
esseract      0.21
ting          0.19
aurus         0.19
rell          0.16
rench         0.15
ledge         0.15
ceans         0.15
swana         0.15
unnel         0.15
tery          0.15
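The two lists above rank the tokens whose logits this feature pushes down most (negative) and up most (positive). Below is a minimal sketch of how such lists could be produced. It assumes, as is common for sparse-autoencoder feature dashboards, that a feature's effect on token t is the dot product of the feature's decoder direction with t's unembedding vector. The function names `logit_effects` and `top_k_logits` and the toy vectors are hypothetical illustrations, not the dashboard's actual code.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Assumed definition: effect on each token = decoder direction . unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_k_logits(effects: Dict[str, float],
                 k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most negative k, most positive k) token/effect pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return negative, positive


if __name__ == "__main__":
    # Hypothetical 2-dimensional toy model with three tokens.
    decoder_dir = [1.0, -0.5]
    unembed = {"a": [0.2, 0.1], "b": [-0.3, 0.2], "c": [0.1, -0.4]}
    neg, pos = top_k_logits(logit_effects(decoder_dir, unembed), k=1)
    print(neg, pos)
```

With a real vocabulary, k would be 10 to match the lists above. In a vocabulary this small, the negative and positive lists can overlap, which is why the toy example uses k=1.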
Activation Density: 0.068%
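Activation density is a percentage. The sketch below assumes it is the share of sampled tokens on which the feature's activation is above zero. Under that assumption, 0.068% corresponds to roughly 68 active tokens per 100,000. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # Hypothetical sample: the feature fires on 68 of 100,000 tokens.
    sample = [1.0] * 68 + [0.0] * (100_000 - 68)
    print(activation_density(sample))  # about 0.068
```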