INDEX
Explanations
negations or phrases indicating a lack of something
Negative Logits
Token            Logit
inand            -0.17
aka              -0.15
Unsupported      -0.15
lew              -0.15
undi             -0.14
determination    -0.13
iew              -0.13
enko             -0.13
anka             -0.13
aston            -0.13
Positive Logits
Token            Logit
sure             0.28
anymore          0.27
necessarily      0.27
bud              0.26
phased           0.26
allowed          0.25
anywhere         0.25
exactly          0.24
interested       0.24
bothered         0.23
Activations Density: 0.122%
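The page does not define the density figure. A common reading, assumed here, is the fraction of sampled tokens on which the feature's activation is nonzero. The sketch below illustrates that reading only. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and do not come from the page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 10,000 tokens, 12 of which activate the feature.
toy_activations = [0.0] * 9988 + [1.5] * 12
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.122% would mean the feature fires on roughly 12 of every 10,000 tokens, which fits a feature tied to a narrow pattern such as negation.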