INDEX
Explanations
words indicating proactive behavior or actions
Negative Logits
ujet      -0.20
bial      -0.19
rub       -0.18
bote      -0.17
uable     -0.16
bil       -0.16
midi      -0.16
rne       -0.16
bis       -0.16
b         -0.16
Positive Logits
tracted    0.29
ffer       0.28
actively   0.27
pped       0.26
ponent     0.25
logue      0.25
verbs      0.24
wl         0.23
ccess      0.23
gres       0.23
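The two lists above rank output tokens by how strongly this feature pushes their logits down (negative) or up (positive). The dashboard does not say how it computes them. One common approach, assumed here, projects the feature's decoder direction through the model's unembedding matrix and keeps the extremes. The sketch below uses NumPy, and `decoder_dir`, `W_U`, `vocab`, and `top_logit_effects` are hypothetical names for illustration, not the dashboard's API:

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature (assumed method).

    decoder_dir: shape (d_model,), the feature's decoder vector (hypothetical input).
    W_U: shape (d_model, vocab_size), the model's unembedding matrix.
    vocab: list of token strings of length vocab_size.
    Returns (negative, positive): the k lowest and k highest (token, score) pairs.
    """
    logits = decoder_dir @ W_U   # one score per vocabulary token
    order = np.argsort(logits)   # indices sorted by ascending score
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 tokens.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.3, -0.2, 0.1, 0.0],
                [5.0, 5.0, 5.0, 5.0]])
vocab = ["actively", "ujet", "verbs", "b"]
neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(neg)
print(pos)
```

Real dashboards may also normalize the decoder vector or apply the final layer norm first; the sketch omits both.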
Activation density: 0.008%
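Activation density is read here as the percentage of token positions where the feature fires, so 0.008% works out to roughly 8 firings per 100,000 tokens and marks a very sparse feature. The sketch below treats "fires" as an activation strictly above a threshold of 0. Both that threshold and the `activation_density` helper are assumptions for illustration, not the dashboard's documented definition:

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold` (assumed definition)."""
    acts = np.asarray(activations)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# 8 active positions out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:8] = 1.0
print(activation_density(acts))
```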