Explanations
pro- prefixes related to advocacy or support for specific causes
Negative Logits
p        -0.19
rix      -0.16
AREST    -0.15
c        -0.15
hei      -0.15
baz      -0.15
pic      -0.15
wagon    -0.15
wap      -0.14
pent     -0.14
Positive Logits
tracted   0.23
actively  0.22
forma     0.22
verbs     0.20
pped      0.20
ffer      0.20
bon       0.20
-active   0.20
sth       0.19
strate    0.19
Activations Density 0.035%