Explanations
terms related to advocacy and support for various causes
Negative Logits
ald      -0.18
idar     -0.17
icot     -0.15
ks       -0.15
IFY      -0.15
xygen    -0.15
缮       -0.15
amy      -0.14
iph      -0.14
asd      -0.14
Positive Logits
against  0.18
ise      0.17
ilon     0.17
Against  0.16
elli     0.14
ously    0.14
rou      0.14
loh      0.14
ur       0.14
gegen    0.14
Activations Density: 0.020%