Explanations
key phrases related to political support and endorsements
Negative Logits
/the      -0.17
[]        -0.17
what      -0.15
rible     -0.15
innen     -0.14
/The      -0.14
ajar      -0.13
éĻ        -0.13
ijd       -0.13
ÅĻÃŃd     -0.13
Positive Logits
same      0.31
own       0.31
latest    0.29
entire    0.27
latest    0.23
same      0.22
ability   0.22
biggest   0.21
original  0.20
newest    0.20
Activations Density: 1.240%