Explanations
references to political discourse and its implications
Negative Logits
allenge   -0.15
agn       -0.15
nnen      -0.15
sko       -0.15
Demir     -0.14
izons     -0.14
oser      -0.14
oot       -0.14
Nose      -0.13
Hubb      -0.13
Positive Logits
talk          0.17
-talk         0.16
Claims        0.16
headlines     0.15
lately        0.15
discussions   0.15
talk          0.15
discussion    0.14
Talk          0.14
Talk          0.14
Activations Density 0.289%