INDEX
Explanations
terms related to political controversies, such as WikiLeaks and copyright infringement
keywords related to politics and social issues
Negative Logits
staggered      -0.65
[*             -0.64
nonetheless    -0.62
nevertheless   -0.61
leapt          -0.60
dotted         -0.60
Yose           -0.60
gently         -0.59
likewise       -0.59
sufficient     -0.59
Positive Logits
Tags           1.05
olitics        0.97
politics       0.97
ategor         0.92
<|endoftext|>  0.90
Politics       0.83
uality         0.82
americ         0.81
·              0.80
,...           0.78
Activations Density 0.129%