Explanations
phrases related to political events and debates
topics related to political and social issues
Negative Logits
?".         -0.77
?",         -0.74
doesnt      -0.73
%"          -0.73
)?          -0.72
?」          -0.72
}}}         -0.71
cffffcc     -0.71
"))         -0.71
)).         -0.71
Positive Logits
plag        1.04
worn        0.84
wielded     0.83
emanating   0.83
belonging   0.81
undertaken  0.81
among       0.80
across      0.79
used        0.79
aboard      0.79
Activations Density 0.580%