Explanations
instances of political criticism and the related promises made by politicians
Negative Logits
Token       Logit
Rex         -0.16
åħĭæĸ¯      -0.15
yne         -0.15
raman       -0.14
iments      -0.14
engu        -0.14
aise        -0.14
Äħ          -0.14
INTERFACE   -0.14
bury        -0.13
Positive Logits
Token       Logit
icht        0.16
supposed    0.16
errick      0.15
#'          0.14
functools   0.14
inger       0.14
odos        0.14
ÑĤон        0.14
æĸ¹         0.14
hyp         0.14
Activations Density: 0.259%