INDEX
Explanations
references to political calls for action or dialogue
Negative Logits
↵         -0.15
-↵        -0.14
ô         -0.14
¬         -0.14
[[        -0.13
ãģĹãģ¾    -0.13
наÑģÑĤ    -0.13
[[        -0.13
ħĮ        -0.13
ãģĵãģĿ    -0.12
Positive Logits
‘           0.19
ï¿          0.16
'           0.16
%↵↵         0.15
ï¼ģï¼ģ↵↵    0.15
&#          0.15
intl        0.14
ï½ŀ↵↵       0.14
'&#         0.14
'(          0.14
Activations Density: 0.914%