INDEX
Explanations
phrases related to political discussions and positions
special characters or formatting elements in the text
Negative Logits
tremend           -0.75
Moonlight         -0.65
aimon             -0.65
bable             -0.61
conclud           -0.60
paio              -0.59
Samar             -0.59
Crusher           -0.58
Burgess           -0.58
Leopard           -0.57
POSITIVE LOGITS
ł                 0.86
¹                 0.76
ı                 0.75
Į                 0.70
į                 0.70
Ķ                 0.69
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯  0.69
¼                 0.67
º                 0.67
¡                 0.66
Activations Density 0.275%