Explanations
mentions of political events and figures
references to political events or figures, particularly concerning campaigns or manifestos
Negative Logits
Token      Logit
aughs      -0.83
imer       -0.77
anus       -0.75
verages    -0.73
heses      -0.73
ndum       -0.73
witz       -0.72
gars       -0.72
ilion      -0.71
peria      -0.70
Positive Logits
Token          Logit
æ©             0.81
domain         0.72
domains        0.72
èĪ             0.70
)=(            0.67
ãģ®ç          0.66
entimes        0.65
strawberries   0.65
è£ıè          0.65
ãģ®å          0.64
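Lists like the two above are commonly built by projecting a feature's decoder direction onto every token's unembedding vector and keeping the most negative and most positive results. The sketch below shows that ranking step under stated assumptions: `logit_effects`, `top_and_bottom`, and the tiny two-dimensional `unembed` dictionary are hypothetical illustrations, not this dashboard's actual pipeline, and any normalization the real tool applies is omitted. It does not reproduce the numbers above.

```python
def logit_effects(decoder_dir, unembed):
    """Dot product of a (hypothetical) decoder direction with each token's unembedding vector."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(effects, k=10):
    """Return (negative, positive): the k lowest and k highest (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy vocabulary with 2-d unembedding vectors (illustrative values only).
    unembed = {"domain": [1.0, 0.0], "aughs": [-1.0, 0.0], "the": [0.0, 1.0]}
    effects = logit_effects([0.7, 0.1], unembed)
    neg, pos = top_and_bottom(effects, k=1)
    print(neg, pos)
```

With k=10 over a full vocabulary, the two returned lists correspond to the "Negative Logits" and "Positive Logits" tables.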
Activation density: 0.137%
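Assuming activation density means the percentage of token positions on which the feature's activation is above zero, a value of 0.137% works out to roughly 137 firing positions per 100,000 tokens. The helper below is a hypothetical sketch of that arithmetic. The `threshold` parameter is an assumption, since the dashboard does not state its firing criterion.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold` (assumed criterion)."""
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # 137 firing positions out of 100,000 tokens gives the density shown above.
    acts = [0.0] * 99_863 + [1.0] * 137
    print(activation_density(acts))
```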