Explanations
phrases or names related to political figures and events
mentions of specific individuals, particularly political figures
Negative Logits
istical   -0.75
imates    -0.73
ORE       -0.65
uate      -0.65
Grateful  -0.64
Sorcerer  -0.63
Metatron  -0.62
Mirror    -0.62
CoC       -0.61
redit     -0.61
Positive Logits
lette   0.97
Rouse   0.93
Rousse  0.91
stal    0.88
ff      0.87
lin     0.80
LB      0.77
cia     0.77
ĸļ      0.76
utics   0.76
Activations Density 0.009%