INDEX
Explanations
phrases and names related to political figures and organizations
mentions of specific individuals, particularly politicians, and their associated contexts
Negative Logits
kered        -0.83
uracy        -0.77
Petersburg   -0.75
laus         -0.75
umber        -0.70
esville      -0.70
nis          -0.69
oreal        -0.69
avid         -0.67
central      -0.65
Positive Logits
wcs          0.86
Booker       0.76
Buffett      0.74
âĸ¬          0.69
cci          0.68
Vie          0.65
Prize        0.65
llo          0.64
Tau          0.62
ogue         0.60
Activations Density 0.036%