INDEX
Explanations
names of political figures and organizations
names and entities related to significant individuals and organizations
Negative Logits
Token         Logit
POV           -0.53
IBLE          -0.51
PAN           -0.50
om            -0.50
ctive         -0.50
HDD           -0.49
Pok           -0.48
correction    -0.47
ORY           -0.47
shift         -0.46
Positive Logits
Token          Logit
SPONSORED      0.71
respectively   0.68
interstitial   0.65
cffff          0.59
boycot         0.59
enshr          0.58
acus           0.58
akespeare      0.58
kefeller       0.57
reditary       0.55
Activations Density 1.555%