Explanations
references to political figures and entities
words related to specific individuals or entities
Negative Logits
channelAvailability   -0.64
SHIP                  -0.62
bourg                 -0.60
confines              -0.60
CPC                   -0.57
STATS                 -0.55
resses                -0.55
spat                  -0.54
goats                 -0.54
azes                  -0.54

Positive Logits
Wan                    0.91
leck                   0.85
ratulations            0.81
worldly                0.80
wald                   0.76
uary                   0.71
aida                   0.70
yssey                  0.69
llor                   0.67
untu                   0.66
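
The dashboard does not say how these logit lists are produced. A common approach, which is assumed here and not confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix and keep the k lowest and k highest entries. The sketch below illustrates that idea. The names `decoder_dir`, `W_U`, and `top_logit_tokens` and the toy vocabulary are all hypothetical.

```python
import numpy as np


def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive tokens for one feature.

    Assumption (hypothetical): the feature's direct effect on the logits is
    decoder_dir @ W_U, where W_U has shape (d_model, vocab_size).
    """
    logit_effect = decoder_dir @ W_U      # one value per vocabulary token
    order = np.argsort(logit_effect)      # indices sorted ascending by value
    negative = [(vocab[i], float(logit_effect[i])) for i in order[:k]]
    positive = [(vocab[i], float(logit_effect[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy usage with random weights (illustrative only, not real model data).
rng = np.random.default_rng(0)
vocab = ["tok0", "tok1", "tok2", "tok3", "tok4", "tok5"]
decoder_dir = rng.normal(size=8)
W_U = rng.normal(size=(8, len(vocab)))
neg, pos = top_logit_tokens(decoder_dir, W_U, vocab, k=3)
print("Negative:", neg)
print("Positive:", pos)
```

Sorting once and slicing from both ends yields both lists in a single pass. The negative list comes out in ascending order and the positive list in descending order, which matches the layout above.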

Activation density: 0.090%
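
Activation density is usually read as the share of token positions on which the feature fires. The sketch below assumes that "fires" means an activation strictly above a threshold of 0. The function name `activation_density` and the threshold are hypothetical choices. At 0.090%, the feature would be active on roughly 9 of every 10,000 tokens.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds threshold.

    Assumption (hypothetical): 'active' means activation > threshold.
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy usage: 9 active positions out of 10,000 tokens (illustrative data).
acts = np.zeros(10_000)
acts[:9] = 1.0
density = activation_density(acts)
print(f"Activation density: {density:.3f}%")
```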