INDEX
Explanations
proper names or words related to people, likely politicians or public figures
references to individuals and their relationships or actions
Negative Logits
*/(               -0.82
Agents            -0.74
hold              -0.74
soDeliveryDate    -0.72
Cosponsors        -0.72
IAL               -0.72
message           -0.69
rab               -0.67
rants             -0.67
boards            -0.65
Positive Logits
oli       1.41
ague      1.07
olini     0.98
veyard    0.93
zzo       0.92
omon      0.90
oglu      0.89
ppo       0.89
uca       0.88
ola       0.87
Activations Density 0.005%