Explanations
references to political figures and their actions
Negative Logits
ernaut       -0.17
.Invariant   -0.15
Unchecked    -0.15
pto          -0.14
dar          -0.14
umper        -0.14
дав          -0.14
asher        -0.14
fte          -0.14
åŁŁ          -0.14
Positive Logits
vote         0.40
motion       0.40
motions      0.37
votes        0.35
Motion       0.33
motion       0.32
Motion       0.31
roll         0.30
floor        0.29
Vote         0.28
Activation Density: 0.086%