INDEX
Explanations
references to political figures and their actions
Negative Logits
Roosevelt   -0.18
Amerikan    -0.15
Roose       -0.15
scribed     -0.14
ุล          -0.14
otas        -0.14
753         -0.14
mund        -0.14
Kapoor      -0.14
Gotham      -0.14
Positive Logits
Representative   0.25
Cond             0.23
Rah              0.23
Ann              0.20
Pat              0.20
Rand             0.20
Rep              0.20
Strom            0.20
Ari              0.20
Rush             0.19
Activations Density 0.272%