INDEX
Explanations
descriptions related to news or current events
references to specific political figures and related events
Negative Logits
@#&         -0.81
politics    -0.70
learn       -0.69
profits     -0.69
needed      -0.67
ventions    -0.66
manufact    -0.65
idays       -0.65
Prem        -0.64
Pwr         -0.63
Positive Logits
kneeling     1.29
caption      1.20
silhou       1.16
grinning     1.15
smiling      1.15
silhouette   1.11
flanked      1.08
handcuffed   1.00
purportedly  0.98
decap        0.98
Activations Density: 0.404%