INDEX
Explanations
phrases related to political issues and events
Negative Logits
lez       -0.85
minus     -0.78
audi      -0.75
LESS      -0.72
leness    -0.71
Reply     -0.71
osc       -0.69
romy      -0.68
eeee      -0.67
fml       -0.67
Positive Logits
piring       1.18
pires        1.02
pects        1.02
soon         1.01
pired        0.95
piration     0.94
phy          0.93
evidenced    0.93
well         0.90
opposed      0.89
Activations Density: 10.967%