INDEX
Explanations
phrases related to political events and government policies
Negative Logits
nels     -0.80
ording   -0.78
xual     -0.73
object   -0.72
ij士      -0.70
flies    -0.70
Kind     -0.69
event    -0.69
!/       -0.68
JECT     -0.68
Positive Logits
largest      1.18
poorest      1.05
wealthiest   1.03
finest       1.00
tallest      1.00
premier      1.00
richest      0.98
longest      0.97
biggest      0.97
newest       0.95
Activations Density: 0.687%