Explanations
phrases related to economics and politics
topics related to social and political issues
Negative Logits
IPM          -0.74
ovy          -0.65
!:           -0.64
uala         -0.63
rising       -0.63
etheless     -0.62
GGGG         -0.61
Seym         -0.60
ivating      -0.60
cause        -0.59
Positive Logits
lacks         1.13
exists        1.10
resides       1.10
hadn          1.09
hasn          1.06
existed       1.03
relies        1.03
cannot        1.02
dominates     0.99
isn           0.98
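The dashboard does not say how these logit values were computed. Lists like these are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix and ranking vocabulary tokens by the resulting score. The following is a minimal sketch assuming that method; the function name `feature_logit_effects`, the `unembed[d_model][vocab]` layout, and the toy numbers are hypothetical and are not this feature's real weights.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_row: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Hypothetical helper: rank tokens by a feature's direct effect on logits.

    decoder_row: the feature's decoder direction, length d_model (assumed).
    unembed: unembedding matrix laid out as unembed[d_model][vocab] (assumed).
    Returns (top_positive, top_negative), each a list of (token, score).
    """
    d_model = len(decoder_row)
    # Dot the decoder direction with each token's unembedding column:
    # one score per vocabulary entry.
    scores = [
        sum(decoder_row[m] * unembed[m][t] for m in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending, so the most negative scores come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    # Reverse to put the most positive scores first.
    positive = ranked[::-1][:k]
    return positive, negative


if __name__ == "__main__":
    # Toy, made-up values for illustration only.
    vocab = ["lacks", "exists", "IPM", "ovy"]
    decoder_row = [1.0, -0.5]
    unembed = [[0.8, 0.6, -0.4, -0.2], [-0.6, -0.8, 0.6, 0.4]]
    pos, neg = feature_logit_effects(decoder_row, unembed, vocab, k=2)
    print("positive:", pos)
    print("negative:", neg)
```

With real model weights, the same ranking would yield the two ten-token lists shown above, provided the dashboard used this projection.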
Activation density: 0.542%
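Activation density is usually reported as the fraction of sampled token positions on which the feature fires, meaning its activation is above zero. A minimal sketch, assuming a flat list of per-token activations and a threshold of 0. The function name `activation_density` is hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: fraction of positions where activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


if __name__ == "__main__":
    # Illustrative only: 542 firing positions out of 100,000 sampled tokens.
    acts = [0.0] * 99_458 + [1.0] * 542
    print(f"{100 * activation_density(acts):.3f}%")
```

Under this definition, a density of 0.542% corresponds to the feature firing on roughly 542 of every 100,000 tokens, which makes it a fairly sparse feature.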