Explanations
words related to political and corporate dynamics or relationships
Negative Logits
Token     Logit
decomp    -0.72
JPEG      -0.68
gib       -0.65
silhou    -0.63
"+        -0.62
+++       -0.61
Kiw       -0.60
Axel      -0.58
veget     -0.57
hurd      -0.56
Positive Logits
Token     Logit
s         1.29
ses       1.05
ski       1.03
ship      1.00
tal       1.00
tis       0.98
ese       0.93
tarian    0.93
t         0.93
lis       0.92
Activations Density: 0.219%