Explanations
words related to promotion and advancement
Negative Logits
Token              Logit
most               -0.66
[missing]          -0.64
ara                -0.62
y                  -0.60
two                -0.60
AssemblyCulture    -0.58
Dieter             -0.58
aDecoder           -0.58
aha                -0.57
녁                 -0.57
Positive Logits
Token              Logit
promotion          2.08
promoted           1.91
Promotion          1.88
promotions         1.87
Promote            1.87
Promoted           1.83
promoting          1.81
promote            1.78
Promote            1.77
promotion          1.76
Activations Density 0.076%
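
For readers unfamiliar with the density figure, the sketch below shows one common way such a number is obtained: the fraction of sampled tokens on which the feature's activation exceeds a threshold. The function name `activation_density`, the zero threshold, and the toy data are assumptions made for illustration; the page does not state how its 0.076% was computed.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The source page does not define the term.
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("need at least one activation to compute a density")
    return active / total


# Toy data (hypothetical): the feature fires on 8 of 10,000 tokens.
toy_activations = [0.0] * 9992 + [1.5] * 8
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

A density this low means the feature is sparse: it is silent on nearly all tokens and fires only on a narrow set of contexts, which fits an explanation as specific as promotion-related words.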