INDEX
Explanations
words related to the concept of promotion or advocacy
Negative Logits
Token     Logit
(blank)   -0.90
most      -0.74
y         -0.69
two       -0.68
ara       -0.64
not       -0.63
ir        -0.63
(blank)   -0.62
晴        -0.62
A         -0.61
Positive Logits
Token       Logit
Promoted    1.33
Promote     1.32
promotion   1.29
Monfieur    1.28
myſelf      1.28
promotions  1.24
Promotes    1.24
purpoſe     1.23
Promote     1.23
pleaſure    1.20
Activations Density: 0.067%