INDEX
Explanations
words related to financial transactions or debts
Negative Logits
objective       -0.71
belonging       -0.64
divergence      -0.63
encyclopedia    -0.62
impulse         -0.61
neutrality      -0.60
affiliation     -0.59
aisle           -0.59
intermedi       -0.58
stimulus        -0.58
Positive Logits
izes            1.25
ifies           1.24
uates           1.20
wrote           1.18
ounced          1.16
rolled          1.16
itates          1.15
ited            1.14
elled           1.13
handled         1.10
Activations Density: 2.221%