INDEX
Explanations
verbs related to decline or deterioration
Negative Logits
truthful         -0.62
ortment          -0.61
sarc             -0.61
yourself         -0.60
accountable      -0.60
naming           -0.59
identification   -0.59
congrat          -0.58
objective        -0.58
examples         -0.58
Positive Logits
uates            1.06
uated            1.02
ighed            1.01
uating           0.98
uate             0.96
iated            0.90
ues              0.90
ceed             0.89
elled            0.88
pped             0.85
Activations Density 0.047%