Explanations
expressions of praise or positive evaluation
Negative Logits
itr        -0.16
thouse     -0.15
afil       -0.15
yet        -0.14
ngen       -0.14
zap        -0.14
dech       -0.14
certo      -0.14
belli      -0.14
Goodman    -0.14
Positive Logits
s          0.38
-grand     0.37
deal       0.32
sword      0.29
deals      0.28
-value     0.26
dane       0.24
-looking   0.24
seller     0.24
ful        0.23
Activations Density 0.041%