INDEX
Explanations
words related to negative evaluations or criticism
words related to preference and taste
Negative Logits
mers      -0.72
©¶æ       -0.68
ħĭ        -0.65
monds     -0.64
©¶æ¥µ     -0.63
assets    -0.62
Haas      -0.62
epid      -0.62
crawl     -0.61
ejac      -0.61
Positive Logits
itism     1.20
ite       1.01
ited      0.96
naire     0.96
ITE       0.91
avour     0.89
eous      0.83
ably      0.83
Favor     0.82
able      0.80
Activations Density 0.022%