Explanations
words related to negative actions or qualities
Negative Logits
è£ħ      -0.80
aterial  -0.79
asio     -0.77
socket   -0.75
çīĪ      -0.75
ainted   -0.74
marked   -0.73
emis     -0.72
ector    -0.72
orthy    -0.71
Positive Logits
disregard     1.32
pursuit       1.28
ness          1.20
antics        1.20
abandon       1.13
grin          1.11
arrogance     1.08
indifference  1.08
outburst      1.06
behavior      1.06
Activations Density: 0.187%
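
Two entries in the Negative Logits list (è£ħ and çīĪ) look like encoding debris but are more likely raw token strings from a byte-level BPE vocabulary. The sketch below shows one way to turn such display strings back into readable text. It assumes, without confirmation from this page, that the underlying model uses the GPT-2 style byte-to-unicode mapping; the helper names `bytes_to_unicode` and `decode_token` are illustrative and not part of the dashboard.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable-character table (assumed mapping)."""
    # Printable bytes map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at or above 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: display character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(display: str) -> str:
    """Map a displayed token string back to bytes, then decode as UTF-8."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in display)
    # Tokens may be fragments of a character, so invalid bytes are replaced.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for token in ["è£ħ", "çīĪ", "socket"]:
        print(repr(token), "->", repr(decode_token(token)))
```

If the assumption about the tokenizer holds, the two odd-looking entries decode to single CJK characters, and plain ASCII tokens such as `socket` pass through unchanged. If the model uses a different tokenizer, the mapping does not apply and the strings should be left as displayed.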