INDEX
Explanations
phrases related to theft or stealing
instances of the word "steal" and its variations
Negative Logits
ichick      -0.79
hedon       -0.75
ŀ           -0.74
SPA         -0.73
present     -0.72
Unsure      -0.71
foreseen    -0.69
rition      -0.69
olver       -0.69
etheless    -0.68
Positive Logits
glances     1.00
goods       0.85
from        0.85
credit      0.84
funds       0.82
away        0.82
stolen      0.82
valuable    0.80
money       0.76
priceless   0.74
Activations Density 0.075%