Explanations
mentions of actions related to theft or unauthorized taking
instances of the word "steal" and its variations
Negative Logits
present   -0.74
band      -0.73
anamo     -0.71
pora      -0.70
olver     -0.70
night     -0.69
ichick    -0.67
bands     -0.65
rehens    -0.64
acerb     -0.63
Positive Logits
glances   0.87
weed      0.82
stolen    0.77
ster      0.76
stealing  0.73
prey      0.71
sters     0.71
away      0.70
ezvous    0.69
steals    0.69
Activations Density: 0.019%
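
A density figure like this is usually read as the fraction of token positions on which the feature fires. The source does not say how it was computed, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is
    active; the dashboard's actual definition may differ.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 2 active positions out of 10,000 tokens.
sample = [0.0] * 9998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # formatted as a percentage, like the dashboard figure
```

Under this reading, a density of 0.019% would mean the feature is active on roughly 19 of every 100,000 tokens.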