Explanations
references to theft or stealing
Negative Logits
Token        Logit
397          -0.16
ForResult    -0.15
_requires    -0.15
lope         -0.15
rophe        -0.14
idue         -0.14
suming       -0.14
Roe          -0.14
quisite      -0.14
ups          -0.14
Positive Logits
Token        Logit
away         0.24
stealing     0.20
thunder      0.20
khỏi         0.18
ambi         0.18
منها         0.17
stole        0.17
identities   0.17
Away         0.16
thief        0.16
Activations Density: 0.049%