Explanations
phrases related to manipulation and deception
tricked or fooled
Negative Logits
Token         Logit
rele          -0.43
bares         -0.40
TaskId        -0.39
(©            -0.39
OrderService  -0.37
UAGES         -0.37
gangs         -0.36
Hands         -0.36
Bombs         -0.36
الحره         -0.35
Positive Logits
Token         Logit
believing     0.64
fooled        0.60
SharedCtor    0.58
invokeLater   0.57
croy          0.56
deceived      0.56
ImageContext  0.55
geloof        0.54
tanleria      0.53
croire        0.52
Activations Density 0.097%