Explanations
words related to deception or trickery
terminology related to deception and being misled
Negative Logits
area            -0.68
oran            -0.65
Occup           -0.64
mun             -0.61
Interstitial    -0.60
foreseen        -0.59
capacity        -0.59
empl            -0.57
Pain            -0.55
grievances      -0.55
Positive Logits
deceive          1.04
deceived         1.00
fooled           0.99
ingly            0.96
gull             0.91
eering           0.85
ulent            0.85
tricked          0.84
confuse          0.81
unwitting        0.81
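The two lists are the tokens with the largest positive and the most negative logit effects, each ordered by magnitude. The sketch below shows that ordering. It assumes a plain mapping from token to logit effect, filled here with a few values copied from the lists. `top_logits` is a hypothetical helper for illustration, not the dashboard's actual code.

```python
def top_logits(effects, k=10):
    """Split a token -> logit-effect mapping into top-k positive and negative lists.

    Hypothetical helper: sorts once by value, then reads both ends.
    """
    ordered = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ordered[:k]                          # most negative first
    positive = sorted(ordered, key=lambda kv: kv[1], reverse=True)[:k]  # most positive first
    return positive, negative


# A few entries taken from the lists above (token fragments kept as-is).
sample = {
    "deceive": 1.04, "fooled": 0.99, "tricked": 0.84,
    "area": -0.68, "oran": -0.65, "grievances": -0.55,
}
pos, neg = top_logits(sample, k=2)
print(pos)  # [('deceive', 1.04), ('fooled', 0.99)]
print(neg)  # [('area', -0.68), ('oran', -0.65)]
```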
Activations Density 0.033%
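Activation density is usually read as the share of tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero; the dashboard's exact threshold is not stated. Under that assumption, 0.033% works out to about 33 active tokens per 100,000. The function name `activation_density` and the synthetic activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is > threshold.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Synthetic example: 33 active tokens out of 100,000.
acts = [0.0] * 100_000
for i in range(33):
    acts[i * 3000] = 1.5
print(f"{activation_density(acts):.3f}%")  # 0.033%
```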