INDEX
Explanations
phrases related to deception and trickery
various forms of the word "deceive."
Negative Logits
Yor  -0.71
Brands  -0.67
Bots  -0.64
ingen  -0.63
Targ  -0.62
Aires  -0.62
Polo  -0.62
Zamb  -0.60
Atkinson  -0.60
CHA  -0.59
Positive Logits
ffect  1.04
ither  0.95
ptive  0.93
pt  0.92
emonic  0.92
iving  0.91
ased  0.90
fficient  0.88
ivable  0.87
astrous  0.86
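Logit lists like the two above are commonly produced by projecting the feature's decoder direction onto each token's unembedding vector and ranking the results. The page does not say how its values were computed, so the sketch below assumes that convention and ignores any final layer norm. The function names (`logit_effects`, `top_and_bottom`), the 2-dimensional residual stream, and the token names and numbers are all hypothetical toy data, not values from this feature.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot the (hypothetical) feature decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive tokens and the k most negative tokens (most negative first)."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[:k]
    bottom = ranked[::-1][:k]
    return top, bottom


# Toy example with made-up numbers: 2-dimensional residual stream, 4-token vocabulary.
decoder_dir = [1.0, 0.5]
unembed = {
    "tok_a": [0.8, 0.3],
    "tok_b": [0.6, 0.2],
    "tok_c": [-0.5, -0.4],
    "tok_d": [-0.3, -0.1],
}
effects = logit_effects(decoder_dir, unembed)
top, bottom = top_and_bottom(effects, k=2)
print(top)
print(bottom)
```

In a real model the vocabulary has tens of thousands of entries, so only the extremes of the ranking (ten per side here) are shown.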
Activation Density: 0.094%
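Activation density is usually reported as the share of tokens in a sampled dataset on which the feature fires. The sketch below assumes that definition with a firing threshold of zero (the page does not state its threshold), and `activation_density` is a hypothetical helper. It also converts the reported 0.094% into an approximate firing rate.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose feature activation exceeds `threshold` (assumed to be 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# A density of 0.094% means the feature fires on roughly one token in every ~1,064.
tokens_per_firing = round(1 / 0.00094)
print(tokens_per_firing)  # 1064
```

A density this low indicates a sparse feature that fires on only a small, specific set of contexts.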