INDEX
Explanations
words related to deception and manipulation, such as publicity stunts, hoaxes, fabrications, and conspiracies
terms related to deceptive or misleading actions
Negative Logits
Token        Logit
izont        -0.94
anamo        -0.75
upt          -0.74
enz          -0.73
liam         -0.72
acers        -0.70
arton        -0.70
pict         -0.70
»Ĵ           -0.69
accompan     -0.67
Positive Logits
Token        Logit
concoct      1.04
perpetrated  0.94
ploy         0.93
shenan       0.90
rather       0.88
aimed        0.86
meant        0.85
gimmick      0.83
invented     0.82
diversion    0.82
Activations Density: 0.355%
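
The density figure is commonly read as the fraction of sampled token positions on which the feature is active. The source does not say how it was computed, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and chosen purely for illustration.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for 8 token positions; 3 of them are non-zero.
sample = [0.0, 0.0, 1.2, 0.0, 0.4, 0.0, 0.0, 2.1]
density = activation_density(sample)
print(f"{density:.3%}")  # prints "37.500%"
```

Under this reading, a density of 0.355% would correspond to the feature firing on roughly 355 of every 100,000 sampled tokens.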