Explanations
words related to deception or manipulation
references to deceptive practices or tactics used in political or social contexts
Negative Logits
kens         -0.71
Horizons     -0.70
Wander       -0.66
Parables     -0.65
CARE         -0.63
anse         -0.60
aging        -0.60
accompan     -0.60
experiences  -0.60
inguished    -0.59
Positive Logits
ploy          1.10
perpetrated   1.10
blackmail     1.09
against       1.03
tactic        1.02
concoct       1.02
orchestrated  1.01
pretext       1.00
extortion     1.00
tactics       0.98
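These two lists are most likely the feature's direct logit effects: the tokens whose output logits the feature pushes down the most and pushes up the most. A common way dashboards of this kind derive them is to project the feature's decoder direction through the model's unembedding matrix and rank the vocabulary by the result. The sketch below assumes that method. The names decoder_dir, unembed, logit_effects and top_k are hypothetical, the toy matrix is made up, and the exact scaling behind the values shown here is not stated on the card.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> Dict[str, float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix laid out as [d_model][vocab_size]."""
    # Assumption: logit effect = decoder_dir @ unembed, with no extra scaling.
    scores = [0.0] * len(vocab)
    for d_i, row in zip(decoder_dir, unembed):
        for j, w in enumerate(row):
            scores[j] += d_i * w
    return dict(zip(vocab, scores))


def top_k(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most positive, most negative) tokens, k of each."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    positive = ranked[::-1][:k]  # largest scores first
    negative = ranked[:k]        # smallest (most negative) scores first
    return positive, negative


# Toy example with a made-up 2-dimensional model and a 3-token vocabulary.
effects = logit_effects(
    decoder_dir=[1.0, -0.5],
    unembed=[[0.2, 0.9, -0.3], [0.4, -0.2, 0.6]],
    vocab=["tok_a", "tok_b", "tok_c"],
)
positive, negative = top_k(effects, k=1)
print("positive:", positive)
print("negative:", negative)
```

With a real model, the vocabulary would be the tokenizer's full token list, which is why subword fragments such as "kens" or "inguished" can appear in the rankings.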
Activation Density: 0.391%
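Activation density is usually read as the share of sampled tokens on which the feature fires at all. A minimal sketch, assuming "fires" means an activation strictly above zero (the threshold parameter and the function name activation_density are hypothetical, not taken from the dashboard):

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`."""
    # Assumption: a token counts as active when its activation > threshold.
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Toy sample: 4 active tokens out of 1,000.
sample = [0.0] * 996 + [1.3, 0.2, 2.7, 0.9]
print(f"{activation_density(sample):.3f}%")
```

Under this definition, a density of 0.391% means the feature is active on roughly 4 tokens in every 1,000, so it fires sparsely.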