INDEX
Explanations
phrases related to increasing, boosting, or gaining an advantage
tactics related to manipulation and deception for political or media gain
Negative Logits
zens        -0.69
umerable    -0.66
Variant     -0.64
surveyed    -0.63
mapped      -0.62
container   -0.60
Same        -0.60
consulted   -0.59
webkit      -0.59
Tuc         -0.59
Positive Logits
publicity      0.87
gull           0.86
retribution    0.82
eventual       0.78
unsuspecting   0.76
favourable     0.76
perceived      0.75
appease        0.75
revenge        0.74
incrim         0.74
Activations Density 0.748%