INDEX
Explanations
power dynamics, political maneuvers, and social issues in text
phrases that indicate manipulation and exploitation for personal or political gain
Negative Logits
Token        Logit
çīĪ          -0.68
tackle       -0.67
Explore      -0.63
webkit       -0.63
Methodist    -0.63
Crusher      -0.63
"}],"        -0.62
surveyed     -0.61
Alloy        -0.61
Loving       -0.61
Positive Logits
Token          Logit
agendas        0.97
pretext        0.97
blackmail      0.95
profit         0.95
agenda         0.93
nefarious      0.88
revenge        0.88
advantage      0.87
predetermined  0.87
profits        0.85
Activations Density 0.735%
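
For readers unfamiliar with the density figure, the sketch below shows one common way such a number can be computed: the fraction of token positions on which the feature's activation is above zero. This is an assumption about the metric, not a statement of how this dashboard computes it; the function name `activation_density`, the `threshold` parameter, and the sample activations are all hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumed definition: density = (# positions with activation > threshold) / (# positions).
    """
    if len(activations) == 0:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 7 of which fire.
sample = [0.0] * 1000
for position in (3, 250, 251, 600, 777, 900, 999):
    sample[position] = 1.5

density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.700%"
```

Under this reading, a density of 0.735% would mean the feature fires on roughly 7 of every 1000 tokens in the sampled text.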