INDEX
Explanations
words related to politics and societal issues
themes related to political self-interest and manipulation
Negative Logits
assium        -0.58
interrupted   -0.58
ortium        -0.57
mosqu         -0.56
staking       -0.56
inguished     -0.55
marked        -0.55
availability  -0.54
etheless      -0.54
requisite     -0.53
Positive Logits
truths         0.81
blindly        0.74
immoral        0.73
stupid         0.72
scapego        0.72
paycheck       0.70
oneself        0.70
ignorant       0.69
morals         0.68
righteousness  0.68
Activations Density: 1.252%