Explanations
phrases related to challenging or reviewing actions by authority figures
instances of the word "by" indicating actions or authorship
Negative Logits
resil           -0.79
ãĤ¦ãĤ¹          -0.78
"$:/            -0.75
redits          -0.70
stakes          -0.68
tenance         -0.67
adal            -0.66
meat            -0.65
URN             -0.64
qqa             -0.64
Positive Logits
virtue          1.03
laws            0.96
products        0.95
gone            0.85
product         0.78
omission        0.77
proxy           0.72
policymakers    0.69
whistleblowers  0.69
politicians     0.69
Activations Density 0.122%