Explanations
discussions around accountability and hypocrisy in societal and political contexts
Negative Logits
allenge      -0.17
.ARR         -0.15
roupon       -0.14
amma         -0.14
oser         -0.14
ocz          -0.14
é¡           -0.14
ETY          -0.14
SPELL        -0.14
.shutdown    -0.14
Positive Logits
talk          0.34
talk          0.26
claims        0.26
-talk         0.26
CLAIM         0.25
rhetoric      0.24
claimed       0.24
Talk          0.24
Talk          0.24
claim         0.24
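Lists like the two above are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and keeping the k lowest and k highest scoring tokens. The sketch below assumes that approach; the function names, the toy weights and the scores are hypothetical, and a real pipeline may add steps such as normalization that are not shown here.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float], W_U: Sequence[Sequence[float]]) -> List[float]:
    # Hypothetical helper: project the feature's decoder direction (length d_model)
    # through the unembedding W_U (d_model rows x vocab columns), one score per token.
    d_model, vocab = len(W_U), len(W_U[0])
    if len(decoder_dir) != d_model:
        raise ValueError("decoder_dir must have d_model entries")
    return [sum(decoder_dir[d] * W_U[d][t] for d in range(d_model)) for t in range(vocab)]

def top_and_bottom(scores: Sequence[float], tokens: Sequence[str], k: int) -> Tuple[list, list]:
    # Sort token ids by score: the lowest k are the "negative logits",
    # the highest k (largest first) are the "positive logits".
    order = sorted(range(len(scores)), key=lambda t: scores[t])
    bottom = [(tokens[t], scores[t]) for t in order[:k]]
    top = [(tokens[t], scores[t]) for t in reversed(order[-k:])]
    return top, bottom

# Toy example (made-up weights): d_model = 2, vocabulary of 4 tokens.
tokens = ["talk", "claims", "shutdown", "rhetoric"]
decoder_dir = [1.0, 0.5]
W_U = [[0.2, -0.1, 0.3, 0.0],
       [0.4, 0.2, -0.8, 0.1]]
scores = logit_effects(decoder_dir, W_U)
top, bottom = top_and_bottom(scores, tokens, k=2)
print(top)     # highest-scoring tokens, largest first
print(bottom)  # lowest-scoring tokens, most negative first
```

Repeated entries such as "talk" and "Talk" in a real list usually correspond to distinct vocabulary ids (for example, with and without a leading space) that print identically.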
Activation Density: 0.371%
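Assuming activation density means the fraction of token positions on which the feature fires (activation above zero), it can be computed as in this minimal sketch. The function name and the threshold parameter are hypothetical, not taken from any particular library.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    # Fraction of token positions where the feature's activation exceeds the threshold.
    if len(activations) == 0:
        raise ValueError("need at least one token position")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

# Toy example: the feature fires on 2 of 5 positions.
acts = [0.0, 0.0, 1.2, 0.0, 0.5]
print(f"{activation_density(acts):.3%}")
```

Under this definition, a density of 0.371% means the feature is active on roughly 371 of every 100,000 tokens, so it is a sparse feature.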