INDEX
Explanations
references to deception or pretense in political and social contexts
Negative Logits
anki      -0.16
ียà¸Ķ      -0.16
è®        -0.15
chner     -0.14
zdrav     -0.14
ynos      -0.14
anke      -0.14
atron     -0.14
atu       -0.14
aptive    -0.14
Positive Logits
somehow          0.20
superior         0.18
representing     0.17
expertise        0.17
experts          0.17
sophistication   0.17
represent        0.15
progress         0.15
authority        0.15
victim           0.14
Activations Density 0.149%