Explanations
instances of hypocrisy within political and social discourse
Negative Logits
inst      -0.17
orta      -0.16
scaleX    -0.15
/../      -0.15
pun       -0.14
elman     -0.14
μή        -0.13
uder      -0.13
orte      -0.13
flo       -0.13
Positive Logits
oux       0.16
illum     0.15
ault      0.15
Neck      0.15
è¶Ĭ       0.14
IJ        0.14
iggs      0.14
Candle    0.14
iks       0.14
edList    0.14
Activations Density 0.174%