Explanations
instances of hypocrisy in political and social contexts
Negative Logits
Token     Logit
rouw      -0.15
_PK       -0.14
olla      -0.14
andro     -0.14
onomy     -0.13
otas      -0.13
ーラ      -0.13
).__      -0.13
OrFail    -0.13
Riders    -0.13
Positive Logits
Token     Logit
whereas   0.29
despite   0.27
जबक       0.24
while     0.23
Whereas   0.22
while     0.20
حالی      0.20
ibel      0.20
whilst    0.19
without   0.18
Activations Density 0.260%