Explanations
phrases that highlight hypocrisy in social and political discourse
Negative Logits
conde
-0.15
Seg
-0.15
рави
-0.15
igar
-0.15
azzi
-0.14
.LOG
-0.14
Lund
-0.14
ーズ
-0.14
rone
-0.14
ogg
-0.13
Positive Logits
isz
0.16
algo
0.14
udder
0.14
계
0.14
едь
0.14
Dangerous
0.14
aida
0.14
-hash
0.14
ixin
0.13
¨
0.13
Activations Density 0.157%