Explanations
Instances of satire in various contexts, particularly in cultural commentary.
NEGATIVE LOGITS
quier     -0.19
edly      -0.17
_PATCH    -0.16
acl       -0.15
ftar      -0.15
slt       -0.15
že        -0.15
andles    -0.15
adden     -0.15
laces     -0.15
POSITIVE LOGITS
uration    0.30
suma       0.30
irical     0.30
anic       0.29
ellite     0.28
ellites    0.25
elite      0.24
iation     0.24
URATION    0.22
ires       0.22
Activations Density 0.008%