INDEX
Explanations
parodies or spoofed content
terms related to parody, spoof, and satire
Negative Logits
Dynamics            -0.87
erto                -0.82
frames              -0.75
omen                -0.73
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯    -0.71
aining              -0.69
inventoryQuantity   -0.68
cryst               -0.68
acia                -0.67
士                  -0.67
Positive Logits
spoof               1.14
netflix             1.06
parody              1.04
mockery             1.03
mocking             1.02
caric               0.90
satir               0.90
caricature          0.89
imperson            0.85
joking              0.84
Activations Density: 0.044%