INDEX
Explanations
mention of movies, actors, and directors
references to comedy and comedic elements
Negative Logits
ribute      -0.77
orem        -0.74
fortune     -0.74
animous     -0.73
accompan    -0.73
drawn       -0.72
ributed     -0.72
inness      -0.71
inguished   -0.71
scrib       -0.70
Positive Logits
edy           1.08
Reloaded      0.76
Unleashed     0.68
Zed           0.67
kW            0.67
Enterprises   0.66
bda           0.65
Bang          0.64
tsky          0.64
deen          0.63
Activations Density: 0.009%