INDEX
Explanations
mentions of cartoon characters or TV shows
mentions of cartoons
Negative Logits
Token        Logit
HI           -0.71
govern       -0.69
sclerosis    -0.69
acia         -0.69
forces       -0.68
alez         -0.67
FUL          -0.66
ttp          -0.64
utherford    -0.64
vae          -0.64
Positive Logits
Token        Logit
cartoons     1.08
ishly        1.05
cartoon      0.96
frog         0.91
caric        0.89
sketches     0.87
ists         0.85
Cartoon      0.84
depictions   0.84
ist          0.83
Activations Density 0.018%