INDEX
Explanations
references to cartoons
Negative Logits
changes         -0.80
rity            -0.73
ivation         -0.71
ulia            -0.69
Availability    -0.69
Priv            -0.68
work            -0.67
locks           -0.66
Recovery        -0.65
Skill           -0.64
Positive Logits
cartoon         3.71
cartoons        3.15
Cartoon         2.29
caric           2.05
caricature      1.91
satir           1.64
comic           1.57
comics          1.44
satirical       1.42
animated        1.32
Activations Density: 0.021%