Explanations
references to cartoons and animated media
Negative Logits
Token        Logit
opio         -0.70
acia         -0.66
forces       -0.66
sclerosis    -0.64
govern       -0.64
vae          -0.63
pta          -0.62
CI           -0.60
kept         -0.59
20439        -0.58
Positive Logits
Token        Logit
ishly        1.19
cartoons     1.03
ish          1.00
ists         1.00
ist          0.97
oons         0.90
cartoon      0.90
caric        0.89
ical         0.89
istically    0.87
Activations Density: 0.015%
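For readers unfamiliar with the density figure, here is a minimal sketch of how such a value could be computed. It assumes density means the fraction of token positions on which the feature's activation is above zero; the page itself does not state the definition. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 3 of 20,000 token positions.
acts = [0.0] * 20000
acts[5] = 1.2
acts[100] = 0.4
acts[19999] = 2.0

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.015%
```

Under this reading, a density of 0.015% would mean the feature is active on roughly 15 out of every 100,000 tokens, which fits a narrow feature tied to one topic.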