    Explanations

    mentions of cartoon characters or TV shows

    Negative Logits
    HI            -0.71
    govern        -0.69
     sclerosis    -0.69
    acia          -0.69
    forces        -0.68
    alez          -0.67
    FUL           -0.66
    ttp           -0.64
    utherford     -0.64
    vae           -0.64
    Positive Logits
     cartoons     1.08
    ishly         1.05
     cartoon      0.96
     frog         0.91
     caric        0.89
     sketches     0.87
    ists          0.85
     Cartoon      0.84
     depictions   0.84
    ist           0.83
    Activation Density: 0.018%

    No Known Activations