    Explanations

    negative descriptors or criticisms

    instances of the word "silly."

    Negative Logits
    Token         Logit
    "yer"         -0.95
    "Reviewed"    -0.92
    "rigan"       -0.84
    "ept"         -0.81
    "ainer"       -0.81
    "ioch"        -0.81
    "ainers"      -0.79
    "amen"        -0.78
    "rien"        -0.76
    "lain"        -0.76
    Positive Logits
    Token          Logit
    " silly"       1.06
    " nonsense"    0.91
    " aside"       0.87
    " Ples"        0.84
    " prank"       0.81
    "ness"         0.81
    " childish"    0.79
    " Haram"       0.78
    " Pry"         0.76
    "ishly"        0.76
    Activation Density: 0.018%

    No Known Activations