INDEX
    Explanations

    playful or whimsical language and imagery

    Negative Logits
        "ormsg"   -0.19
        "raci"    -0.17
        "agate"   -0.17
        "lients"  -0.16
        "hend"    -0.15
        ".bd"     -0.15
        "keh"     -0.14
        "Ãły"     -0.14
        "born"    -0.14
        "irq"     -0.14
    Positive Logits
        " Bab"    0.16
        "-pop"    0.15
        "pop"     0.15
        " bois"   0.14
        "kins"    0.14
        "à¥īप"    0.14
        " hops"   0.14
        "ories"   0.14
        " pop"    0.14
        " hal"    0.14
    Activation Density: 0.063%

    No Known Activations