INDEX
    Explanations

    frequent references to specific names or proper nouns

    names of individuals and references to dolls

    Negative Logits
    Token       Logit
    "rament"    -0.89
    "gers"      -0.87
    "AMI"       -0.82
    "riors"     -0.80
    "uld"       -0.74
    "lar"       -0.74
    "raltar"    -0.71
    "rid"       -0.71
    "arching"   -0.70
    "lopp"      -0.69
    Positive Logits
    Token             Logit
    "ipop"            0.85
    "BACK"            0.71
    "yp"              0.67
    "yk"              0.67
    " サーティワン"    0.66
    "phia"            0.64
    " skirt"          0.61
    " Kers"           0.60
    "oning"           0.60
    " skirts"         0.60
    Activation Density: 0.123%

    No Known Activations