INDEX
    Explanations

    specific references to princesses

    references to "Princess" in various contexts

    Negative Logits
    Token        Logit
    "sych"       -0.73
    " spaced"    -0.70
    "ucl"        -0.68
    " smoker"    -0.67
    "ulhu"       -0.65
    "EFF"        -0.64
    " neur"      -0.63
    "lder"       -0.63
    "oker"       -0.62
    "pson"       -0.61
    Positive Logits
    Token           Logit
    " Leia"         1.22
    " Princess"     1.09
    " Bride"        1.04
    " Diana"        1.01
    " Celest"       0.95
    " Peach"        0.95
    "anova"         0.93
    " princess"     0.85
    " Fiona"        0.82
    " Alexandra"    0.81
    Activation Density: 0.013%

    No Known Activations