INDEX
    Explanations

    mentions of the word "her" in various contexts

    Negative Logits
    ra           -0.17
    erc          -0.17
    ers          -0.15
    omed         -0.15
    urn          -0.15
    ela          -0.15
    resse        -0.15
    sword        -0.15
    sd           -0.15
    s            -0.15
    Positive Logits
    bst          0.22
    itage        0.20
    editary      0.19
     Majesty     0.18
    OwnProperty  0.17
    usalem       0.17
    vey          0.17
    bage         0.17
    encia        0.16
    çĸ           0.16
    Activation Density: 0.025%

    No Known Activations