INDEX
    Explanations

    references to a single male individual at various points in the text

    Negative Logits
    è¥         -0.15
    imus       -0.15
    aron       -0.14
    eam        -0.14
    taire      -0.14
    angelo     -0.14
    velle      -0.14
    .sz        -0.14
    ago        -0.14
    ways       -0.14
    Positive Logits
    /her        0.19
    inerary     0.18
    /th         0.18
    /she        0.16
    /we         0.16
    kek         0.15
    atically    0.15
    å§          0.14
    iner        0.14
     ali        0.14
    Activation Density: 0.051%

    No Known Activations