    Explanations

    references to mothers or maternal figures

    Negative Logits
    Token      Logit
    wright     -0.16
    egin       -0.14
    owy        -0.14
    -linear    -0.14
    rd         -0.14
    iders      -0.14
    ird        -0.14
    ãĥĥãĥĪ     -0.14
    Mocks      -0.13
    strup      -0.13
    Positive Logits
    Token      Logit
    hood       0.20
    -child     0.18
    gos        0.17
    itespace   0.17
    gom        0.15
    aight      0.15
    eros       0.15
    ÑĤаж       0.15
    REN        0.15
    SHIP       0.15
    Activation Density: 0.041%

    No Known Activations