    Explanations

    mentions of loved ones and relationships in various contexts

    Negative Logits
    Token            Logit
    "é¾"             -0.78
    "ERO"            -0.72
    "ulated"         -0.70
    "ulhu"           -0.69
    "illin"          -0.64
    "erity"          -0.64
    "ulation"        -0.63
    "amphetamine"    -0.63
    "IDER"           -0.63
    "ipl"            -0.62

    Positive Logits
    Token            Logit
    " ones"           1.06
    " dearly"         0.85
    " pets"           0.83
    "ometown"         0.82
    " uncond"         0.79
    " spouse"         0.78
    " nephew"         0.78
    " Ones"           0.78
    " memories"       0.77
    " loved"          0.77
    Activation density: 0.044%

    No Known Activations