INDEX
    Explanations

    references to loved ones and emotional connections to them

    Negative Logits
    " LOVE"             -0.18
    " lover"            -0.17
    " любов"            -0.17
    "loo"               -0.17
    ".scalablytyped"    -0.17
    " love"             -0.16
    " loving"           -0.16
    " lovers"           -0.16
    "ková"              -0.15
    "love"              -0.15
    Positive Logits
    "ones"              0.28
    " ones"             0.25
    " Ones"             0.23
    "olls"              0.19
    " relative"         0.17
    "ammers"            0.17
    "errick"            0.15
    "ONES"              0.15
    " who"              0.14
    " pet"              0.14
    Activation Density: 0.008%

    No Known Activations