INDEX
    Explanations

    references to romantic partners

    Negative Logits
    Token          Logit
    "cale"         -0.80
    "aston"        -0.79
    "anism"        -0.74
    "ihil"         -0.71
    "ouses"        -0.69
    "iard"         -0.66
    " ceilings"    -0.64
    "aji"          -0.64
    "thur"         -0.63
    "Paris"        -0.63
    Positive Logits
    Token            Logit
    " partner"       1.17
    " partners"      1.00
    " Partner"       0.85
    "ãĤ´ãĥ³"         0.76
    " competitor"    0.65
    "hood"           0.64
    " colleague"     0.64
    "loo"            0.63
    " Karin"         0.63
    "itute"          0.62
    Activation Density: 0.012%

    No Known Activations