INDEX
    Explanations

    terms related to romance and romantic relationships

    Negative Logits
    Token      Logit
    936        -0.16
    auty       -0.16
    inson      -0.15
    manship    -0.15
    wards      -0.15
    isters     -0.15
    rogen      -0.15
    alet       -0.15
    onders     -0.14
    ulty       -0.14
    Positive Logits
    Token      Logit
    ized       0.22
    izing      0.20
    ised       0.19
    atic       0.16
    ism        0.16
    ting       0.16
    ize        0.15
    ising      0.15
    ous        0.15
    ization    0.15
    Activation Density: 0.014%

    No Known Activations