    Explanations

    terms related to romance and romantic relationships

    Negative Logits
    Token          Logit
    " Romance"     -0.22
    " ROM"         -0.20
    "ROM"          -0.18
    " Romans"      -0.17
    " Romantic"    -0.17
    " romance"     -0.17
    "reich"        -0.16
    "edback"       -0.16
    "manship"      -0.16
    "rom"          -0.16
    Positive Logits
    Token          Logit
    "ized"         0.31
    "izing"        0.28
    "ism"          0.28
    "ised"         0.24
    "ize"          0.23
    "izes"         0.22
    "izer"         0.21
    "ising"        0.21
    " notions"     0.21
    "ise"          0.20
    Act Density 0.015%

    No Known Activations