    Explanations

    words related to romantic sentiments or relationships

    references to romantic themes and relationships

    Negative Logits
    Token           Logit
    "upon"          -0.94
    "avis"          -0.86
    "ldon"          -0.81
    "Downloadha"    -0.77
    "redd"          -0.73
    "WIN"           -0.73
    "irin"          -0.73
    "ulhu"          -0.72
    "uckles"        -0.72
    "aston"         -0.72
    Positive Logits
    Token           Logit
    " romantic"     0.98
    "ized"          0.88
    "istic"         0.86
    " monog"        0.86
    "istically"     0.82
    "ties"          0.79
    "antically"     0.77
    "ization"       0.77
    "izing"         0.75
    " romance"      0.71
    Activation Density: 0.007%

    No Known Activations