    Explanations

    mentions of romantic or personal relationships

    Negative Logits
    Token             Logit
    "iminal"          -0.15
    "emean"           -0.15
    "obot"            -0.15
    "jerne"           -0.14
    "370"             -0.14
    "rava"            -0.13
    "รณ"              -0.13
    ".ct"             -0.13
    "rgan"            -0.13
    "eview"           -0.13
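    Positive Logits
    (Tokens are quoted; a leading space inside the quotes is part of the token.)
    Token             Logit
    " dating"          0.38
    " relationship"    0.34
    " relationships"   0.31
    " Dating"          0.30
    " romance"         0.30
    " datings"         0.28
    " Relationship"    0.28
    " romant"          0.27
    " hook"            0.27
    "dating"           0.27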
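The logit lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. A common convention for such dashboards, assumed here rather than confirmed by this page, is to multiply the feature's decoder vector by the model's unembedding matrix (ignoring the final layer norm and any bias) and keep the top and bottom k tokens. The sketch below is hypothetical: `feature_logit_weights`, `top_logits`, and the toy matrices are illustrative names and made-up numbers, not values from this feature.

```python
from typing import List, Sequence, Tuple


def feature_logit_weights(
    w_dec: Sequence[float], w_u: Sequence[Sequence[float]]
) -> List[float]:
    """Project one feature's decoder vector through the unembedding.

    Assumption: logit weights = w_dec @ w_u, with no layer norm or bias.
    w_dec: the feature's decoder row, length d_model.
    w_u:   unembedding matrix given as d_model rows of vocab_size entries.
    Returns one weight per vocabulary token.
    """
    vocab_size = len(w_u[0])
    return [
        sum(w_dec[i] * w_u[i][t] for i in range(len(w_dec)))
        for t in range(vocab_size)
    ]


def top_logits(
    vocab: Sequence[str], weights: Sequence[float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative): the k highest and the k lowest weights."""
    ranked = sorted(zip(vocab, weights), key=lambda pair: pair[1])
    negative = ranked[:k]           # ascending order: most negative first
    positive = ranked[::-1][:k]     # descending order: most positive first
    return positive, negative


# Toy example with made-up numbers (d_model = 2, vocab_size = 3).
vocab = ["tok_a", "tok_b", "tok_c"]
w_dec = [1.0, 2.0]
w_u = [
    [1.0, -1.0, 0.5],
    [0.5, 0.0, -1.0],
]
weights = feature_logit_weights(w_dec, w_u)
positive, negative = top_logits(vocab, weights, k=1)
print(positive)  # [('tok_a', 2.0)]
print(negative)  # [('tok_c', -1.5)]
```

With k = 10 and a real vocabulary, the same procedure would produce two ten-row lists like the ones above.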
    Activation Density: 0.077%
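Activation density is the share of token positions at which the feature fires. A density of 0.077% means roughly 77 active positions per 100,000 tokens, so this is a sparse feature. The sketch below assumes "fires" means an activation strictly greater than zero on some evaluation sample of tokens; the function name `activation_density` and the `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is > threshold
    (0.0 by default, as for a ReLU-style sparse autoencoder).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# 77 active positions out of 100,000 tokens gives the density shown above.
acts = [1.0] * 77 + [0.0] * (100_000 - 77)
density = activation_density(acts)
print(f"{density:.3%}")  # 0.077%
```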

    No Known Activations