INDEX
    Explanations

    hate speech or racist language.

    romantic relationship terms

    Dating and relationships

    Negative Logits
     виправивши        -0.84
    RetentionPolicy    -0.78
                       -0.73
    ValueStyle         -0.71
     tartalomajánló    -0.69
     дописавши         -0.68
     bezeichneter      -0.67
    fromnode           -0.65
     RouterModule      -0.64
    XmlAccessType      -0.63
    Positive Logits
     marry       1.03
     slept       0.96
     dating      0.96
     date        0.96
     sleep       0.95
     dated       0.94
     marrying    0.93
     Date        0.88
     sleeps      0.87
     sleeping    0.86
    Activation Density: 2.555%
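
    The density figure above is usually read as the share of token positions on which the feature fires. The sketch below shows that arithmetic under stated assumptions: the function name `activation_density`, the zero threshold, and the sample values are all hypothetical and are not taken from this dashboard or from any particular library.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a position counts as "active" when its activation is
    strictly greater than the threshold (0.0 here).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up activations for eight token positions (illustrative only).
sample = [0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.4, 0.0]

density = activation_density(sample)
print(f"{density:.3%}")  # 2 of 8 positions are active -> 25.000%
```

    On this reading, a density of 2.555% would mean the feature is active on roughly 1 in 39 sampled token positions.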

    No Known Activations