INDEX
    Explanations

    words related to negative attributes or consequences

    negative phrases related to health or unfavorable conditions

    Negative Logits
     Sharing      -0.66
     rings        -0.63
     Shooter      -0.62
     Nope         -0.61
     endlessly    -0.61
    illon         -0.60
     Rouge        -0.60
    £ı            -0.59
     Noir         -0.59
    ulhu          -0.58
    Positive Logits
    gotten        1.27
    fitting       1.15
    equipped      1.13
    founded       1.10
    defined       1.10
    treatment     1.06
    informed      1.04
    fortune       0.98
    intent        0.96
    treated       0.96
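
A minimal sketch of how token lists like the ones above could be produced, assuming (this page does not say so) that each value is the dot product of the feature's output direction with a token's unembedding vector. The names `logit_effects` and `top_k`, the 3-dimensional direction, and the four-token vocabulary are all hypothetical and made up for illustration; none of the numbers come from this feature.

```python
def logit_effects(direction, unembed):
    """Dot the feature direction with each token's unembedding row."""
    return {
        token: sum(d * w for d, w in zip(direction, row))
        for token, row in unembed.items()
    }


def top_k(effects, k, positive=True):
    """Return the k tokens with the largest (or most negative) effect."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=positive)
    return ranked[:k]


# Toy, made-up values (assumption): not taken from any real model.
direction = [1.0, -0.5, 0.25]
unembed = {
    "alpha": [1.0, 0.0, 0.0],
    "beta": [0.5, -0.5, 0.0],
    " gamma": [-0.5, 0.5, 0.0],
    " delta": [0.0, 1.0, 0.0],
}

effects = logit_effects(direction, unembed)
print(top_k(effects, 2, positive=True))   # tokens the feature promotes most
print(top_k(effects, 2, positive=False))  # tokens the feature suppresses most
```

Under this assumption, a positive value means the feature pushes the model toward that token when it fires, and a negative value means it pushes away from it.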
    Act Density 0.028%
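
A rough illustration of the figure above, assuming activation density means the fraction of sampled token positions on which the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical; the page does not state how its value was measured.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up sample (assumption): 10,000 token positions, 3 of which fire.
sample = [0.0] * 9997 + [0.4, 1.2, 0.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.030%
```

On this reading, a density of 0.028% would correspond to the feature firing on roughly 28 of every 100,000 sampled tokens.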

    No Known Activations