    Explanations

    phrases expressing strong personal preferences or identities

    Negative Logits
    " menacing"       -0.80
    " unrecogn"       -0.72
    " majesty"        -0.72
    " overshadow"     -0.71
    " incrim"         -0.70
    " unheard"        -0.70
    " assassinate"    -0.70
    " menace"         -0.69
    " virtues"        -0.68
    " believable"     -0.66

    Positive Logits
    "math"             0.78
    "consumer"         0.74
    " OCD"             0.73
    " betting"         0.69
    "intend"           0.68
    " sucker"          0.68
    " fan"             0.66
    " avid"            0.66
    "chool"            0.65
    "price"            0.65
    Act Density 0.488%

    No Known Activations