INDEX
    Explanations

    phrases related to social interaction and communication

    phrases related to controversial topics and opinions

    Negative Logits
    antine      -0.80
    isoft       -0.72
    iencies     -0.70
    ongo        -0.70
    MRI         -0.66
    avorite     -0.65
    abase       -0.65
    wald        -0.64
    unker       -0.63
    brance      -0.62
    Positive Logits
     "@         1.33
     "'         1.23
     "<         1.22
     "#         1.22
     "...       1.20
     "…         1.20
     "{         1.13
     "(         1.13
     "%         1.11
     "-         1.10
    Activation Density: 0.783%

    No Known Activations