    Explanations

    words related to positivity or desirability

    words expressing positive or favorable evaluations and opinions

    Negative Logits
    Token        Logit
    "hid"        -0.88
    "grave"      -0.85
    "bus"        -0.85
    "lang"       -0.82
    "liam"       -0.79
    "hod"        -0.78
    "driver"     -0.75
    "jack"       -0.74
    "iq"         -0.73
    "drivers"    -0.72
    Positive Logits
    Token              Logit
    " favorable"       1.40
    "avorable"         1.31
    " favourable"      1.20
    " unfavorable"     1.12
    " matchups"        0.97
    " favorably"       0.93
    " agre"            0.88
    " advantageous"    0.86
    " favors"          0.85
    " ratings"         0.81
    Act Density 0.007%

    No Known Activations