INDEX
    Explanations

    phrases related to emotions and opinions

    negative contractions, particularly related to unwillingness or refusal

    Negative Logits
        " mixed"          -0.62
        " Windsor"        -0.62
        " SERV"           -0.62
        " soph"           -0.60
        " contrasted"     -0.60
        " Friends"        -0.59
        " fused"          -0.59
        " shack"          -0.58
        " blurred"        -0.58
        " partners"       -0.57

    Positive Logits
        "t"               1.09
        "nt"              0.94
        "else"            0.92
        "¹"               0.90
        "ivably"          0.87
        "ttle"            0.87
        "erest"           0.87
        "onna"            0.85
        "swer"            0.83
        "be"              0.83

    Activation Density: 0.078%

    No Known Activations