INDEX
    Explanations

    phrases that argue or advocate for a specific point of view or position

    Negative Logits
    Token         Logit
    " Seym"       -0.70
    "orks"        -0.66
    "liction"     -0.65
    "elta"        -0.64
    "attery"      -0.63
    "ummer"       -0.62
    "BW"          -0.62
    "leted"       -0.61
    "kered"       -0.60
    "onder"       -0.60
    Positive Logits
    Token         Logit
    " against"     1.06
    " convinc"     0.93
    "against"      0.88
    "Against"      0.86
    "cases"        0.77
    " Keen"        0.75
    " Against"     0.75
    " loudly"      0.74
    " persu"       0.73
    " why"         0.73
    Activation Density: 0.021%

    No Known Activations