INDEX
    Explanations

    instances of negotiation and dialogue

    Negative Logits
    "\e"            -0.07
    "iscrim"        -0.07
    " pres"         -0.06
    "pector"        -0.06
    "ucc"           -0.06
    "iy"            -0.06
    "imps"          -0.06
    "idian"         -0.06
    "ight"          -0.06
    " potentially"  -0.06
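    Lists like this one, and the positive list that follows, are usually the two ends of a single vector of per-token logit effects. The sketch below assumes the common sparse-autoencoder convention, which this page does not confirm: the effect on token t is the feature's decoder direction dotted with column t of the model's unembedding matrix. The names `w_dec`, `W_U`, `logit_effects` and `top_and_bottom` are hypothetical, and the numbers in the usage block are toy values, not this feature's weights.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct logit effect.
# ASSUMPTION: effect[t] = sum_i w_dec[i] * W_U[i][t] (decoder direction times unembedding).

def logit_effects(w_dec, W_U):
    """Return one logit effect per vocabulary token (pure-Python matrix-vector product)."""
    d_model = len(w_dec)
    vocab_size = len(W_U[0])
    return [sum(w_dec[i] * W_U[i][t] for i in range(d_model)) for t in range(vocab_size)]

def top_and_bottom(effects, tokens, k=10):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(zip(tokens, effects), key=lambda pair: pair[1])
    negative = ranked[:k]                   # most negative first
    positive = list(reversed(ranked[-k:]))  # most positive first
    return positive, negative

if __name__ == "__main__":
    # Toy example: 2-dimensional residual stream, 4-token vocabulary (made-up values).
    tokens = [" fake", " pretended", "iscrim", " potentially"]
    w_dec = [1.0, -0.5]
    W_U = [[0.2, 0.3, -0.4, -0.1],
           [0.0, 0.0, 0.0, 0.0]]
    positive, negative = top_and_bottom(logit_effects(w_dec, W_U), tokens, k=2)
    print("positive:", positive)
    print("negative:", negative)
```

    Sorting the full vector once and slicing both ends keeps the two lists consistent with each other; with a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` would avoid the full sort.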
    Positive Logits
    "æķħ"           0.10   (bytes of 故)
    " pretended"    0.09
    " fe"           0.08
    "Fake"          0.08
    " Fake"         0.08
    " fake"         0.08
    "pret"          0.08
    " æķħ"          0.08   (space + 故)
    "åģĩ"           0.07   (bytes of 假, "fake, false")
    "åζéĢł"         0.07
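    Strings such as "æķħ" and "åģĩ" are not mojibake to be discarded. They are byte-level BPE tokens shown in the GPT-2 display alphabet, where every byte maps to a printable character. The sketch below rebuilds that table and maps a token back to UTF-8 text. It assumes the model uses the GPT-2 byte-to-unicode scheme, which the page does not state. It also assumes a leading plain space in the dashboard stands for the tokenizer's "Ġ" (byte 0x20). `decode_token` and `BYTE_DECODER` are illustrative names.

```python
# Sketch: decode GPT-2-style byte-level BPE token strings back to UTF-8 text.
# ASSUMPTION: the tokenizer uses GPT-2's reversible byte -> printable-character table.

def bytes_to_unicode():
    """Rebuild GPT-2's byte -> character table (printable bytes map to themselves)."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:          # control, whitespace and other non-printable bytes
            bs.append(b)
            cs.append(256 + n)   # shifted into U+0100 and above
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

# Inverse table: display character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}

def decode_token(token):
    """Map each display character to its byte, then decode the bytes as UTF-8."""
    # ASSUMPTION: a plain space here stands for the tokenizer's "Ġ" (byte 0x20).
    raw = bytes(0x20 if ch == " " else BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")

if __name__ == "__main__":
    for tok in ["æķħ", " æķħ", "åģĩ", " fake"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

    A character outside the 256-entry table raises a `KeyError`, which is a quick way to spot a token string that was altered after it left the tokenizer.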
    Activation density: 0.056%
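    Activation density is commonly reported as the percentage of sampled token positions on which the feature fires. Under that assumption, 0.056% corresponds to roughly 56 active positions per 100,000 tokens. The page does not say which threshold or sample it uses. The function name and the default threshold of 0 below are hypothetical.

```python
# Sketch: activation density as the share of token positions where a feature fires.
# ASSUMPTION: "fires" means activation > threshold (default 0.0); the dashboard's rule may differ.

def activation_density(activations, threshold=0.0):
    """Return the percentage of positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

if __name__ == "__main__":
    # Toy sample: 56 active positions out of 100,000 tokens.
    sample = [1.0] * 56 + [0.0] * (100_000 - 56)
    print(f"{activation_density(sample):.3f}%")
```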

    No Known Activations