INDEX
    Explanations

    instances of manipulation or social dynamics in relationships

    Negative Logits
    Token             Logit
    "illard"          -0.15
    "astle"           -0.15
    "hei"             -0.15
    "loat"            -0.14
    "cken"            -0.14
    "illet"           -0.14
    "imas"            -0.13
    "inker"           -0.13
    "gabe"            -0.13
    "lk"              -0.13

    Positive Logits
    Token             Logit
    " Nash"            0.14
    " conc"            0.14
    " Turnbull"        0.14
    " clipping"        0.14
    "odies"            0.13
    " fl"              0.13
    " clip"            0.13
    " clipped"         0.13
    (whitespace)       0.13
    " respectively"    0.13
    Act Density 0.011%

    No Known Activations