    Negative Logits
    amad        -0.09
    chy         -0.08
     solely     -0.08
     queer      -0.07
    …”          -0.07
    ..."        -0.07
    ...”        -0.07
     crip       -0.07
    бо          -0.07
    .."         -0.07
    Positive Logits
     implicit   0.10
    Implicit    0.10
    Equation    0.10
     equival    0.09
    implicit    0.09
     Eq         0.09
     anyway     0.09
     indicated  0.08
     discussed  0.08
    Eq          0.08
    Activation Density: 0.014%

    No Known Activations