INDEX
    Explanations

    negative words or phrases indicating denial or absence

    Negative Logits
    "nya"        -0.16
    "noon"       -0.16
    "patrick"    -0.16
    " Various"   -0.15
    "ucci"       -0.15
    "rape"       -0.15
    "mente"      -0.15
    "rick"       -0.15
    "empo"       -0.15
    "nek"        -0.14
    Positive Logits
    "thin"        0.35
    "-one"        0.35
    " longer"     0.34
    "xious"       0.33
    "things"      0.30
    "isy"         0.28
    "one"         0.27
    "ël"          0.26
    "pe"          0.26
    "BODY"        0.25
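Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the most negative and most positive vocabulary scores. The source does not state how these particular scores were computed, so the following is only a minimal sketch under that assumption. The names `logit_effects`, `decoder_row`, and `unembed` are hypothetical, and the layout of `unembed` (d_model rows by vocab_size columns) is assumed.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_row: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Score every token by decoder_row @ unembed (hypothetical layout).

    decoder_row: the feature's direction in the residual stream (length d_model).
    unembed: assumed shape d_model x vocab_size.
    Returns (most_negative, most_positive), each a list of k (token, score) pairs.
    """
    d_model = len(decoder_row)
    # Dot product of the decoder direction with each token's unembedding column.
    scores = [
        sum(decoder_row[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    most_negative = ranked[:k]          # lowest scores first
    most_positive = ranked[::-1][:k]    # highest scores first
    return most_negative, most_positive


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary.
    neg, pos = logit_effects([1.0, 0.0], [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]], ["a", "b", "c"], k=1)
    print(neg, pos)
```

With real weights, the entries of each returned list would correspond to rows such as "nya" -0.16 or "thin" 0.35 above, though a given dashboard may also normalize or scale the scores differently.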
    Act Density 0.100%
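An activation density of 0.100% means the feature fires on roughly one token position in a thousand. A minimal sketch of how such a figure can be computed, assuming "fires" means an activation strictly above zero (the source does not state the threshold); `activation_density` is a hypothetical helper:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    The default threshold of 0.0 is an assumption suited to ReLU-style features.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


if __name__ == "__main__":
    # One firing position out of 1,000 tokens.
    acts = [0.0] * 999 + [2.5]
    print(f"{activation_density(acts):.3%}")  # prints 0.100%
```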

    No Known Activations