INDEX
    NEGATIVE LOGITS
    "_war"               -0.07
    "-badge"             -0.06
    " čer"               -0.06
    " об"                -0.06
    " haired"            -0.06
    " squirrel"          -0.06
    " أي"                -0.06
    (no visible token)   -0.06
    " Manit"             -0.06
    " дві"               -0.06

    POSITIVE LOGITS
    "discussion"         0.07
    " Counsel"           0.07
    (no visible token)   0.06
    "scribe"             0.06
    " dispro"            0.06
    "ignore"             0.06
    " condemning"        0.06
    "(screen"            0.06
    " idea"              0.06
    " LGBTQ"             0.06
    Activation Density: 0.002%

    No Known Activations