INDEX
    Explanations

    words or prefixes related to something negative or problematic

    terms related to unsustainable practices or concepts

    Negative Logits
    Token          Logit
    "SHIP"         -0.72
    "tsky"         -0.71
    " Dynamics"    -0.70
    " tanks"       -0.68
    " Rams"        -0.68
    " briefs"      -0.67
    " Nanto"       -0.67
    "phrine"       -0.67
    " phases"      -0.65
    " Guardians"   -0.64
    Positive Logits
    Token          Logit
    "aved"         1.18
    "olicited"     1.17
    "avour"        1.16
    "killed"       1.11
    "iders"        1.10
    "atisf"        1.07
    "ided"         1.05
    "oci"          1.03
    "ident"        1.02
    "rep"          1.01
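    The logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. A common way to produce such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and keep the extremes. The plain-Python sketch below does exactly that; `logit_effects`, `decoder_dir`, and `unembed` are hypothetical names, and a real dashboard may apply extra normalization.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],        # hypothetical: the feature's decoder vector, length d_model
    unembed: Sequence[Sequence[float]],  # hypothetical: unembedding matrix, shape [d_model][vocab_size]
    vocab: Sequence[str],                # token strings, length vocab_size
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) (token, logit effect) pairs."""
    d_model = len(decoder_dir)
    # Direct logit effect on each token t: sum_i decoder_dir[i] * unembed[i][t]
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by effect: most suppressed tokens first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]  # most promoted tokens first
    return negative, positive
```

    On a toy model with two hidden dimensions and a three-token vocabulary, `logit_effects([1.0, -1.0], [[1.0, 0.0, 3.0], [0.0, 1.0, 1.0]], ["a", "b", "c"], k=1)` returns `([("b", -1.0)], [("c", 2.0)])`.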
    Activation Density: 0.015%
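    Activation density is usually read as the share of sampled tokens on which the feature is active; at 0.015%, that is roughly 1.5 activations per 10,000 tokens, which fits the empty activation list. A minimal sketch, assuming "active" means an activation strictly above a threshold of zero (the function name and the `threshold` parameter are hypothetical):

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations provided")
    return 100.0 * active / total
```

    For example, 3 active tokens out of 20,000 gives `activation_density([1.0] * 3 + [0.0] * 19997)`, which returns 0.015.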

    No Known Activations