INDEX
    Explanations

    words that describe negative or harmful attributes

    Negative Logits
    Token               Logit
    " pleaſure"         -1.23
    " ſta"              -1.13
    " houſe"            -1.12
    " Majefty"          -1.11
    " Efq"              -1.11
    " lyre"             -1.10
    " fermés"           -1.07
    "stateProvider"     -1.05
    " ſtre"             -1.04
    " définiti"         -1.04
    Positive Logits
    Token               Logit
    "ness"              1.30
    "ous"               1.09
    "IOUS"              1.01
    "ious"              0.90
    "acious"            0.84
    "EROUS"             0.84
    "icious"            0.82
    "rious"             0.80
    "dious"             0.77
    "s"                 0.77
    Activation Density: 0.072%

    No Known Activations