INDEX
    Explanations

    duplicated letters in words

    words associated with humor or silliness

    Negative Logits
    Token            Logit
    "代"             -0.85
    " misunder"      -0.76
    " Luthor"        -0.71
    "DonaldTrump"    -0.69
    "imir"           -0.67
    "ewski"          -0.67
    "nikov"          -0.66
    "itates"         -0.65
    " Integrity"     -0.64
    "auri"           -0.62
    Positive Logits
    Token            Logit
    "gey"            1.24
    "zing"           1.20
    "ze"             1.17
    "gee"            1.15
    "zer"            1.12
    "zy"             1.09
    "zers"           1.06
    "zeb"            1.06
    "zie"            1.04
    "leans"          1.02
    Activation Density: 0.045%

    No Known Activations