    Explanations

    words that convey positivity or desirable qualities

    Negative Logits
    Token           Logit
    "afi"           -0.16
    "isel"          -0.15
    "irsch"         -0.15
    "egend"         -0.15
    "uther"         -0.15
    "udder"         -0.15
    "ottenham"      -0.15
    " eoq"          -0.15
    "bish"          -0.15
    "ieron"         -0.15

    Positive Logits
    Token           Logit
    " than"          0.28
    "_than"          0.21
    "than"           0.19
    " Than"          0.18
    " THAN"          0.18
    "-than"          0.17
    "est"            0.16
    " Zimmer"        0.16
    "ÙĤت"            0.15
    "312"            0.15

    Act Density: 0.171%

    No Known Activations