INDEX
    Explanations

    phrases describing deprivation or suffering

    negations, particularly forms of the word "not."

    Negative Logits
        Token           Logit
        " Reloaded"     -0.78
        "ĪĴ"            -0.70
        "ħĭ"            -0.68
        " hemor"        -0.63
        " Poster"       -0.62
        " behavi"       -0.61
        " Passenger"    -0.61
        "Ĥİ"            -0.60
        " Penguin"      -0.60
        "çĦ"            -0.59
    Positive Logits
        Token           Logit
        "ween"          1.02
        "weet"          0.92
        "unes"          0.91
        "reprene"       0.89
        "une"           0.86
        "ract"          0.83
        "achment"       0.83
        "urb"           0.82
        "aper"          0.80
        "obi"           0.79
    Activation Density: 0.099%

    No Known Activations