    Explanations

    the word "hate" at various intensities

    Negative Logits
    Token               Logit
    "ItemImage"         -0.84
    "aqu"               -0.82
    "aunder"            -0.81
    "igmatic"           -0.80
    "istics"            -0.76
    "uggest"            -0.75
    "aver"              -0.74
    "DragonMagazine"    -0.74
    "enture"            -0.73
    "arta"              -0.73

    Positive Logits
    Token               Logit
    "fully"              1.11
    " hated"             0.93
    " hate"              0.87
    "FUL"                0.80
    " hates"             0.79
    " wasting"           0.79
    " Hate"              0.78
    "76561"              0.75
    "hate"               0.75
    "ful"                0.75
    Activation Density: 0.021%

    No Known Activations