INDEX
    Explanations

    words related to negative feelings such as loathing

    instances of the word "loathe" in various forms and contexts

    Negative Logits
    "rition"     -0.83
    "rity"       -0.76
    "pillar"     -0.75
    "manship"    -0.72
    "ITAL"       -0.71
    "lished"     -0.71
    "glass"      -0.70
    " Norn"      -0.70
    "TAIN"       -0.68
    "race"       -0.66
    Positive Logits
    "oser"       1.09
    "aves"       1.06
    "veland"     1.04
    "aning"      1.03
    "fty"        1.03
    "vers"       0.94
    "aned"       0.94
    "zzle"       0.93
    "vel"        0.91
    "zz"         0.91
    Act Density 0.015%

    No Known Activations