INDEX
    Explanations

    terms related to negative emotions or displeasure

    instances of the word "loathe" and its variations

    Negative Logits
    Token         Logit
    "rition"      -0.80
    "glass"       -0.78
    "pillar"      -0.77
    "rity"        -0.76
    "manship"     -0.75
    " Norn"       -0.74
    "hower"       -0.73
    "sonian"      -0.70
    "ITAL"        -0.69
    "lished"      -0.68
    Positive Logits
    Token         Logit
    "aned"        1.05
    "aning"       1.04
    "oser"        1.02
    "veland"      0.99
    "aves"        0.99
    "vers"        0.89
    "ishly"       0.89
    "ppy"         0.88
    "ven"         0.88
    "igh"         0.87
    Activation Density: 0.010%

    No Known Activations