    Explanations

    words related to negative sentiments or feelings, particularly strong dislike or hatred

    instances of the term "loathe" or related expressions of strong dislike

    Negative Logits
      " Norn"       -0.84
      "rity"        -0.81
      "*/("         -0.76
      "hower"       -0.76
      "glass"       -0.75
      "rition"      -0.72
      "sonian"      -0.71
      "lished"      -0.69
      "ITAL"        -0.67
      "manship"     -0.67
    Positive Logits
      "oser"         1.04
      "aning"        1.02
      "aned"         1.01
      "veland"       1.00
      "aves"         0.95
      "fty"          0.92
      "zzle"         0.92
      "ppy"          0.91
      "ven"          0.90
      "obb"          0.90
    Activation Density: 0.009%

    No Known Activations