INDEX
    Explanations

    specific adjectives that describe actions or characteristics, such as "sparing", "terrifying", "vicious", "exact", and "leisure"

    Negative Logits
    anders         -0.79
    APH            -0.66
    axy            -0.66
    REL            -0.65
    DonaldTrump    -0.64
    ploma          -0.63
    aden           -0.61
    ARM            -0.61
    ODUCT          -0.61
    iltr           -0.60
    Positive Logits
    ly             2.93
    LY             1.84
    lys            1.42
    edly           1.31
    liness         1.31
    lies           1.25
    fully          1.21
    ity            1.16
    ously          1.15
    lly            1.15
    Activation Density: 1.945%

    No Known Activations