INDEX
    Explanations

    adjectives ending in '-able', '-ary', '-ical', '-less', '-ive', '-ier', '-ist', '-y', '-ous', and '-ed'

    adjectives and derivatives indicating characteristics or qualities

    Negative Logits
    Token       Logit
    "atra"      -0.78
    "Patch"     -0.72
    "ften"      -0.71
    "veland"    -0.70
    "arty"      -0.69
    "ixels"     -0.68
    "å°Ĩ"       -0.67
    "DN"        -0.66
    "slave"     -0.66
    "Rush"      -0.65
    Positive Logits
    Token            Logit
    " behav"         1.04
    " behaviour"     0.83
    " behaviours"    0.81
    " propos"        0.76
    " destro"        0.74
    " endeavour"     0.71
    " pse"           0.70
    "theless"        0.69
    " behavior"      0.69
    " epigen"        0.67
    Activation Density: 0.213%

    No Known Activations