INDEX
    Explanations

    expressions of defiance or assertiveness

    Negative Logits
     magis      -1.47
     fatis      -1.43
     hcm        -1.35
     alip       -1.35
     territo    -1.34
     susun      -1.32
     paff       -1.30
     umo        -1.30
     levis      -1.30
     aen        -1.28
    Positive Logits
     never      0.74
     don        0.73
     am         0.72
     cannot     0.69
     want       0.69
     prefer     0.66
     wanted     0.65
     know       0.65
     didn       0.65
     hate       0.64
    Activation Density: 0.328%

    No Known Activations