    Explanations

    verbs and their various forms

    Negative Logits
    iott         -0.80
    ICAN         -0.77
    accompanied  -0.76
    ITIES        -0.69
    mouth        -0.69
    icult        -0.64
     indef       -0.62
    sterdam      -0.60
    CHAT         -0.60
    ipolar       -0.60
    Positive Logits
    dule   1.09
    lde    0.90
    xual   0.89
    Ń·     0.84
    ption  0.78
    lder   0.76
    roo    0.76
    lled   0.75
    pherd  0.72
    hett   0.72
    Act Density 0.008%

    No Known Activations