INDEX
    Explanations

    words related to expressing knowledge or certainty

    Negative Logits
    isco        -0.84
    aukee       -0.76
    onding      -0.75
    issance     -0.72
    mage        -0.69
    gencies     -0.67
    aez         -0.67
    pload       -0.67
    orthy       -0.66
     sidx       -0.66
    Positive Logits
     firsthand    1.09
     how          0.85
     anecd        0.82
     nothing      0.80
     plenty       0.78
     personally   0.75
     what         0.74
     exactly      0.72
     why          0.72
     somet        0.69
    Activation Density: 0.039%

    No Known Activations