    Explanations

    phrases related to knowledge or lack thereof

    phrases indicating a lack of knowledge or certainty

    Negative Logits
    "hement"        -0.84
    " sidx"         -0.80
    "erate"         -0.78
    "igham"         -0.73
    "odder"         -0.72
    "uably"         -0.71
    "ramid"         -0.70
    "rative"        -0.69
    "vertisement"   -0.69
    "nir"           -0.68
    Positive Logits
    " whereabouts"   0.78
    " firsthand"     0.76
    " beforehand"    0.73
    " intimately"    0.73
    " secret"        0.73
    "CHAT"           0.72
    " secrets"       0.68
    "æĿ"             0.67
    "ledged"         0.67
    "Orig"           0.63
    Act Density 0.254%

    No Known Activations