INDEX
    Explanations

    phrases indicating knowledge or lack of knowledge in a subject

    phrases indicating a lack of knowledge or understanding

    Negative Logits
    Token            Logit
    "ramid"          -0.80
    "hement"         -0.75
    "odder"          -0.74
    "uably"          -0.74
    " sidx"          -0.73
    "erate"          -0.69
    "raught"         -0.69
    "nir"            -0.69
    " Featured"      -0.69
    " rall"          -0.68

    Positive Logits
    Token            Logit
    " firsthand"     0.78
    " whereabouts"   0.77
    " beforehand"    0.71
    " intimately"    0.69
    "æĿ"             0.66
    " secret"        0.65
    "ä½"             0.65
    "Orig"           0.63
    " basics"        0.63
    "LAB"            0.62
    Act Density 0.229%

    No Known Activations