INDEX
    Explanations

    phrases or questions involving the concept of knowledge or understanding

    Negative Logits
    "ceptions"     -0.71
    "UME"          -0.69
    "iculture"     -0.69
    "agonists"     -0.67
    "idered"       -0.65
    "odder"        -0.64
    " ........"    -0.62
    "holder"       -0.61
    "izu"          -0.61
    "Reader"       -0.61

    Positive Logits
    "beit"         0.80
    "HCR"          0.79
    " much"        0.78
    " MUCH"        0.73
    "itzer"        0.73
    "ells"         0.69
    "soever"       0.69
    "ling"         0.69
    "ever"         0.68
    "much"         0.67
    Activation Density: 0.068%

    No Known Activations