INDEX
    Explanations

    instances of knowledge and awareness in various contexts

    Negative Logits
    " Efforts"    -0.66
    "dhist"       -0.64
    " mengel"     -0.64
    " Gill"       -0.63
    " Tos"        -0.62
    " Barbour"    -0.60
    " efforts"    -0.58
    "popd"        -0.58
    "TEntity"     -0.58
    " Larsson"    -0.56
    Positive Logits
    " knows"      1.70
    " know"       1.70
    "know"        1.69
    "Know"        1.68
    " Know"       1.64
    "knows"       1.61
    " KNOW"       1.59
    "KNOW"        1.58
    " Knows"      1.58
    "knew"        1.50
    Activation Density: 0.125%

    No Known Activations