INDEX
    Explanations

    statements where someone knows something

    references to knowledge and awareness

    Negative Logits
        " pending"          -0.64
        " dramatic"         -0.60
        " critical"         -0.59
        " negative"         -0.59
        " additional"       -0.59
        " desk"             -0.56
        " incorporation"    -0.56
        " due"              -0.55
        " optional"         -0.55
        "coming"            -0.55
    Positive Logits
        " knows"            3.34
        " understands"      2.20
        " knew"             2.14
        " remembers"        1.81
        " know"             1.77
        " learns"           1.76
        " realizes"         1.73
        "know"              1.66
        " thinks"           1.64
        " KNOW"             1.63
    Activation Density: 0.016%

    No Known Activations