    Explanations

    expressions of knowledge and awareness

    Negative Logits
    Token               Logit
    "esgue"             -0.78
    "AccessorTable"     -0.64
    "lrrrr"             -0.61
    " vuitton"          -0.59
    " reflections"      -0.58
    "printStackTrace"   -0.57
    " Reflections"      -0.57
    "Prag"              -0.56
    "hithe"             -0.56
    "CRITICAL"          -0.56

    Positive Logits
    Token               Logit
    " know"             1.86
    "know"              1.82
    " knows"            1.81
    "Know"              1.74
    " Know"             1.69
    "knows"             1.64
    "KNOW"              1.60
    " KNOW"             1.55
    " knowing"          1.50
    " knew"             1.49
    Activation Density: 0.260%

    No Known Activations