INDEX
    Negative Logits
     know               -1.48
    know                -1.39
     knows              -1.20
     Know               -1.20
    Know                -1.18
     KNOW               -1.17
    KNOW                -1.13
    knowing             -1.10
     knowing            -1.05
     connaissances      -1.03
    Positive Logits
     the                1.37
     that               1.02
     about              0.87
     “                  0.85
     what               0.85
     how                0.80
     "                  0.79
     ‘                  0.77
     of                 0.75
     something          0.72
    Activation Density: 0.022%

    No Known Activations