    Negative Logits
    Token            Logit
    ".delta"         -0.07
    " laughed"       -0.06
    " WHICH"         -0.06
    "_comparison"    -0.06
    "literal"        -0.06
    "_answers"       -0.06
    [blank]          -0.06
    " Wifi"          -0.06
    " twists"        -0.06
    " Pearl"         -0.06
    Positive Logits
    Token            Logit
    "-topic"         0.07
    " carpet"        0.07
    "ixa"            0.07
    " Passive"       0.06
    " xe"            0.06
    ".getSource"     0.06
    "_Context"       0.06
    "Hack"           0.06
    "unan"           0.06
    "\tadmin"        0.06
    Activation Density: 0.012%

    No Known Activations