    Negative Logits
    ' surfaced'          -0.07
    'representation'     -0.06
    'нами'               -0.06
    ' Swipe'             -0.06
    ' responsive'        -0.06
    'utan'               -0.06
    ' Oakland'           -0.06
    '"Some'              -0.06
    'oot'                -0.06
    'duction'            -0.06
    Positive Logits
    ' chair'             0.11
    ' Chair'             0.09
    'Chair'              0.09
    ' wheelchair'        0.08
    'IRR'                0.07
    '_WHITE'             0.07
    '163'                0.07
    ' chairs'            0.07
    ' goalie'            0.07
    (whitespace token)   0.07
    Activation density: 0.005%

    No known activations.