    Negative Logits
    Token           Logit
    " Clar"         -0.07
    "GT"            -0.07
    " Clair"        -0.07
    "int"           -0.07
    "Jer"           -0.06
    " initiator"    -0.06
    " bullshit"     -0.06
    "para"          -0.06
    "hton"          -0.06
    " Jin"          -0.06
    Positive Logits
    Token                Logit
    "USE"                0.08
    "use"                0.08
    "ы"                  0.07
    (token not shown)    0.07
    " elastic"           0.07
    "ue"                 0.07
    "Use"                0.07
    " Vis"               0.07
    (token not shown)    0.07
    " See"               0.07
    Activation Density: 0.009%

    No Known Activations