INDEX
    Explanations
    Negative Logits
     Barnett        -0.08
     breathe        -0.07
     ballet         -0.07
     مساحت          -0.07
     Krank          -0.07
     nestled        -0.07
     lingerie       -0.06
     resilience     -0.06
                    -0.06
    ,len            -0.06
    Positive Logits
     Fox            0.11
     FOX            0.11
    Fox             0.09
    X               0.09
     fox            0.08
    NOP             0.07
    ords            0.07
    Fx              0.07
    fox             0.07
    FOX             0.07
    Activation Density: 0.005%

    No Known Activations