INDEX
    Explanations
    Negative Logits
     rigor      -0.08
     koju       -0.07
     bas        -0.07
     rur        -0.07
     outweigh   -0.07
     sull       -0.07
    tract       -0.07
                -0.07
    cke         -0.07
     spiller    -0.07
    Positive Logits
    Pend        0.08
    EP          0.08
    Conv        0.08
     ed         0.08
    MAX         0.07
                0.07
    Gang        0.07
    vende       0.07
     pend       0.07
     hak        0.07
    Act Density 0.013%

    No Known Activations