    Negative Logits
    Token         Logit
    "nell"        -0.10
    " necess"     -0.09
    "396"         -0.09
    " dopad"      -0.09
    "ddd"         -0.08
    " whose"      -0.08
    " imposs"     -0.08
    "icos"        -0.08
    "stell"       -0.08
    "pond"        -0.08
    Positive Logits
    Token         Logit
    " allow"      0.31
    " allows"     0.30
    "allow"       0.29
    " allowing"   0.28
    " enable"     0.28
    " enables"    0.27
    " Allows"     0.25
    "允"          0.25
    "allows"      0.24
    "enable"      0.23
    Activation Density: 0.128%

    No Known Activations