    Explanation: logical propositions

    Negative Logits
        Token              Logit
        "Glow"             -0.08
        (not rendered)     -0.07
        "Bu"               -0.07
        (not rendered)     -0.07
        " нак"             -0.07
        "Carthy"           -0.07
        " Ritz"            -0.07
        (not rendered)     -0.07
        " sparkle"         -0.07
        " dressing"        -0.07
    Positive Logits
        Token              Logit
        " Preconditions"    0.09
        "/ou"               0.08
        "eps"               0.08
        "يد"                0.08
        "/oder"             0.08
        " সাং"              0.08
        " Pren"             0.07
        " Weak"             0.07
        " Evidence"         0.07
        "bole"              0.07
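The negative and positive logit lists read like a logit-lens readout of the feature. A common way dashboards of this kind produce such lists is to multiply the feature's decoder direction by the model's unembedding matrix W_U and report the lowest- and highest-scoring tokens. This page does not confirm that method; it is an assumption here. The sketch shows that computation in plain Python on a toy model. The names `logit_effects` and `top_and_bottom` and the toy weights are hypothetical. Real pipelines may also fold in the final layer norm, which this sketch ignores.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix W_U laid out as d_model rows x vocab columns.

    Returns one score per vocabulary token: decoder_dir . W_U[:, v].
    (Assumption: no layer norm is applied before the projection.)
    """
    d_model = len(decoder_dir)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per model dimension")
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][v] for i in range(d_model))
        for v in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], tokens: Sequence[str],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative): the k highest-scoring tokens in
    descending order and the k lowest-scoring tokens in ascending order."""
    ascending = sorted(range(len(effects)), key=lambda v: effects[v])
    descending = ascending[::-1]
    negative = [(tokens[v], effects[v]) for v in ascending[:k]]
    positive = [(tokens[v], effects[v]) for v in descending[:k]]
    return positive, negative


# Toy usage with a hypothetical 2-dimensional model and 3-token vocabulary.
if __name__ == "__main__":
    effects = logit_effects([1.0, -0.5], [[0.2, -0.1, 0.05], [0.0, 0.3, -0.1]])
    positive, negative = top_and_bottom(effects, ["A", "B", "C"], k=1)
    print(positive, negative)
```

In the toy run, "A" scores highest and "B" lowest. A dashboard would do the same over the full vocabulary with k = 10, which matches the ten entries shown in each list.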
    Activation Density: 0.007%

    No Known Activations
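Activation density is typically the share of sampled tokens on which the feature's activation is above zero. The 0.007% reported here therefore corresponds to about 7 firing tokens per 100,000. The sketch below is a minimal version of that arithmetic. It assumes a firing threshold of zero. The function name `activation_density_percent` and the synthetic sample are hypothetical.

```python
from typing import Sequence


def activation_density_percent(activations: Sequence[float],
                               threshold: float = 0.0) -> float:
    """Percentage of sampled tokens whose feature activation exceeds
    threshold (assumed to be 0.0, i.e. any positive activation counts)."""
    if not activations:
        raise ValueError("need at least one activation sample")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: 7 firing tokens out of 100,000.
if __name__ == "__main__":
    sample = [0.0] * 100_000
    for i in range(7):
        sample[i * 10_000] = 1.3
    print(activation_density_percent(sample))
```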