    Explanations

    references to the concept of "zero" in various contexts

    Negative Logits
        " the"            -0.45
        " McLaughlin"     -0.45
        " Marín"          -0.40
        " çat"            -0.40
        " The"            -0.39
        "The"             -0.38
        "biker"           -0.38
        " הב"             -0.37
        " accompanying"   -0.37
        " ausführ"        -0.37
    Positive Logits
        " Zero"            1.36
        "Zero"             1.36
        " zero"            1.32
        " ZERO"            1.31
        "zero"             1.20
        "ZERO"             1.15
        " zéro"            1.08
        "zeros"            1.01
        " cero"            0.99
        " zeros"           0.97
    Activation Density: 0.012%
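    A minimal sketch of how lists like the negative and positive logits above are commonly produced, assuming (this page does not state it) that each score is the feature's decoder direction projected through the model's unembedding matrix, and that activation density is the fraction of tokens on which the feature activation exceeds zero. The function names `top_logit_effects` and `activation_density`, and the array shapes, are hypothetical.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_unembed, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct effect on the logits.

    Assumption: the score is decoder direction @ unembedding matrix.
    w_dec_row: decoder direction of one feature, shape (d_model,)
    w_unembed: unembedding matrix, shape (d_model, vocab_size)
    vocab: list of token strings, length vocab_size
    """
    effects = w_dec_row @ w_unembed  # one score per vocabulary token
    order = np.argsort(effects)      # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold (assumed 0)."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))
```

    With a 0.012% density, the feature would fire on roughly 12 of every 100,000 tokens under this definition.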

    No Known Activations