    Explanations

    phrases that reference quantities or metrics in the context of comparisons

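    (Tokens are shown in quotes; a leading space is part of the token.)

    Negative Logits
        Token        Logit
        "B"          -0.69
        " mesmas"    -0.66
        "P"          -0.63
        "R"          -0.63
        "M"          -0.63
        "L"          -0.61
        "N"          -0.61
        "K"          -0.61
        "D"          -0.59
        "C"          -0.58

    Positive Logits
        Token        Logit
        " being"      1.28
        " those"      1.18
        " some"       1.04
        "being"       1.00
        " several"    0.93
        " Being"      0.92
        " BEING"      0.92
        " the"        0.91
        " many"       0.89
        " their"      0.88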
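Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the k most positive and k most negative tokens. The sketch below assumes exactly that, ignores any final layer norm, and uses random toy weights. The names `top_bottom_logits`, `decoder_dir`, and `W_U` are hypothetical stand-ins, not the dashboard's actual code.

```python
import numpy as np


def top_bottom_logits(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by a feature's direct effect on the output logits.

    Assumed (hypothetical) inputs:
      decoder_dir: (d_model,) decoder vector of the feature
      W_U:         (d_model, vocab_size) unembedding matrix
      vocab:       list of vocab_size token strings
    """
    logits = decoder_dir @ W_U       # (vocab_size,) logit contribution per token
    order = np.argsort(logits)       # indices from most negative to most positive
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example: random weights stand in for a real model and SAE.
rng = np.random.default_rng(0)
vocab = [f"tok{i}" for i in range(50)]
W_U = rng.normal(size=(8, 50))
decoder_dir = rng.normal(size=8)
positive, negative = top_bottom_logits(decoder_dir, W_U, vocab, k=10)
for token, value in positive:
    print(f"{token!r:>10} {value:+.2f}")
```

Sorting once with `argsort` and reading both ends keeps the positive and negative lists consistent with each other.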
    Activation density: 0.157%
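Assuming activation density is the share of sampled tokens on which the feature's activation is strictly positive (the page does not state the exact definition or sample size), it can be computed as below. The function name `activation_density` and the 1,000,000-token sample are hypothetical. The arithmetic shows what 0.157% means in absolute terms.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    acts = np.asarray(activations, dtype=float)
    return float((acts > threshold).mean())


# At the reported density of 0.157%, a hypothetical 1,000,000-token sample
# would contain roughly 1,570 tokens on which the feature fires.
expected_hits = round(0.00157 * 1_000_000)
```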

    No Known Activations