INDEX
    Explanations

    separate items or concepts

    Negative Logits
    "S"      1.39
    "}$"     1.25
    "H"      1.24
    "N"      1.23
    "P"      1.22
    ";"      1.16
    " is"    1.16
    "C"      1.16
    " ="     1.13
    "B"      1.13
    Positive Logits
    "are"         1.11
    "separate"    1.04
    "as"          0.98
    "т"           0.97
    "امن"         0.91
    "inl"         0.91
    "да"          0.90
    "etition"     0.90
    "де"          0.89
    (no token shown)    0.89
    Act Density: 0.087%

    No Known Activations