INDEX
    Explanations

    introducing specific conditions

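    Negative Logits
        0.78  (token not rendered)
        0.74  (token not rendered)
        0.73  " Однако" (Russian, "however")
        0.70  (token not rendered)
        0.70  "etc"
        0.68  " ומ" (Hebrew subword)
        0.68  " आदि" (Hindi, "etc.")
        0.67  "انی" (Arabic-script subword)
        0.66  (token not rendered)
        0.65  " وإ" (Arabic subword)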
    Positive Logits
        1.74  " those"
        1.59  " during"
        1.44  " with"
        1.37  " involving"
        1.37  " ones"
        1.35  " when"
        1.32  " quelli" (Italian, "those")
        1.27  " from"
        1.26  " if"
        1.26  " in"
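
Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and keeping the tokens with the most extreme values. The following is a minimal sketch that assumes this method. The helpers `logit_effects` and `top_logits`, the toy vocabulary, and the vectors are all hypothetical. The pipeline behind this dashboard may also differ, for example by normalising vectors or folding in LayerNorm.

```python
import heapq
from typing import List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot product of a (hypothetical) feature decoder direction with each
    token's unembedding row; no normalisation is applied."""
    return [sum(d * u for d, u in zip(decoder_dir, row)) for row in unembed]

def top_logits(effects: Sequence[float], vocab: Sequence[str], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most-boosted and the k most-suppressed tokens."""
    pairs = list(zip(vocab, effects))
    positive = heapq.nlargest(k, pairs, key=lambda p: p[1])   # largest value first
    negative = heapq.nsmallest(k, pairs, key=lambda p: p[1])  # most negative first
    return positive, negative

# Toy example: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
unembed = [[2.0, 0.0], [0.5, 0.5], [-1.0, 5.0], [-3.0, 1.0]]
decoder_dir = [1.0, 0.0]

effects = logit_effects(decoder_dir, unembed)
positive, negative = top_logits(effects, vocab, k=2)
print("positive:", positive)  # [('tok_a', 2.0), ('tok_b', 0.5)]
print("negative:", negative)  # [('tok_d', -3.0), ('tok_c', -1.0)]
```

`heapq.nlargest` and `heapq.nsmallest` avoid sorting the whole vocabulary, which matters when it has tens of thousands of entries.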
    Activation density: 0.521% (share of tokens on which the feature fires)
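
Read as a percentage of tokens, a density of 0.521% means the feature fires on roughly 5 of every 1,000 tokens. The following minimal sketch computes the figure under the assumption that "fires" means an activation strictly above zero. The function name `activation_density`, its `threshold` parameter, and the sample activations are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one token activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Hypothetical activations over 8 tokens: the feature fires on 2 of them.
acts = [0.0, 0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.0]
print(f"{activation_density(acts):.3f}%")  # 25.000%
```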

    No Known Activations