INDEX
    Explanations
    No Explanations Found
    Negative Logits
    Token       Value
    "И"         1.34
    "С"         1.10
    " and"      1.08
    "У"         1.02
    (blank)     0.98
    "Ба"        0.96
    "Я"         0.94
    "Га"        0.93
    "פ"         0.93
    "Ш"         0.93
    Positive Logits
    Token         Value
    "zelfde"      1.29
    "ında"        1.22
    "ěji"         1.20
    "ía"          1.20
    "że"          1.13
    "is"          1.09
    (blank)       1.09
    "jenigen"     1.08
    (blank)       1.07
    "はもちろん"  1.06
    Activation Density: 3.996%

    No Known Activations