    Explanations

    committee meetings and work

    Negative Logits
    1.32
     it
    1.31
    ние
    1.26
     can
    1.16
    '
    1.16
     be
    1.14
     are
    1.10
    ка
    1.02
     garantia
    1.00
     we
    0.98
    Positive Logits
    1.23
    ut
    1.16
                  
    1.14
    at
    1.14
    ت
    1.13
    The
    1.10
    ל
    1.06
    ו
    1.05
    is
    1.05
    1.05
    Activation Density: 0.001%

    No Known Activations