INDEX
    Negative Logits
    wa      0.77
    ork     0.71
    five    0.69
    ni      0.64
    of      0.61
    last    0.61
    for     0.61
    land    0.60
    life    0.60
    nin     0.60
    Positive Logits
    рены                  0.57
    З                     0.55
     wealthier            0.53
    Ϩ                     0.52
    (no visible token)    0.52
    (no visible token)    0.51
     väik                 0.50
    Эта                   0.50
     nonuniform           0.50
    байд                  0.49
    Act Density 0.003%

    No Known Activations