INDEX
    NEGATIVE LOGITS
    " ."       0.69
    "יים"      0.67
    "ad"       0.67
               0.65
    "i"        0.65
    "were"     0.63
    "ומי"      0.63
    "of"       0.62
    "W"        0.61
    "e"        0.61
    POSITIVE LOGITS
    " автомоби"       1.25
    " автомобиля"     1.21
    " автомобі"       1.19
    " vehicle"        1.16
    " véhicule"       1.14
    " گاڑی"           1.13
    " автомобиль"     1.13
    " automobil"      1.11
    " автомобилей"    1.11
    "车辆"            1.10
    Act Density 0.299%

    No Known Activations