INDEX
    Explanations
    Negative Logits
     Geb            -0.08
     различных      -0.07
    Conversion      -0.07
     madde          -0.07
     заяв           -0.07
    (machine        -0.06
     chọn           -0.06
    itic            -0.06
     oneself        -0.06
    Positive Logits
    luent           0.07
     HOME           0.07
     acknowledges   0.07
     Bengal         0.06
    oliberal        0.06
    κυ              0.06
    bon             0.06
     APPLE          0.06
    italic          0.06
     prolong        0.06
    Activation Density 0.008%

    No Known Activations