INDEX
    Negative Logits
    " прош"        -0.08
    "-worthy"      -0.08
    " William"     -0.08
    " chic"        -0.07
    "ized"         -0.07
    " 재미"         -0.07
    "smanship"     -0.07
    " Arthur"      -0.07
    "Arthur"       -0.07
    " atoi"        -0.07
    Positive Logits
    "imbus"        0.08
    "(mid"         0.08
                   0.08
    "caps"         0.07
    " pastors"     0.07
    "وأ"           0.07
    "Susan"        0.07
    "лы"           0.07
    " Assim"       0.07
    "(material"    0.07
    Activation Density 0.013%

    No Known Activations