    Negative Logits
      " Ost"                 -0.08
      " bang"                -0.08
      " stated"              -0.08
      " withstand"           -0.08
      " پول"                 -0.07
      " pun"                 -0.07
      "ासाठी"                -0.07
      " Wilkinson"           -0.07
      (token not rendered)   -0.07
      "лем"                  -0.07
    Positive Logits
      " altında"             0.08
      "ward"                 0.08
      " unui"                0.08
      (token not rendered)   0.07
      "irhi"                 0.07
      " XVIII"               0.07
      " Cone"                0.07
      "haften"               0.07
      " Toll"                0.07
      "cone"                 0.07
    Activation density: 0.011%

    No known activations.