    Negative Logits
     щ
    -0.06
     apt
    -0.06
     майже
    -0.06
    Urban
    -0.06
    ariant
    -0.05
     xanh
    -0.05
     Appliances
    -0.05
     кус
    -0.05
     القدم
    -0.05
    니다
    -0.05
    Positive Logits
     posts
    0.07
     أحمد
    0.07
    (whitespace token: run of spaces)
    0.07
     Kız
    0.07
     InetAddress
    0.07
    .WriteLine
    0.07
     wildfire
    0.06
     Tickets
    0.06
    0.06
     electrode
    0.06
    Activation Density 0.000%

    No Known Activations