    NEGATIVE LOGITS
    Adopted          -0.42
     Infórmanos      -0.38
     asemenea        -0.35
     ferdig          -0.35
     adopted         -0.34
     fær             -0.34
     pitä            -0.33
    Привет           -0.32
     assured         -0.32
     inférieure      -0.32
    POSITIVE LOGITS
    contentLoaded     0.75
    MLLoader          0.63
    fjspx             0.54
    DeleteBehavior    0.54
    findpost          0.54
    المشاركات         0.53
    pushFollow        0.53
    bewerken          0.49
    ConstraintMaker   0.49
     TestBed          0.48
    Act Density 0.053%

    No Known Activations