    NEGATIVE LOGITS
    " alloc"             -0.41
    "ruitment"           -0.41
    " stø"               -0.38
    "Injection"          -0.37
    " widget"            -0.36
    "Str"                -0.36
    "NOME"               -0.36
    " Dost"              -0.36
    "Origin"             -0.36
    "ariales"            -0.35
    POSITIVE LOGITS
    " over"              0.85
    "over"               0.80
    " Over"              0.74
    "Over"               0.71
    " OVER"              0.69
    " över"              0.65
    " ModelExpression"   0.65
    " Covering"          0.62
    " върху"             0.62
    "OVER"               0.62
    Activation Density: 0.016%

    No Known Activations