INDEX
    Explanations
    Negative Logits
    Token                 Logit
    " ModelExpression"    -0.71
    " esternos"           -0.61
    "AsUp"                -0.61
    "#+#"                 -0.60
    " sovere"             -0.60
    "ReusableCell"        -0.57
    " betweenstory"       -0.57
    "GHIJKLM"             -0.57
    "LayoutConstraint"    -0.57
    " CreateTagHelper"    -0.56
    Positive Logits
    Token                 Logit
    " dieux"              0.47
    " gouvernements"      0.46
    "pose"                0.46
    "pra"                 0.46
    "PL"                  0.46
    "tagHelper"           0.45
    "juta"                0.44
    " médias"             0.43
    "accept"              0.43
    "ysław"               0.43
    Activation Density: 0.003%

    No Known Activations