INDEX
    Explanations

    language related to restrictions and limitations

    Negative Logits
    p               -0.70
    n               -0.61
                    -0.59
    pearance        -0.56
    cob             -0.56
    mtext           -0.55
    Mein            -0.52
    q               -0.52
    l               -0.52
     sp             -0.51
    Positive Logits
     restrictions   1.32
     constraints    1.29
     Constraints    1.22
     Restrictions   1.20
     constraint     1.17
     restriction    1.15
     restraints     1.14
    constraints     1.08
     restricting    1.07
     bans           1.07
    Act Density 0.339%

    No Known Activations