INDEX
    Explanations

    expressions of uniqueness or distinctiveness

    Negative Logits
    " censura"           -0.61
    " fós"               -0.58
    " scold"             -0.56
    "ędzy"               -0.56
    " ladite"            -0.55
    " inves"             -0.55
    "ConstraintMaker"    -0.53
    "shi"                -0.53
    " devriez"           -0.53
    "InlineData"         -0.53
    Positive Logits
    " unique"        3.17
    "unique"         2.96
    " Unique"        2.89
    "Unique"         2.84
    " UNIQUE"        2.75
    "UNIQUE"         2.50
    " uniques"       2.41
    " uniqueness"    2.38
    " uniquely"      2.25
    " unieke"        2.20
    Act Density 0.056%

    No Known Activations