INDEX
    Explanations

    words related to explicit statements or instructions

    phrases that mention explicitness or clarity in statements

    Negative Logits
     Tycoon         -0.87
    nesota          -0.78
    «ĺ              -0.77
    Squ             -0.77
    STON            -0.72
    rug             -0.71
     Royale         -0.70
    busters         -0.70
    ADS             -0.68
    Score           -0.67
    Positive Logits
     deline         0.81
     guiActiveUn    0.79
     explicit       0.79
    ities           0.78
     disclaim       0.77
     textual        0.77
     explicitly     0.77
     disav          0.75
     prohibitions   0.75
     repud          0.73
    Act Density 0.030%

    No Known Activations