INDEX
    Explanations

    positive affirmations or strong endorsements

    words related to strong positive responses or approvals

    Negative Logits
    nan          -0.69
    bil          -0.68
    perature     -0.65
    士           -0.65
    nesota       -0.65
     skelet      -0.65
     procure     -0.63
    othy         -0.63
    ORT          -0.61
    pool         -0.60
    Positive Logits
    ounded       1.15
    ounding      1.10
    oslav        0.92
    ounds        0.91
    OUND         0.84
    onent        0.79
    soType       0.79
     Sadd        0.78
    ogle         0.70
    SourceFile   0.69
    Activation Density: 0.009%

    No Known Activations