    Explanations

    phrases related to policy changes or adjustments

    Negative Logits
    "wine"           -0.66
    "ا"              -0.65
    "waters"         -0.65
    " stereotype"    -0.65
    " Sic"           -0.60
    " Heller"        -0.57
    "dies"           -0.57
    " Nass"          -0.56
    " Sung"          -0.56
    " hung"          -0.56
    Positive Logits
    " effected"      1.15
    " drastic"       1.07
    " gradual"       0.93
    " incremental"   0.89
    " occur"         0.89
    " undone"        0.89
    " wrought"       0.84
    " undo"          0.83
    " occurring"     0.81
    "foreseen"       0.80
    Activation Density: 0.429%

    No Known Activations