INDEX
    Explanations

    self-referential words indicating action or change happening

    Negative Logits
    Token               Logit
    "SPONSORED"         -0.76
    "ITNESS"            -0.65
    "elected"           -0.65
    "interstitial"      -0.63
    "ifted"             -0.62
    "aug"               -0.60
    " Married"          -0.60
    "assadors"          -0.60
    "ilingual"          -0.59
    "onnaissance"       -0.59

    Positive Logits
    Token               Logit
    " balance"          0.85
    " fortunes"         0.85
    " unnecessarily"    0.84
    " altogether"       0.84
    " prematurely"      0.84
    " momentum"         0.83
    " inhib"            0.83
    " entire"           0.82
    " reins"            0.82
    " boundaries"       0.81
    Activation Density: 7.666%

    No Known Activations