INDEX
    Explanations

    phrases related to self-definition and self-description

    expressions of opinion about individuals

    Negative Logits
    Token         Logit
    "teness"      -0.80
    " Dism"       -0.65
    " shock"      -0.65
    " Kot"        -0.63
    " Extend"     -0.63
    " Indust"     -0.60
    "win"         -0.60
    "irm"         -0.59
    " Awakens"    -0.59
    " Ship"       -0.59
    Positive Logits
    Token             Logit
    " supposed"       0.86
    "Interstitial"    0.82
    " happening"      0.81
    " gonna"          0.77
    "nt"              0.76
    " going"          0.75
    " akin"           0.74
    " destined"       0.74
    "omorphic"        0.73
    "alian"           0.71
    Activation Density: 0.203%

    No Known Activations