INDEX
    Explanations

    words related to consistent behavior or persistence

    phrases indicating consistency or permanence

    Negative Logits
    SG            -0.74
    iants         -0.74
    LAN           -0.73
    IDA           -0.72
     Sunder       -0.68
    IDs           -0.68
     Tags         -0.67
     OW           -0.67
    atorium       -0.66
    hole          -0.66
    Positive Logits
    entimes       0.88
     appreciated  0.83
    theless       0.81
     behaved      0.76
     conclud      0.76
     always       0.74
     forg         0.73
     obey         0.73
     sensed       0.72
     evolving     0.72
    Act Density 0.025%

    No Known Activations