INDEX
    Explanations

    references to potential actions or consequences

    Negative Logits
    Token         Logit
    " Chal"       -0.68
    " Xuan"       -0.66
    " Writing"    -0.58
    " Scand"      -0.57
    "building"    -0.57
    "Ready"       -0.56
    " Scor"       -0.55
    "Writing"     -0.55
    " Vis"        -0.54
    " Kag"        -0.54
    Positive Logits
    Token            Logit
    " be"            1.08
    " ideally"       0.94
    " doubtless"     0.92
    " undoubtedly"   0.92
    " imply"         0.92
    " suffice"       0.91
    " allow"         0.90
    " likely"        0.89
    " eliminate"     0.89
    " surely"        0.89
    Activation Density: 0.196%

    No Known Activations