INDEX
    Explanations

    phrases indicating potential actions or outcomes

    conditional statements and hypothetical situations

    Negative Logits
     Named         -0.66
    noticed        -0.63
     Cosponsors    -0.57
     TIM           -0.57
     Xuan          -0.56
     Gladiator     -0.56
     psy           -0.55
    named          -0.55
    eyed           -0.55
    standing       -0.55
    Positive Logits
     require       1.11
     ideally       1.10
     be            1.06
     entail        1.05
     imply         1.05
     suffice       0.99
     allow         0.99
     likely        0.97
     involve       0.97
     presumably    0.96
    Activation Density: 0.154%

    No Known Activations