INDEX
    Explanations

    phrases related to recommending or encouraging specific actions or behaviors

    repeated mentions of "the same" and concepts of doing the "right thing."

    Negative Logits
    "quished"      -0.73
    "gat"          -0.72
    "osponsors"    -0.70
    "ildo"         -0.69
    "ONSORED"      -0.68
    "ospons"       -0.68
    "raltar"       -0.66
    "urated"       -0.66
    " Leilan"      -0.65
    "opened"       -0.63
    Positive Logits
    " same"           1.36
    " unthinkable"    1.12
    " utmost"         1.09
    " slightest"      1.09
    " simplest"       1.08
    " latter"         1.00
    " hardest"        1.00
    "same"            0.98
    " groundwork"     0.98
    " opposite"       0.97
    Act Density 0.085%

    No Known Activations