    Explanations

    instances where actions are being performed or suggested

    the action of taking or similar verb forms related to actions performed

    Negative Logits
    Token         Logit
    "Cong"        -0.68
    "mith"        -0.65
    "idding"      -0.65
    "Seg"         -0.62
    "linked"      -0.60
    "Develop"     -0.60
    "illing"      -0.59
    "eman"        -0.59
    "eers"        -0.59
    "gian"        -0.58
    Positive Logits
    Token             Logit
    " advantage"      1.23
    " precautions"    1.09
    " care"           1.07
    " refuge"         1.05
    " baths"          1.04
    "aways"           1.03
    " selfies"        1.00
    " liberties"      0.98
    " aback"          0.95
    " shortcuts"      0.94
    Activation Density: 0.112%

    No Known Activations