    Explanations

    words related to ongoing actions or conditions that continue over time

    Negative Logits
    Token             Logit
    "umenthal"        -0.72
    " GEAR"           -0.71
    "RN"              -0.67
    "ĪĴ"              -0.66
    "ilde"            -0.62
    "ramid"           -0.62
    "ritical"         -0.60
    "aptic"           -0.59
    "robat"           -0.58
    "vette"           -0.58

    Positive Logits
    Token             Logit
    "ently"           1.57
    " unchanged"      1.10
    "ency"            1.09
    " indefinitely"   1.01
    "ively"           0.98
    "entially"        0.94
    "uously"          0.91
    "encies"          0.90
    "ences"           0.88
    "ously"           0.87
    Act Density 0.016%

    No Known Activations