    Explanations

    words related to significant events or actions

    words indicating significant actions or events

    Negative Logits
    rea        -0.67
    conn       -0.66
    leases     -0.64
    rone       -0.63
    aly        -0.63
    zh         -0.61
    lords      -0.60
    rel        -0.60
     Newman    -0.60
    sw         -0.59
    Positive Logits
    ometimes   1.05
    hift       0.95
    paces      0.93
    omething   0.87
    heet       0.86
    creen      0.84
    ilver      0.83
    pace       0.80
    hirt       0.79
    psey       0.71
    Activation Density 0.655%

    No Known Activations