INDEX
    Explanations

    action words indicating progression or movement

    Negative Logits
    esser    -0.17
    ides     -0.17
    esti     -0.15
    reff     -0.15
    eneg     -0.15
    lette    -0.15
    ooter    -0.14
    eturn    -0.14
    rides    -0.14
    iped     -0.14
    Positive Logits
     tém            0.15
    olit            0.15
    ew              0.15
     Brotherhood    0.15
    383             0.14
     Dawn           0.14
    480             0.14
    ourage          0.14
     Watt           0.14
    à¥ģष            0.14
    Activation Density: 0.002%

    No Known Activations