INDEX
    Explanations

    phrases that emphasize repetition and redundancy

    Negative Logits
    Token           Logit
    " Authors"      -0.62
    "Ge"            -0.61
    "Thirty"        -0.60
    "saf"           -0.59
    "itans"         -0.57
    "vana"          -0.56
    "afety"         -0.56
    " Syndicate"    -0.56
    "VICE"          -0.56
    "ogens"         -0.56
    Positive Logits
    Token           Logit
    " again"        1.06
    "etheless"      0.94
    "drive"         0.94
    "again"         0.81
    "ride"          0.76
    "clock"         0.75
    " until"        0.73
    "repe"          0.73
    "stretched"     0.70
    "haul"          0.70
    Activation Density: 0.009%

    No Known Activations