INDEX
    Explanations

    phrases introducing general statements or observations

    generalizing terms and phrases that imply common experiences or observations

    Negative Logits
    Token         Logit
    " gent"       -0.63
    "imm"         -0.62
    "enting"      -0.61
    " nearby"     -0.61
    "avering"     -0.57
    "pron"        -0.57
    "driving"     -0.57
    "andering"    -0.56
    " chore"      -0.56
    " rapp"       -0.56
    Positive Logits
    Token         Logit
    "entimes"     0.87
    "chwitz"      0.86
    " Helpful"    0.85
    "terness"     0.83
    "eus"         0.83
    "resy"        0.82
    " Strikes"    0.82
    " Issue"      0.81
    "yip"         0.77
    " Called"     0.77
    Activation Density: 0.032%

    No Known Activations