    Explanations

    mentions of specific days of the week, particularly "Thursday."

    Negative Logits
    Token            Logit
    " Sundays"       -0.24
    " mon"           -0.23
    " Mon"           -0.22
    " Sat"           -0.22
    "mon"            -0.21
    " saturation"    -0.21
    " Saturdays"     -0.21
    " weekends"      -0.20
    "Mon"            -0.20
    " Mondays"       -0.20
    Positive Logits
    Token            Logit
    " Thursday"      0.88
    "Thursday"       0.84
    " Thurs"         0.68
    " Thu"           0.60
    "Thu"            0.53
    "ursday"         0.49
    " Thur"          0.38
    "38"             0.38
    "48"             0.35
    "98"             0.33
    Activation Density: 0.037%

    No Known Activations