    Explanations

    phrases emphasizing certainty or strong affirmation

    the word "certainly" in various contexts

    Negative Logits
        "glers"        -0.81
        "agus"         -0.76
        "OSH"          -0.74
        "uese"         -0.74
        "lay"          -0.73
        "gencies"      -0.72
        "Offline"      -0.70
        "idas"         -0.69
        "gency"        -0.68
        "ulative"      -0.66
    Positive Logits
        " deserved"     0.78
        " qualifies"    0.77
        " behaved"      0.74
        " exagger"      0.71
        " benefited"    0.70
        " deline"       0.69
        " ought"        0.67
        " appreciated"  0.67
        " torped"       0.66
        " appreci"      0.65
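Lists like the two above can be produced by sorting a per-token score and taking both extremes. The sketch below assumes each vocabulary token already has a scalar logit effect for this feature (for example, the feature's decoder direction projected onto the unembedding; the page does not say how its values were computed). The function name `top_logits` is hypothetical, and the sample data is a handful of values copied from the lists.

```python
import heapq


def top_logits(
    effects: dict[str, float], k: int = 10
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most positive and k most negative (token, effect) pairs.

    `effects` maps token strings to an assumed scalar logit effect.
    """
    # nlargest returns items in descending order of effect.
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    # nsmallest returns items in ascending order (most negative first).
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    return positive, negative


# A few (token, effect) values taken from the lists above.
effects = {
    "glers": -0.81,
    "agus": -0.76,
    "OSH": -0.74,
    " deserved": 0.78,
    " qualifies": 0.77,
    " behaved": 0.74,
}
pos, neg = top_logits(effects, k=2)
print(pos)  # [(' deserved', 0.78), (' qualifies', 0.77)]
print(neg)  # [('glers', -0.81), ('agus', -0.76)]
```

Using `heapq` avoids sorting the whole vocabulary when only the top and bottom k entries are needed.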
    Activation Density: 0.025%
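A density of 0.025% corresponds to roughly 1 active position in every 4,000 tokens. The page does not define the metric; the sketch below assumes it is the fraction of token positions where the feature's activation is above zero. The function name `activation_density` and the example data are hypothetical.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumes density means "share of tokens where the feature fires";
    the dashboard itself does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: one active position out of 4,000 tokens.
acts = [0.0] * 4000
acts[123] = 2.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.025%
```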

    No Known Activations