INDEX
    Explanations

    phrases indicating uncertainty or possibility

    the word "perhaps" and its variations, indicating uncertainty or speculation

    Negative Logits
    Token     Logit
    chens     -0.73
    nen       -0.73
    emy       -0.72
    arthed    -0.71
    zeb       -0.71
    ombat     -0.71
    zen       -0.70
    ulative   -0.69
    jriwal    -0.69
    elight    -0.69
    Positive Logits
    Token            Logit
     unsurprisingly  0.84
    haps             0.80
     opio            0.77
     someday         0.76
     sensing         0.73
     "$:/            0.71
     unemploy        0.68
     involuntary     0.67
     tempted         0.66
     infer           0.65
    Act Density 0.025%

    No Known Activations