INDEX
    Explanations

    phrases related to certainty or emphasis in statements

    Negative Logits
    akable          -0.77
    igers           -0.71
    oulder          -0.71
    ription         -0.70
    uit             -0.69
    efer            -0.69
    cart            -0.68
    Appearances     -0.67
    alez            -0.67
    itionally       -0.67
    Positive Logits
     unaware         0.86
     unrelated       0.78
     forgot          0.76
     forgetting      0.76
     incapable       0.75
     oblivious       0.74
     swayed          0.74
     influenced      0.72
     unaffected      0.71
     lacking         0.68
    Act Density 0.034%

    No Known Activations