INDEX
    Explanations

    phrases related to expressing contradiction or disagreement

    Negative Logits
    (tokens are quoted so that leading spaces stay visible)
    " around"        -0.68
    " pal"           -0.68
    " rig"           -0.66
    " shifts"        -0.65
    " troop"         -0.63
    " wagon"         -0.63
    " specialist"    -0.60
    "iday"           -0.60
    " himself"       -0.59
    " Cu"            -0.58
    Positive Logits
    (tokens are quoted so that leading spaces stay visible)
    "Not"            3.16
    "not"            1.93
    " Not"           1.88
    "NOT"            1.79
    "Never"          1.63
    "Nothing"        1.57
    "Probably"       1.53
    "Already"        1.53
    "Only"           1.50
    "Except"         1.49
    Activation Density 0.012%

    No Known Activations