INDEX
    Explanations

    phrases indicating a deliberate choice to disregard or overlook something

    instances of the word "ignore."

    Negative Logits
    Token        Logit
    "ramer"      -0.93
    "unal"       -0.92
    "uliffe"     -0.86
    "emetery"    -0.81
    "arter"      -0.80
    "alg"        -0.75
    "urther"     -0.74
    "gran"       -0.74
    "aver"       -0.73
    "raq"        -0.73
    Positive Logits
    Token               Logit
    " ignore"           0.89
    " ignores"          0.82
    " ignoring"         0.74
    " underestimate"    0.73
    " aside"            0.72
    " overlook"         0.72
    " ignored"          0.71
    "fulness"           0.70
    " neglect"          0.68
    " obe"              0.65
    Act Density 0.012%

    No Known Activations