    Explanations

    decisions being made

    instances of the word "decided."

    Negative Logits
        Token            Logit
        "anon"           -0.71
        "avery"          -0.67
        "eries"          -0.66
        "agging"         -0.64
        "capacity"       -0.64
        "ptoms"          -0.63
        "quality"        -0.63
        "ILA"            -0.62
        "ciating"        -0.62
        "Growing"        -0.61
    Positive Logits
        Token            Logit
        " unanimously"    0.84
        " upon"           0.79
        " differently"    0.76
        " beforehand"     0.72
        " against"        0.70
        " to"             0.69
        "ters"            0.69
        " unilaterally"   0.67
        " that"           0.64
        " anew"           0.63
    Activation density: 0.072%
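    A minimal sketch of what the density figure means, assuming density is the fraction of sampled token positions at which the feature's activation is above zero (a common convention, not confirmed by this page). Under that assumption, 0.072% corresponds to about 72 firing positions per 100,000 tokens. The function `activation_density` and the synthetic activation list are hypothetical illustrations, not data from this feature.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`.

    Assumption: "density" = firing positions / total positions sampled.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical synthetic data: 100,000 token positions, 72 of which fire.
acts = [0.0] * 100_000
for i in range(72):
    acts[i * 1000] = 1.5  # arbitrary positive activation value

density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.072%"
```

    A threshold of zero treats any positive activation as firing; raising it would count only stronger activations and lower the reported density.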

    No Known Activations