INDEX
    Explanations

    statements about likely outcomes or predictions

    phrases that indicate probability or likelihood of future events

    Negative Logits
        "inth"          -0.86
        "gado"          -0.81
        "aredevil"      -0.77
        "zeb"           -0.77
        "ithing"        -0.75
        "ilts"          -0.74
        "artney"        -0.74
        "gencies"       -0.73
        "gian"          -0.73
        "ortmund"       -0.71

    Positive Logits
        " underest"        0.81
        " underestimate"   0.75
        " infer"           0.74
        " culprit"         0.74
        " doomed"          0.72
        " likely"          0.70
        " underestimated"  0.69
        " exagger"         0.69
        NULL               0.69
        " going"           0.67
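
    Logit lists like the two above are commonly produced by projecting the feature's
    decoder direction through the model's unembedding matrix. The lists then show the
    k vocabulary tokens with the highest and lowest scores. The sketch below assumes
    that procedure; this dashboard does not document its exact method. The function
    name, the toy 2-dimensional direction and the toy unembedding values are all
    hypothetical and are not taken from the feature shown here.

```python
def top_and_bottom_logits(direction, unembed, vocab, k):
    """Hypothetical helper: score every vocabulary token by the dot product of a
    feature direction with that token's unembedding column, then return the
    k highest-scoring and k lowest-scoring (token, score) pairs."""
    # unembed[i][j] is the weight from residual dimension i to vocabulary token j
    d_model = len(direction)
    scores = [
        sum(direction[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by score: the front holds the most negative logits
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    bottom = ranked[:k]
    top = ranked[::-1][:k]
    return top, bottom


# Toy example (made-up numbers): 2 residual dimensions, 4 tokens
direction = [1.0, -0.5]
unembed = [
    [0.2, -0.4, 0.9, 0.0],
    [0.6, 0.2, -0.2, 0.4],
]
vocab = [" likely", "inth", " doomed", "gado"]

top, bottom = top_and_bottom_logits(direction, unembed, vocab, k=2)
print("positive:", [(tok, round(s, 2)) for tok, s in top])
print("negative:", [(tok, round(s, 2)) for tok, s in bottom])
```

    A real model has tens of thousands of tokens, so a matrix library would do the
    projection in one product. Plain lists keep this sketch dependency-free.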
    Activation Density: 0.036%
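
    Activation density is usually read as the share of sampled token positions on
    which the feature fires, meaning its activation is above zero. The sketch below
    assumes that definition and a threshold of 0. The dashboard's real sampling set
    and threshold are not stated, and the function name is hypothetical. For scale,
    36 active positions out of 100,000 give the 0.036% figure.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of token positions whose feature
    activation is strictly greater than `threshold`."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy sample (made-up data): 36 firing positions out of 100,000
sample = [0.0] * 99_964 + [1.5] * 36
print(f"{activation_density(sample):.3f}%")
```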

    No Known Activations