INDEX
    Explanations

    phrases related to outcomes or results

    phrases indicating potential outcomes or consequences

    Negative Logits
    Token           Logit
    "clips"         -0.65
    "ashes"         -0.62
    "vati"          -0.61
    "verages"       -0.57
    "ahead"         -0.57
    " discussed"    -0.57
    "lund"          -0.57
    "ida"           -0.56
    "noticed"       -0.55
    " recounted"    -0.55
    Positive Logits
    Token           Logit
    " be"           1.49
    "Be"            0.97
    "be"            0.94
    " contain"      0.93
    " resemble"     0.90
    "asted"         0.88
    " BE"           0.86
    " belong"       0.86
    " have"         0.86
    " consist"      0.85
    Activation Density: 0.109%

    No Known Activations