    Explanations

    Phrases related to comparisons or evaluations, particularly those emphasizing a contrast between two elements.

    Phrases that include the word "considering".

    Negative Logits
    "inis"          -0.79
    "ernal"         -0.78
    "uala"          -0.77
    "scribe"        -0.77
    "jer"           -0.76
    "orem"          -0.75
    "arez"          -0.75
    "rouse"         -0.73
    "vous"          -0.73
    "inals"         -0.71

    Positive Logits
    " how"          0.93
    " why"          0.72
    " hindsight"    0.70
    " hordes"       0.70
    " recent"       0.70
    " everything"   0.68
    " what"         0.66
    " everyone"     0.66
    " that"         0.65
    " considering"  0.65
    Act Density 0.082%

    No Known Activations