INDEX
    Explanations

    questions starting with "Would" and presenting hypothetical scenarios

    questions posed to the reader

    Negative Logits
    ãĤĬ
    -0.67
    natureconservancy
    -0.61
    SPONSORED
    -0.60
    ãģĮ
    -0.59
    minus
    -0.58
    displayText
    -0.58
    ãĢĤ
    -0.58
    ãģ«
    -0.57
    hig
    -0.57
     traced
    -0.57
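Several of the tokens above (ãĤĬ, ãģĮ, ãĢĤ, ãģ«) look like mojibake but are probably the raw vocabulary strings of a byte-level BPE tokenizer. The sketch below assumes the GPT-2 style byte-to-unicode table, which the page itself does not confirm; `bytes_to_unicode` and `decode_token` are illustrative helper names, not part of any dashboard API. Under that assumption, each token maps back to its UTF-8 bytes and then to readable text.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2 style byte -> printable-character table (assumed here)."""
    # Printable bytes map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (control characters, space, 0x7F-0xA0, 0xAD) are shifted
    # to code points starting at 256 so every byte has a visible stand-in.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level vocabulary string back into readable text."""
    raw = bytearray()
    for ch in token:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            # Characters outside the table (e.g. a literal space that the
            # page already rendered) are passed through unchanged.
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĤĬ", "ãģĮ", "ãĢĤ", "ãģ«", "minus", " traced"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

If the assumption holds, the four garbled entries decode to the Japanese characters り, が, 。 and に, while ASCII tokens such as `minus` pass through unchanged.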
    Positive Logits
     anyone
    1.04
    n
    1.03
     anybody
    1.02
     you
    0.94
     it
    0.86
     they
    0.83
     somebody
    0.82
     we
    0.81
     someone
    0.81
     YOU
    0.74
    Act Density 0.074%

    No Known Activations