INDEX
    Explanations

    mentions of something being helpful

    instances of the word "helpful."

    NEGATIVE LOGITS
    thur            -0.80
    jong            -0.71
    buck            -0.70
    inction         -0.70
    agate           -0.70
     Dare           -0.69
    BU              -0.68
    Hop             -0.68
    Rush            -0.68
    metal           -0.67
    POSITIVE LOGITS
     helpful        0.86
     aide           0.81
     aids           0.80
     undermin       0.79
     guiActiveUn    0.78
     introdu        0.75
    tip             0.75
    glers           0.74
     aid            0.74
    ãĤĭ             0.73
    Activation Density 0.012%

    No Known Activations