    Explanations

    phrases indicating a level of importance or relevance towards specific topics or issues

    instances of the word "concerned"

    Negative Logits
    Token         Logit
    "artifacts"   -0.85
    " Bom"        -0.72
    "obs"         -0.70
    "robe"        -0.68
    "fruit"       -0.68
    "arb"         -0.67
    "buff"        -0.67
    "ingen"       -0.67
    "ingers"      -0.66
    "guided"      -0.65
    Positive Logits
    Token         Logit
    " proble"     0.73
    " citiz"      0.72
    " trolling"   0.70
    "atives"      0.67
    " Schr"       0.67
    "reon"        0.67
    " Concern"    0.67
    "cerned"      0.66
    "NESS"        0.66
    "ately"       0.65
    Activation density: 0.023%

    No Known Activations