INDEX
    Explanations

    instances of the word "aid" at varying activation levels

    references to humanitarian aid

    Negative Logits
    " Bellev"          -0.80
    " Beard"           -0.67
    " Ran"             -0.66
    "theless"          -0.65
    " Mamm"            -0.65
    "é¾"               -0.64
    " aber"            -0.63
    " archived"        -0.61
    " unforgettable"   -0.61
    " Wilde"           -0.61
    Positive Logits
    " aid"       1.37
    "glers"      1.10
    " Aid"       1.10
    "Aid"        0.97
    " aids"      0.92
    "uese"       0.89
    "maid"       0.83
    "ãĥ¼ãĥĨ"    0.81
    "aid"        0.79
    "Reviewer"   0.78
    Activation Density 0.020%

    No Known Activations