INDEX
    Explanations

    references to a specific entity or term "WP" with varying activations

    references to a specific entity or group identified as "WP"

    Negative Logits
        Token         Logit
        "thus"        -0.83
        "龍喚士"      -0.82
        "ante"        -0.82
        "Reviewer"    -0.79
        " thous"      -0.75
        "hips"        -0.74
        "erald"       -0.73
        "taboola"     -0.72
        "angelo"      -0.72
        "tes"         -0.71
    Positive Logits
        Token         Logit
        " WP"         1.29
        "WP"          1.28
        "olicy"       1.05
        "Beg"         0.81
        "FFER"        0.74
        "witz"        0.73
        "wordpress"   0.72
        "LP"          0.71
        "ITCH"        0.71
        "ctive"       0.71
    Activation Density: 0.005%

    No Known Activations