    Explanations

    sentences or phrases ending with 'report that'

    repeated punctuation marks or periods

    Negative Logits
    ãĥ´       -0.74
    ighter    -0.71
     infl     -0.70
    å¼        -0.67
    ãĥģ       -0.61
     overs    -0.61
    ktop      -0.60
    ãĥ«       -0.58
    mith      -0.57
    GN        -0.57
    Positive Logits
    shall     0.76
    selves    0.68
     ."       0.66
    TAG       0.66
    hello     0.64
    IAN       0.63
    safe      0.60
     Oops     0.60
    stocks    0.60
    respect   0.60
    Act Density 0.019%

    No Known Activations