INDEX
    Explanations

    phrases indicating a recommendation to watch something

    references to watching videos or content

    Negative Logits
    Token                   Logit
    "ctrl"                  -0.98
    "phi"                   -0.72
    "VEN"                   -0.69
    "エル"                  -0.69
    "ヴ"                    -0.68
    "interstitial"          -0.68
    "sembly"                -0.67
    "ascal"                 -0.66
    "cffffcc"               -0.66
    " misunderstanding"     -0.65
    Positive Logits
    Token                   Logit
    "tower"                 1.26
    " Watching"             1.13
    "dog"                   1.04
    "dogs"                  0.98
    " Watch"                0.85
    "Watch"                 0.84
    " Dogs"                 0.84
    "ing"                   0.82
    "watch"                 0.81
    " WATCH"                0.80
    Activation Density: 0.021%

    No Known Activations