INDEX
    Explanations
    NEGATIVE LOGITS
    "ctrl"       -0.80
    "bably"      -0.80
    "phi"        -0.69
    "cffffcc"    -0.67
    "currency"   -0.67
    " wound"     -0.66
    "ylum"       -0.65
    "ayers"      -0.63
    "ãĥ´"        -0.62
    "cffff"      -0.61

    POSITIVE LOGITS
    "tower"       1.32
    "dog"         1.13
    "dogs"        1.04
    "ing"         0.98
    " clips"      0.88
    " Dogs"       0.83
    " videos"     0.83
    " Watching"   0.82
    "ers"         0.79
    "points"      0.78

    Act Density 0.022%

    No Known Activations