Explanations
requests to watch videos
Negative Logits
ctrl        -0.80
bably       -0.80
phi         -0.69
cffffcc     -0.67
currency    -0.67
wound       -0.66
ylum        -0.65
ayers       -0.63
ãĥ´         -0.62
cffff       -0.61
Positive Logits
tower       1.32
dog         1.13
dogs        1.04
ing         0.98
clips       0.88
Dogs        0.83
videos      0.83
Watching    0.82
ers         0.79
points      0.78
Activations Density: 0.022%