INDEX
Explanations
websites to visit
occurrences of calls to action for visiting websites or checking out content
Negative Logits
ļé            -0.74
elf           -0.72
phyl          -0.69
arial         -0.69
orth          -0.67
ELF           -0.67
flatt         -0.64
raf           -0.62
bottleneck    -0.62
vanish        -0.62
Positive Logits
Visit         0.95
Watching      0.88
Check         0.85
Plus          0.84
Trailer       0.83
Upgrade       0.82
Look          0.81
Ready         0.79
Start         0.79
Here          0.79
Activations Density 0.022%
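
For readers unfamiliar with the density figure, the sketch below shows one common way such a number can be computed: the fraction of token positions on which the feature's activation is above zero. This is an assumption about the metric, not a statement of how this page derived 0.022%. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density means "share of tokens where the feature fires".
    The dashboard may define or sample it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 22 of them active.
sample = [0.0] * 99_978 + [1.0] * 22
print(f"{activation_density(sample):.3%}")  # prints 0.022%
```

A density this low means the feature is sparse: under the assumption above, it would fire on roughly 2 tokens out of every 10,000.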