INDEX
Explanations
phrases indicating customer service or assistance
Negative Logits
ãĥ       -0.16
ihat     -0.16
imedia   -0.14
ujet     -0.14
orig     -0.14
nicer    -0.14
ifs      -0.14
igest    -0.14
anybody  -0.14
upt      -0.13
Positive Logits
landed      0.25
luck        0.19
options     0.18
landing     0.17
found       0.17
exactly     0.17
clicked     0.16
stop        0.16
definitely  0.16
head        0.15
Activations Density: 0.084%