Explanations
phrases indicating user engagement with content, particularly "read more" prompts
Negative Logits
/tos       -0.14
oro        -0.14
igid       -0.14
acea       -0.14
orns       -0.14
ches       -0.14
packing    -0.14
olumn      -0.14
web        -0.13
orn        -0.13
Positive Logits
Gamb       0.16
fad        0.15
ormsg      0.14
ELLOW      0.14
.sheet     0.14
šli        0.14
ÏįÏĢ       0.14
Zuk        0.13
ERV        0.13
mpar       0.13
Activations Density 0.028%