INDEX
Explanations
phrases that indicate the viewer's engagement or interaction with content
Negative Logits
ald             -0.17
bei             -0.15
imeo            -0.14
leys            -0.14
rgan            -0.14
ernel           -0.14
defaultManager  -0.14
etails          -0.14
prompt          -0.13
Locker          -0.13
Positive Logits
utz             0.19
ormsg           0.18
231             0.16
ÙĨÚ¯ÛĮ          0.16
ache            0.15
aliz            0.14
ODY             0.14
etz             0.14
chop            0.14
erif            0.14
Activations Density 0.074%