Explanations
references to various types of dips or similar food items
Negative Logits
Token       Logit
uge         -0.16
tug         -0.14
ched        -0.14
ç·Ĵ         -0.14
amation     -0.14
/Runtime    -0.14
ιÏİ         -0.14
redd        -0.14
hod         -0.14
UGE         -0.14
Positive Logits
Token       Logit
Dip         0.24
dip         0.22
acus        0.19
ity         0.17
stick       0.17
per         0.17
hen         0.16
utar        0.16
Dipl        0.16
sticks      0.16
Activations Density: 0.013%
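
For a sense of scale, the density figure can be reproduced on toy data. The sketch below assumes that "activation density" means the fraction of sampled token positions on which the feature's activation is above zero; the dashboard does not state its definition, so treat that as an assumption. The function name `activation_density`, the `threshold` parameter, and the toy activations are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density = (active positions) / (all sampled positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (made up for illustration): 13 active positions out of 100,000.
toy_activations = [0.0] * 99_987 + [1.5] * 13
density = activation_density(toy_activations)
print(f"{density:.3%}")  # 0.013%
```

Under that assumption, a density of 0.013% corresponds to the feature firing on roughly 13 of every 100,000 token positions, which is in keeping with a feature tied to a narrow topic such as dips.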