INDEX
Explanations
preferences and choices in contexts of liking or favoring something
Negative Logits
ñ         -0.15
about     -0.14
pari      -0.14
nad       -0.14
mand      -0.14
fig       -0.14
kowski    -0.14
ekyll     -0.14
ader      -0.14
á»ijt     -0.14
Positive Logits
entially  0.29
ential    0.20
ably      0.19
à¸Ĭม      0.17
/pre      0.17
erguson   0.16
iable     0.15
Cog       0.15
-than     0.14
peare     0.14
Activations Density: 0.033%