Explanations
references to brands or products in the context of marketing
Negative Logits
pants      -0.16
offs       -0.16
ìħĺ (션)   -0.15
762        -0.15
FTA        -0.14
eson       -0.14
ampa       -0.14
pal        -0.14
heim       -0.14
uala       -0.14
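Some entries, such as ìħĺ, are not garbage. They are token strings shown in a byte-level BPE display form, where every raw byte is mapped to a printable character. The page does not name the model or tokenizer, so the following minimal sketch assumes GPT-2's byte-level vocabulary, which these strings are consistent with. It rebuilds GPT-2's byte-to-character table and inverts it to recover readable UTF-8. The helper names `bytes_to_unicode` (modeled on the function in OpenAI's GPT-2 encoder) and `decode_token` are used here for illustration.

```python
def bytes_to_unicode() -> dict:
    """Rebuild the GPT-2-style table mapping each byte (0-255) to a printable character."""
    # Bytes that are already printable characters map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte (control chars, space, etc.) is shifted to U+0100 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Invert the table so each display character maps back to its raw byte.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level display string back into readable UTF-8 text (assumes GPT-2 scheme)."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ìħĺ"))     # 션 (bytes EC 85 98)
print(repr(decode_token("Ġpants")))  # ' pants' (Ġ encodes a leading space)
```

A single token can also end partway through a multi-byte character; `errors="replace"` keeps the sketch from raising on such fragments.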
Positive Logits
ippo       0.31
ipp        0.27
Fil        0.24
aments     0.23
thy        0.23
оÑģоÑĦ (ософ)   0.23
ament      0.22
leted      0.21
fil        0.20
bert       0.20
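Lists like the two above are commonly produced by a "logit lens" on the feature: the feature's decoder vector is multiplied by the model's unembedding matrix, and the most negative and most positive entries are reported. This page does not state its exact method, so the following is only a minimal sketch under that assumption. It ignores any final layer norm, and `top_logits`, `w_dec_row`, and the toy matrices are hypothetical.

```python
import numpy as np


def top_logits(w_dec_row, w_u, vocab, k=10):
    """Rank tokens by the direct effect of one feature direction on their logits.

    w_dec_row: (d_model,) decoder vector for the feature (hypothetical input).
    w_u:       (d_model, d_vocab) unembedding matrix.
    vocab:     list of d_vocab token strings.
    """
    scores = w_dec_row @ w_u      # (d_vocab,) logit contribution per token
    order = np.argsort(scores)    # indices sorted by ascending score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy model: d_model = 2, a four-token vocabulary.
w_u = np.array([[1.0, 0.0, -1.0, 0.5],
                [0.0, 1.0, 0.0, 0.5]])
w_dec_row = np.array([0.3, -0.1])
neg, pos = top_logits(w_dec_row, w_u, ["a", "b", "c", "d"], k=2)
print([t for t, _ in neg])  # ['c', 'b']
print([t for t, _ in pos])  # ['a', 'd']
```

Because the scores come from a single matrix product, they describe only the feature's direct path to the output, not its effects through later layers.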
Activations Density 0.009%
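Activation density is usually the fraction of tokens in a sample dataset on which the feature fires. A density of 0.009% is roughly 9 tokens in every 100,000, which marks this as a very sparse feature. The following minimal sketch assumes "fires" means an activation strictly above 0. The page does not give its threshold or its sample size, and the activations below are invented for illustration.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds `threshold`."""
    acts = np.asarray(acts)
    return float((acts > threshold).mean())


# Hypothetical activations over 100,000 tokens, 9 of which are nonzero.
acts = np.zeros(100_000)
acts[[5, 17, 230, 4_000, 12_345, 50_000, 70_001, 88_888, 99_999]] = 1.0
print(f"{activation_density(acts):.3%}")  # 0.009%
```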