Explanations
references to specific brands or products, particularly in the context of fashion and watches
Negative Logits
Token    Logit
elden    -0.19
Ñıж    -0.14
zdrav    -0.14
ãĥ¼ãĥĵ    -0.13
ÙĪØ²Ùĩ    -0.13
-gnu    -0.13
ragaz    -0.13
Ù쨴    -0.13
ecz    -0.13
flutter    -0.13
Positive Logits
Token    Logit
replica    0.37
Replica    0.34
fake    0.30
imitation    0.30
Cart    0.30
replicas    0.29
Fake    0.28
copy    0.28
ro    0.28
Rep    0.27
Activations Density: 0.007%