Explanations
instances of brand names and products in the context of entertainment and culture
Negative Logits
Token         Logit
gems          -0.18
lep           -0.16
íıŃ           -0.15
ScreenWidth   -0.14
ATRIX         -0.14
ezier         -0.14
.navigator    -0.14
utan          -0.14
abwe          -0.13
iddles        -0.13
Positive Logits
Token         Logit
Good          0.34
bad           0.33
Good          0.32
bad           0.31
Bad           0.30
good          0.30
-good         0.29
Bad           0.29
GOOD          0.28
_bad          0.28
Activations Density: 0.075%