INDEX
Explanations
references to popular culture, specifically music and entertainment
Negative Logits
edom      -0.18
emens     -0.16
Ø´Ùħ      -0.16
erva      -0.16
seedu     -0.15
TI        -0.15
ca        -0.14
agem      -0.14
gom       -0.13
enter     -0.13
Positive Logits
614          0.16
604          0.15
o            0.15
edly         0.14
(            0.14
scholarship  0.14
normalize    0.13
İz           0.13
xin          0.13
Took         0.13
Activations Density: 0.035%