Explanations
references to popular culture and social media influencers
Negative Logits
ejména   -0.19
růz      -0.18
méně     -0.18
zdrav    -0.18
člov     -0.17
účin     -0.16
вияв     -0.15
písem    -0.15
iyel     -0.15
okt      -0.15
Positive Logits
zosta     0.26
ode       0.25
znal      0.24
dopad     0.22
podp      0.22
zda       0.21
uz        0.21
dos       0.21
rozp      0.20
uh        0.20
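The two logit lists rank vocabulary tokens by how strongly this feature pushes their logits down (negative) or up (positive). The sketch below shows one common way such lists are produced. It assumes, and this page does not confirm it, that each token's score is the feature's decoder vector projected through the model's unembedding matrix. The names `top_logit_effects`, `w_dec_row`, `W_U` and `vocab` are hypothetical, and the inputs are toy values.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Return (negative, positive) lists of (token, score) pairs.

    Assumption (hypothetical, not confirmed by the dashboard):
    score[t] = w_dec_row @ W_U[:, t], i.e. the feature's decoder
    direction read out through the unembedding matrix.
    """
    scores = w_dec_row @ W_U            # shape: (vocab_size,)
    order = np.argsort(scores)          # token indices, ascending by score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 made-up tokens.
vocab = ["a", "b", "c", "d"]
w_dec_row = np.array([1.0, 0.0])
W_U = np.array([[0.3, -0.2, 0.1, 0.0],
                [5.0, 5.0, 5.0, 5.0]])
neg, pos = top_logit_effects(w_dec_row, W_U, vocab, k=2)
print([t for t, _ in neg], [t for t, _ in pos])  # ['b', 'd'] ['a', 'c']
```

Sorting once and reading from both ends gives the most-suppressed and most-promoted tokens in a single pass.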
Activation Density: 0.037%
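Activation density is usually reported as the percentage of sampled tokens on which the feature is active. Under that reading, 0.037% means the feature fires on roughly 1 token in 2,700. The sketch below assumes that definition and treats "active" as an activation strictly above zero. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than `threshold` (0 for ReLU-style SAE features).
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * float(np.mean(acts > threshold))

# Toy example: the feature fires on 1 of 4 tokens.
print(activation_density([0.0, 0.0, 1.2, 0.0]))  # 25.0
```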