INDEX
Explanations
references to online platforms and digital communication
Negative Logits
Bieber    -0.15
aat       -0.15
promot    -0.15
rell      -0.15
ssue      -0.14
aret      -0.14
Jesse     -0.14
asio      -0.14
ceed      -0.14
iegel     -0.14
Positive Logits
anja      0.16
kud       0.15
ENCHMARK  0.14
overn     0.14
cki       0.14
_equiv    0.14
Ferd      0.14
adel      0.13
lawy      0.13
hone      0.13
Activations Density 0.159%