Explanations
words related to social media and online platforms
Negative Logits
zas          -0.17
antity       -0.15
sert         -0.15
imizde       -0.15
acente       -0.15
enerator     -0.15
stration     -0.15
eners        -0.14
ancell       -0.14
astics       -0.14
Positive Logits
t            0.25
na           0.20
tainment     0.20
rn           0.19
al           0.19
er           0.19
ton          0.19
ei           0.18
g            0.18
anean        0.18
Activations Density 0.054%